‎2022 Aug 22 12:58 PM
I have a zip file that contains UTF-16 LE text files. I'm using the standard SAP class CL_ABAP_ZIP to unzip it, which gives me an xstring. I want to convert that xstring into a string, but the byte order mark (BOM) FFFE is retained in the string. This causes problems further down the line whenever I do string operations.
In this code example, the output is "/AAA_Z " instead of "/AAA_ZI".
REPORT.
DATA content_as_x TYPE xstring.
" The line break is CR LF (0D00 0A00 in UTF-16 LE) so that the SPLIT at cr_lf below finds it.
content_as_x = 'FFFE2F004100410041005F005A0049000D000A002F004100410041005F005A004900'.
DATA lt_data TYPE TABLE OF x255.
DATA lv_len TYPE i.
CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    buffer        = content_as_x
  IMPORTING
    output_length = lv_len
  TABLES
    binary_tab    = lt_data.
DATA content_as_string TYPE string.
CALL FUNCTION 'SCMS_BINARY_TO_STRING'
  EXPORTING
    input_length = lv_len
    encoding     = CONV abap_encoding( '4103' )
  IMPORTING
    text_buffer  = content_as_string
  TABLES
    binary_tab   = lt_data.
DATA content TYPE string_table.
SPLIT content_as_string AT cl_abap_char_utilities=>cr_lf INTO TABLE content.
DATA first_line TYPE c length 7.
first_line = content[ 1 ].
cl_demo_output=>display_data( first_line ).
The hex content of first_line is FFFE2F004100410041005F005A00.
Note that although I've hardcoded the SCMS_BINARY_TO_STRING encoding as UTF-16 LE (4103), in my real program it's a variable.
I've also tried
REPORT.
DATA content_as_x TYPE xstring.
" The line break is CR LF (0D00 0A00 in UTF-16 LE) so that the SPLIT at cr_lf below finds it.
content_as_x = 'FFFE2F004100410041005F005A0049000D000A002F004100410041005F005A004900'.
DATA content_as_string TYPE string.
DATA(conv) = cl_abap_conv_in_ce=>create( encoding = '4103'
                                         endian   = 'L' ).
conv->convert( EXPORTING input = content_as_x
               IMPORTING data  = content_as_string ).
DATA content TYPE string_table.
SPLIT content_as_string AT cl_abap_char_utilities=>cr_lf INTO TABLE content.
DATA first_line TYPE c length 7.
first_line = content[ 1 ].
cl_demo_output=>display_data( first_line ).
This gives the same issue.
‎2022 Aug 22 1:24 PM
I don't get your actual question. If it's about the BOM, just remove it for further processing.
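A minimal sketch of removing the BOM at byte level before the conversion, assuming the content is UTF-16 LE as in the question. The report name z_strip_bom_bytes is hypothetical and the sample bytes are shortened to a single line. Only the FFFE mark is checked here; cl_abap_char_utilities also offers byte_order_mark_big and byte_order_mark_utf8 for the other cases.

```abap
REPORT z_strip_bom_bytes. " hypothetical report name

" Shortened sample from the question: FFFE BOM followed by "/AAA_ZI" in UTF-16 LE.
DATA content_as_x TYPE xstring.
content_as_x = 'FFFE2F004100410041005F005A004900'.

" Drop the first two bytes only if they are the little-endian BOM.
IF xstrlen( content_as_x ) >= 2.
  IF content_as_x(2) = cl_abap_char_utilities=>byte_order_mark_little.
    SHIFT content_as_x BY 2 PLACES LEFT IN BYTE MODE.
  ENDIF.
ENDIF.

" Convert the remaining bytes exactly as in the question.
DATA content_as_string TYPE string.
DATA(conv) = cl_abap_conv_in_ce=>create( encoding = '4103'
                                         endian   = 'L' ).
conv->convert( EXPORTING input = content_as_x
               IMPORTING data  = content_as_string ).

DATA first_line TYPE c LENGTH 7.
first_line = content_as_string.
cl_demo_output=>display_data( first_line ).
```

With the BOM gone, the seven-character field should hold the full "/AAA_ZI" rather than losing its last character to the mark.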
‎2022 Aug 22 1:32 PM
This is the information I have seen on the topic at several sites, suggesting the character can be ignored:
For the IANA registered charsets UTF-16BE and UTF-16LE, a byte order mark should not be used because the names of these character sets already determine the byte order. If encountered anywhere in such a text stream, U+FEFF is to be interpreted as a "zero width no-break space".
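Since the mark ends up in the converted string as the character U+FEFF, it can also be removed after the conversion, which may suit the case where the encoding is a variable. The following is only a sketch under that assumption: the report name z_strip_bom_char is hypothetical, and it relies on cl_abap_conv_in_ce=>uccp to build the character from its code point.

```abap
REPORT z_strip_bom_char. " hypothetical report name

" Shortened sample from the question: FFFE BOM followed by "/AAA_ZI" in UTF-16 LE.
DATA content_as_x TYPE xstring.
content_as_x = 'FFFE2F004100410041005F005A004900'.

DATA content_as_string TYPE string.
DATA(conv) = cl_abap_conv_in_ce=>create( encoding = '4103'
                                         endian   = 'L' ).
conv->convert( EXPORTING input = content_as_x
               IMPORTING data  = content_as_string ).

" U+FEFF as a single character, built from its Unicode code point.
DATA(bom_char) = cl_abap_conv_in_ce=>uccp( 'FEFF' ).

" Remove the character only when it is the very first one in the string.
IF strlen( content_as_string ) > 0.
  IF content_as_string(1) = bom_char.
    SHIFT content_as_string BY 1 PLACES LEFT.
  ENDIF.
ENDIF.

cl_demo_output=>display_data( content_as_string ).
```

Checking only the first position leaves any later U+FEFF untouched, in line with the quoted rule that it is then a zero width no-break space.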
‎2022 Aug 22 1:42 PM
sandra.rossi ryan.crosby2
"Or do I just have to snip off the BOMs".
It seems so!
Many thanks.
‎2022 Aug 22 2:29 PM
To strip off the BOM, you don't have to write that yourself: cl_bcs_utilities=>remove_bom_from_content(). Whether that exists outside an ERP system I don't know. It belongs to the SAP_BASIS component and should therefore be available everywhere.
‎2022 Aug 22 2:59 PM
mbiber I can confirm it exists on non-ERP systems.
Very useful class. Thanks.