
UTF-16 conversion problem

matt
Active Contributor

I have a zipped file that contains UTF-16 LE text files. I'm using the standard SAP class CL_ABAP_ZIP to unzip it, which gives me an xstring. I want to convert that xstring into a string, but the byte order mark (BOM) FFFE is retained in the string. This causes problems further down the line whenever I do any string operations.

In this code example, the output is "/AAA_Z " instead of "/AAA_ZI".

REPORT.

DATA content_as_x TYPE xstring.
content_as_x = 'FFFE2F004100410041005F005A0049000D000A002F004100410041005F005A004900'.

DATA lt_data TYPE TABLE OF x255.
DATA lv_len  TYPE i.

CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    buffer        = content_as_x
  IMPORTING
    output_length = lv_len
  TABLES
    binary_tab    = lt_data.
DATA content_as_string TYPE string.
CALL FUNCTION 'SCMS_BINARY_TO_STRING'
  EXPORTING
    input_length = lv_len
    encoding     = CONV abap_encoding( '4103' )
  IMPORTING
    text_buffer  = content_as_string
  TABLES
    binary_tab   = lt_data.
DATA content TYPE string_table.
SPLIT content_as_string AT cl_abap_char_utilities=>cr_lf INTO TABLE content.
DATA first_line TYPE c length 7.
first_line = content[ 1 ].
cl_demo_output=>DISPLAY_DATA( first_line ).

The hex content of first_line is FFFE2F004100410041005F005A00

Note that although I've hardcoded the SCMS_BINARY_TO_STRING encoding as UTF-16 LE (4103), in my real program it's a variable.
I've also tried

REPORT.

DATA content_as_x TYPE xstring.
content_as_x = 'FFFE2F004100410041005F005A0049000D000A002F004100410041005F005A004900'.
DATA content_as_string TYPE string.
DATA(conv) = cl_abap_conv_in_ce=>create( encoding = '4103'
                                         endian = 'L' ).
conv->convert( EXPORTING input = content_as_x
               IMPORTING data  = content_as_string ).

DATA content TYPE string_table.
SPLIT content_as_string AT cl_abap_char_utilities=>cr_lf INTO TABLE content.
DATA first_line TYPE c length 7.
first_line = content[ 1 ].
cl_demo_output=>DISPLAY_DATA( first_line ).
Same issue.

Is there a better way of converting an xstring with a known encoding to a string that doesn't have these problems? Note that this needs to work in 7.4 (ideally 7.31 - don't ask!).

Or do I just have to snip off byte order marks?

5 Replies

Sandra_Rossi
Active Contributor

I don't get your actual question. If it's about the BOM, just remove it for further processing.
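
A minimal sketch of that removal on the byte level, assuming the file is known to be UTF-16 LE (code page 4103). It avoids inline declarations so that it should also compile on 7.31. The report name and the `bom_le` variable are made up for illustration; the sample bytes are the first line from the question.

```abap
REPORT z_strip_bom_bytes_sketch.

" Hypothetical names throughout; only the sample bytes come from the question.
DATA content_as_x      TYPE xstring.
DATA content_as_string TYPE string.
DATA conv              TYPE REF TO cl_abap_conv_in_ce.
DATA bom_le            TYPE x LENGTH 2 VALUE 'FFFE'. " UTF-16 LE byte order mark

content_as_x = 'FFFE2F004100410041005F005A004900'.

" Drop the two BOM bytes only if they are really there,
" so files written without a BOM pass through unchanged.
IF xstrlen( content_as_x ) >= 2.
  IF content_as_x(2) = bom_le.
    SHIFT content_as_x BY 2 PLACES LEFT IN BYTE MODE.
  ENDIF.
ENDIF.

" Convert the remaining bytes as UTF-16 LE.
conv = cl_abap_conv_in_ce=>create( encoding = '4103' ).
conv->convert( EXPORTING input = content_as_x
               IMPORTING data  = content_as_string ).

WRITE / content_as_string.
```

If the encoding is a variable, as in the real program, the BOM bytes to compare against would have to be chosen per encoding.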

Ryan-Crosby
Active Contributor

Several sites carry the following guidance on the topic, which suggests the BOM can be ignored:

For the IANA registered charsets UTF-16BE and UTF-16LE, a byte order mark should not be used because the names of these character sets already determine the byte order. If encountered anywhere in such a text stream, U+FEFF is to be interpreted as a "zero width no-break space".
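
Read that way, the converter is doing what the rule says: once the code page fixes the byte order, a leading U+FEFF is an ordinary character and ends up in the string. A sketch of removing it after conversion, which does not depend on which encoding was used. It assumes `cl_abap_conv_in_ce=>uccp` returns the character for a given Unicode code point (worth verifying on your release); the report name and `bom_char` are illustrative.

```abap
REPORT z_strip_bom_char_sketch.

" Hypothetical names; only the sample bytes come from the question.
DATA content_as_x      TYPE xstring.
DATA content_as_string TYPE string.
DATA conv              TYPE REF TO cl_abap_conv_in_ce.
DATA bom_char          TYPE c LENGTH 1.

content_as_x = 'FFFE2F004100410041005F005A004900'.

" Convert first; the BOM arrives in the string as the character U+FEFF.
conv = cl_abap_conv_in_ce=>create( encoding = '4103' ).
conv->convert( EXPORTING input = content_as_x
               IMPORTING data  = content_as_string ).

" Assumption: uccp( ) maps a Unicode code point to its character.
bom_char = cl_abap_conv_in_ce=>uccp( 'FEFF' ).

" Remove the leading U+FEFF only if it is present.
IF strlen( content_as_string ) >= 1.
  IF content_as_string(1) = bom_char.
    SHIFT content_as_string BY 1 PLACES LEFT.
  ENDIF.
ENDIF.

WRITE / content_as_string.
```

Only the first character is checked, since a U+FEFF later in the text is a zero width no-break space and may be intentional.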

matt
Active Contributor

sandra.rossi ryan.crosby2

"Or do I just have to snip off the BOMs".

It seems yes!

Many thanks.

BiberM
Contributor

You don't have to write the BOM stripping yourself: use cl_bcs_utilities=>remove_bom_from_content(). I don't know whether it exists outside an ERP system, but it belongs to the SAP_BASIS component and should therefore be available everywhere.

matt
Active Contributor

mbiber I can confirm it exists on non-ERP systems.

Very useful class. Thanks.