Issue with XML encoding

asatish_kumar
Associate
0 Likes
2,538

Dear SDN members,

I am facing some issues with XML encoding for diacritical characters (non-English special characters).

For example, the org unit name AfricaéçaçÃtest is converted to Africa a  test.
The special characters are replaced with spaces. I used the code below:
  DATA: lv_xml_data TYPE string,
        lv_buffer   TYPE xstring.

  IF lv_xml_data IS NOT INITIAL.
* ====================================================================================
*       Correct the Encoding of output
* ====================================================================================
    EXPORT lv_xml_data TO DATA BUFFER lv_buffer.

    lo_conv = cl_abap_conv_in_ce=>create( input       = lv_buffer
                                          encoding    = '1105'
                                          ignore_cerr = 'X'
                                          replacement = '' ).

    lo_conv->convert( EXPORTING input = lv_buffer
                      IMPORTING data  = lv_xml_data ).

    SHIFT lv_xml_data UP TO '<root'.
  ENDIF.

  ev_xml_data = lv_xml_data.

I tried removing the EXPORT, which seems to be causing the issue; without it, AfricaéçaçÃtest converts to Africaéçaçà test.

I also tried changing the encoding to 4110 (UTF-8), 1160, 1101, 1100, etc., but it did not work.


I tried this: CALL FUNCTION 'SCMS_STRING_TO_XSTRING' EXPORTING text = lv_xml_data IMPORTING buffer = lv_buffer.

I also tried string to binary and then binary to xstring, but neither solved my problem.

I would appreciate your input on resolving this issue.

Thanks & Regards
Satish

5 REPLIES

former_member182354
Contributor
0 Likes
1,597

Hi Satish,

The function module 'SCMS_STRING_TO_XSTRING' has this issue.

Use the BCS class instead; it passes the special characters through without truncation or a dump.

TRY.
    CALL METHOD cl_bcs_convert=>string_to_solix
      EXPORTING
        iv_string   = lv_string
        iv_codepage = lc_codepage
        iv_add_bom  = gc_x
      IMPORTING
        et_solix    = lt_solix.
  CATCH cx_bcs.
ENDTRY.

*-- Create persistent send request
l_send_request = cl_bcs=>create_persistent( ).
wt_contents[] = t_contents[].

*-- Get the length of the document
DESCRIBE TABLE wt_contents LINES l_cnt.
READ TABLE wt_contents INTO ws_contents INDEX l_cnt.
l_doc_len = ( l_cnt - 1 ) * 255 + strlen( ws_contents ).
*-- Subject of the mail
l_sub = w_mail_subj.
*-- Create document
TRY.
    l_document = cl_document_bcs=>create_document(
                   i_type       = lc_htm
                   i_text       = wt_contents
                   i_length     = l_doc_len
                   i_subject    = l_sub
                   i_language   = sy-langu
                   i_importance = '1' ).
  CATCH cx_document_bcs.
ENDTRY.
*-- Subject of the mail
MOVE w_mail_subj TO l_subj.
w_document = l_document.


Raghav


RaymondGiuseppi
Active Contributor
0 Likes
1,597

Look in Code Gallery for Escape HTML.


Regards,

Raymond


Former Member
0 Likes
1,597

You can use class cl_abap_conv_x2x_ce to change encoding of hex representation of text.

Looking at your example, you are trying to change encoding of a text directly.

Consider é. Its hex representation in UTF-8 is C3A9. When these two bytes are interpreted as Latin-encoded text, each byte becomes its own character and the text reads Ã©.
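
To make the byte-level point concrete, here is a minimal sketch that encodes é as UTF-8 and then decodes the same bytes as Latin-1. It reuses the converter classes already shown in this thread. The report name and variable names are hypothetical, and it assumes a Unicode system where code page 4110 is UTF-8 and 1100 is ISO 8859-1.

```abap
REPORT z_demo_mojibake.  " hypothetical report name

DATA: lv_text  TYPE string,
      lv_bytes TYPE xstring,
      lo_out   TYPE REF TO cl_abap_conv_out_ce,
      lo_in    TYPE REF TO cl_abap_conv_in_ce.

TRY.
    lv_text = `é`.

    " Step 1: encode the text as UTF-8 (SAP code page 4110).
    lo_out = cl_abap_conv_out_ce=>create( encoding = '4110' ).
    lo_out->convert( EXPORTING data   = lv_text
                     IMPORTING buffer = lv_bytes ).
    WRITE / lv_bytes.   " the two UTF-8 bytes of é

    " Step 2: decode the same bytes as ISO 8859-1 (SAP code page 1100).
    " Each byte is now read as a separate character.
    lo_in = cl_abap_conv_in_ce=>create( encoding = '1100' ).
    lo_in->convert( EXPORTING input = lv_bytes
                    IMPORTING data  = lv_text ).
    WRITE / lv_text.    " two characters instead of one
  CATCH cx_root.
    WRITE / 'Conversion failed'.
ENDTRY.
```

Running the two steps in the opposite order (encode as Latin-1, decode as UTF-8) is what the next snippet in this reply does.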

See the snippet below, which converts AfricaéçaçÃtest to Africaéçaç#test.

TRY.
    DATA: text   TYPE string VALUE 'AfricaéçaçÃtest',
          buffer TYPE xstring,
          conv   TYPE REF TO cl_abap_conv_out_ce.
    conv = cl_abap_conv_out_ce=>create(
             encoding    = '1164'
             ignore_cerr = 'X' ).
    conv->convert( EXPORTING data   = text
                   IMPORTING buffer = buffer ).
    WRITE / buffer.
    CLEAR text.
    DATA conv2 TYPE REF TO cl_abap_conv_in_ce.
    conv2 = cl_abap_conv_in_ce=>create(
              encoding    = 'UTF-8'
              ignore_cerr = 'X' ).
    conv2->convert(
      EXPORTING input = buffer
      IMPORTING data  = text ).
    WRITE / text.
  CATCH cx_root.
ENDTRY.



0 Likes
1,597

Hi Manish,

The above code replaces the accented characters with #; I want the output without #.

Please suggest.

Thanks & Regards

Satish


0 Likes
1,597

It is supposed to show #, because Ã is not followed by a byte that can be interpreted correctly in UTF-8. You can go down to the hex level and apply substitutions that correct the output. The hex equivalent of Ãtest is C374657374. C3 could not be converted, so you get a #. To get Ã in UTF-8, C3 should be replaced by C383.

So do not ignore the conversion error, and apply a suitable substitution at the hex level.
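
The substitution described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it treats every C3 byte that is not followed by a UTF-8 continuation byte (80 to BF) as a stray Latin-1 Ã and rewrites it as C383, and it does not handle any other invalid byte. The report name, constants, and variable names are hypothetical.

```abap
REPORT z_demo_repair_c3.  " hypothetical report name

CONSTANTS: lc_c3 TYPE x LENGTH 1 VALUE 'C3',
           lc_83 TYPE x LENGTH 1 VALUE '83',
           lc_80 TYPE x LENGTH 1 VALUE '80',
           lc_bf TYPE x LENGTH 1 VALUE 'BF'.

DATA: lv_in   TYPE xstring,
      lv_out  TYPE xstring,
      lv_byte TYPE x LENGTH 1,
      lv_next TYPE x LENGTH 1,
      lv_len  TYPE i,
      lv_off  TYPE i,
      lv_nxt  TYPE i,
      lv_text TYPE string,
      lo_in   TYPE REF TO cl_abap_conv_in_ce.

lv_in  = 'C374657374'.   " hex of "Ãtest" as quoted in the reply
lv_len = xstrlen( lv_in ).

WHILE lv_off < lv_len.
  " Copy the current byte to the output unchanged.
  lv_byte = lv_in+lv_off(1).
  CONCATENATE lv_out lv_byte INTO lv_out IN BYTE MODE.

  IF lv_byte = lc_c3.
    " Look at the following byte, if there is one.
    lv_nxt = lv_off + 1.
    IF lv_nxt < lv_len.
      lv_next = lv_in+lv_nxt(1).
    ELSE.
      CLEAR lv_next.
    ENDIF.
    " Assumption: a C3 without a continuation byte (80..BF) is a lone Ã,
    " so append 83 to form the valid UTF-8 sequence C383.
    IF lv_next < lc_80 OR lv_next > lc_bf.
      CONCATENATE lv_out lc_83 INTO lv_out IN BYTE MODE.
    ENDIF.
  ENDIF.

  lv_off = lv_off + 1.
ENDWHILE.

TRY.
    " Decode the repaired bytes as UTF-8 without ignoring conversion errors.
    lo_in = cl_abap_conv_in_ce=>create( encoding = '4110' ).
    lo_in->convert( EXPORTING input = lv_out
                    IMPORTING data  = lv_text ).
    WRITE / lv_text.
  CATCH cx_root.
    WRITE / 'Conversion failed'.
ENDTRY.
```

Because the converter is created without ignore_cerr, any remaining invalid byte raises an exception instead of silently becoming #, which shows where further substitutions would be needed.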