Re: Issue with XML encoding

asatish_kumar · 2014 Mar 10

Dear SDN members,

I am facing some issues with XML encoding of diacritical (non-English) special characters.

For example, the org unit name AfricaéçaçÃtest is converted to Africa a test: the special characters are replaced with spaces. I used the code below:
IF lv_xml_data IS NOT INITIAL.
* ====================================================================================
* Correct the Encoding of output
* ====================================================================================
  DATA: lv_buffer TYPE xstring,
        lo_conv   TYPE REF TO cl_abap_conv_in_ce.

* EXPORT ... TO DATA BUFFER writes an SAP data cluster, not the plain text bytes
  EXPORT lv_xml_data TO DATA BUFFER lv_buffer.

* replacement = '' puts a blank in place of every character that cannot be converted
  lo_conv = cl_abap_conv_in_ce=>create( input       = lv_buffer
                                        encoding    = '1105'
                                        ignore_cerr = 'X'
                                        replacement = '' ).

  lo_conv->convert( EXPORTING input = lv_buffer
                    IMPORTING data  = lv_xml_data ).

* Drop everything in front of the '<root' tag
  SHIFT lv_xml_data UP TO '<root'.
ENDIF.

ev_xml_data = lv_xml_data.

I tried removing the EXPORT, which seems to be causing the issue, but then AfricaéçaçÃtest is converted to AfricaÃ©Ã§aÃ§Ãƒ test.

I tried changing the encoding to 4110 (UTF-8), 1160, 1101, 1100, etc., but it didn't work.

I also tried CALL FUNCTION 'SCMS_STRING_TO_XSTRING' EXPORTING text = lv_xml_data IMPORTING buffer = lv_buffer.

I also tried converting the string to binary and then the binary to xstring, but none of these solved my problem.
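For comparison, a minimal sketch of a direct round trip, assuming a Unicode SAP system (the report name z_utf8_roundtrip_demo and the test value are hypothetical). It converts the string straight to UTF-8 bytes with cl_abap_conv_out_ce and back with cl_abap_conv_in_ce, with no EXPORT ... TO DATA BUFFER step in between, since that statement produces an SAP data cluster rather than the raw text bytes.

```abap
REPORT z_utf8_roundtrip_demo.

* Hypothetical demo report: encode a string directly as UTF-8 and decode
* it again. No EXPORT ... TO DATA BUFFER is involved, because that
* statement writes an SAP data cluster, not the plain text bytes.
DATA: lv_text   TYPE string VALUE 'Africaéçatest',
      lv_buffer TYPE xstring,
      lv_back   TYPE string,
      lv_msg    TYPE string,
      lo_out    TYPE REF TO cl_abap_conv_out_ce,
      lo_in     TYPE REF TO cl_abap_conv_in_ce,
      lx_err    TYPE REF TO cx_root.

TRY.
    " String -> UTF-8 bytes (SAP code page 4110)
    lo_out = cl_abap_conv_out_ce=>create( encoding = 'UTF-8' ).
    lo_out->convert( EXPORTING data   = lv_text
                     IMPORTING buffer = lv_buffer ).
    WRITE / lv_buffer.                 " hex dump of the UTF-8 bytes

    " UTF-8 bytes -> string, using the same encoding on both sides
    lo_in = cl_abap_conv_in_ce=>create( encoding = 'UTF-8' ).
    lo_in->convert( EXPORTING input = lv_buffer
                    IMPORTING data  = lv_back ).

    IF lv_back = lv_text.
      WRITE / 'Round trip preserved all characters'.
    ELSE.
      WRITE / 'Round trip changed the text'.
    ENDIF.
  CATCH cx_root INTO lx_err.
    lv_msg = lx_err->get_text( ).
    WRITE / lv_msg.
ENDTRY.
```

If the resulting bytes are written out as an XML file, the encoding declared in the XML prolog should match the code page used for the conversion.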

I'd appreciate your input on resolving this issue.

Thanks & Regards
Satish

former_member182354 · 2014 Mar 10

Hi Satish,

The function module SCMS_STRING_TO_XSTRING has this issue.

Use the BCS classes instead; with them the special characters are sent without truncation or a dump.

try.
    call method cl_bcs_convert=>string_to_solix
      exporting
        iv_string   = lv_string
        iv_codepage = lc_codepage   " e.g. '4110' = UTF-8
        iv_add_bom  = gc_x          " 'X': prepend a byte order mark
      importing
        et_solix    = lt_solix.
  catch cx_bcs.
*-- error handling omitted
endtry.

*-- Create persistent send request
l_send_request = cl_bcs=>create_persistent( ).
wt_contents[] = t_contents[].

*-- Get the length of the document (assumes 255-character lines)
describe table wt_contents lines l_cnt.
read table wt_contents into ws_contents index l_cnt.
l_doc_len = ( l_cnt - 1 ) * 255 + strlen( ws_contents ).

*-- Subject of the mail
l_sub = w_mail_subj.

*-- Create document
try.
    l_document = cl_document_bcs=>create_document(
                   i_type       = lc_htm
                   i_text       = wt_contents
                   i_length     = l_doc_len
                   i_subject    = l_sub
                   i_language   = sy-langu
                   i_importance = '1' ).
  catch cx_document_bcs.
*-- error handling omitted
endtry.

*-- Subject of the mail
move w_mail_subj to l_subj.
w_document = l_document.

Raghav

RaymondGiuseppi · 2014 Mar 10

Look in the Code Gallery for Escape HTML.

Regards,

Raymond

Former Member · 2014 Mar 11

You can use the class cl_abap_conv_x2x_ce to change the encoding of the hex representation of a text.

Looking at your example, you are trying to change the encoding of the text directly.

Consider é: its UTF-8 hex representation is C3A9. When these two bytes are interpreted as Latin-1 text, the result is Ã©.

TRY . 
    DATA: text    TYPE string VALUE 'AfricaÃ©Ã§aÃ§Ãtest', 
          buffer TYPE xstring, 
          conv    TYPE REF TO cl_abap_conv_out_ce. 
 
    conv = cl_abap_conv_out_ce=>create( 
             encoding = '1164' 
             ignore_cerr = 'X'  ). 
 
    conv->convert( EXPORTING data = text 
                   IMPORTING buffer = buffer ). 
 
    WRITE / buffer. 
    CLEAR text. 
    DATA conv2 TYPE REF TO cl_abap_conv_in_ce. 
 
    conv2 = cl_abap_conv_in_ce=>create( 
              encoding = 'UTF-8' 
              ignore_cerr = 'X'   ). 
 
    conv2->convert( 
      EXPORTING input = buffer 
      IMPORTING data = text ). 
 
    WRITE / text. 
  CATCH cx_root. 
ENDTRY. 
 


asatish_kumar · 2014 Mar 11

Hi Manish,

The above code replaces the accented characters with #; I want them without #.

Please suggest.

Thanks & Regards

Satish

Former Member · 2014 Mar 11

It is supposed to show #, because Ã is not followed by anything that can be interpreted as valid UTF-8. You can go down to the hex level and make substitutions that correct the output. The hex equivalent of Ãtest is C374657374. C3 cannot be converted on its own, so you get a #. To get Ã in UTF-8, C3 has to be replaced by C383.

So do not ignore the conversion error; apply a suitable substitution at the hex level instead.
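A minimal sketch of that advice, under the assumption that the only broken sequence is the lone C3 in front of "test" from the example (the report name and the constants are hypothetical). It repairs C3 74 to C3 83 74 with REPLACE ... IN BYTE MODE and then decodes without ignore_cerr, so any remaining invalid sequence raises an exception (cx_sy_conversion_codepage is assumed here to be the one raised) instead of silently becoming #.

```abap
REPORT z_strict_utf8_decode_demo.

* Hypothetical demo report. Input bytes: 'Ãtest' as produced by a
* Latin-1 conversion, i.e. C3 74 65 73 74. In UTF-8, C3 must be followed
* by a continuation byte (80-BF), so C3 74 is an invalid sequence.
CONSTANTS: lc_input TYPE x LENGTH 5 VALUE 'C374657374',
           lc_bad   TYPE x LENGTH 2 VALUE 'C374',      " lone C3 before 't'
           lc_good  TYPE x LENGTH 3 VALUE 'C38374'.    " 'Ã' in UTF-8 (C3 83) + 't'

DATA: lv_buffer TYPE xstring,
      lv_text   TYPE string,
      lv_msg    TYPE string,
      lo_in     TYPE REF TO cl_abap_conv_in_ce,
      lx_conv   TYPE REF TO cx_sy_conversion_codepage,
      lx_root   TYPE REF TO cx_root.

lv_buffer = lc_input.

* Repair the known-bad sequence at byte level before decoding
REPLACE ALL OCCURRENCES OF lc_bad IN lv_buffer WITH lc_good IN BYTE MODE.

TRY.
    " ignore_cerr is left at its default, so an invalid sequence raises
    " an exception instead of being replaced with '#'
    lo_in = cl_abap_conv_in_ce=>create( encoding = 'UTF-8' ).
    lo_in->convert( EXPORTING input = lv_buffer
                    IMPORTING data  = lv_text ).
    WRITE / lv_text.                     " should show: Ãtest
  CATCH cx_sy_conversion_codepage INTO lx_conv.
    lv_msg = lx_conv->get_text( ).
    WRITE / lv_msg.
  CATCH cx_root INTO lx_root.
    lv_msg = lx_root->get_text( ).
    WRITE / lv_msg.
ENDTRY.
```

A blanket replacement like this only fits one known pattern; with real data, each invalid sequence would have to be identified (for example from the exception) before choosing its substitution.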
