Read XML file in UTF-8

Former Member

I am trying to read an XML file into an internal table. In the internal table, the character ™ (U+2122) is replaced with '#', while the character ® is preserved. I want ™ to be kept as it is. I tried using SET_ENCODING to set the code page to UTF-8, but it doesn't seem to do anything. The XML file I am reading is UTF-8 encoded. The code follows.

OPEN DATASET lv_filename FOR INPUT IN BINARY MODE MESSAGE msg.
IF sy-subrc = 0.
  READ DATASET lv_filename INTO response.
  IF sy-subrc <> 0.
    EXIT.
  ELSE.
    CALL FUNCTION 'SCMS_STRING_TO_XSTRING'
      EXPORTING
        text   = response
      IMPORTING
        buffer = xml.
  ENDIF.
  CLOSE DATASET lv_filename.
ENDIF.

CONSTANTS:
  gc_encoding TYPE string VALUE 'UTF-8'.

DATA:
  go_encoding TYPE REF TO if_ixml_encoding.

* Creating the main iXML factory
l_ixml = cl_ixml=>create( ).
go_encoding = l_ixml->create_encoding( character_set = gc_encoding
                                       byte_order    = 0 ).

* Creating a stream factory
l_streamfactory = l_ixml->create_stream_factory( ).
l_istream = l_streamfactory->create_istream_xstring( string = xml ).
CALL METHOD l_istream->set_encoding
  EXPORTING
    encoding = go_encoding.

* Creating a document
l_document = l_ixml->create_document( ).
l_parser = l_ixml->create_parser( stream_factory = l_streamfactory
                                  istream        = l_istream
                                  document       = l_document ).

CLEAR: response, xml.
FREE: response, xml.

IF l_parser->parse( ) NE 0.
  IF l_parser->num_errors( ) NE 0.
    DATA: parseerror TYPE REF TO if_ixml_parse_error,
          str        TYPE string,
          i          TYPE i,
          count      TYPE i,
          index      TYPE i.
    count = l_parser->num_errors( ).
    WRITE: count, ' parse errors have occurred:'.
    index = 0.
    WHILE index < count.
      parseerror = l_parser->get_error( index = index ).
      i = parseerror->get_line( ).
      WRITE: 'line: ', i.
      i = parseerror->get_column( ).
      WRITE: 'column: ', i.
      str = parseerror->get_reason( ).
      WRITE: str.
      index = index + 1.
    ENDWHILE.
  ENDIF.
ENDIF.

Once parsed, I use methods such as get_node to populate the internal tables.
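
For illustration only, that population step might look roughly like the sketch below. It assumes l_document is the document filled by the parser above; the structure ty_node_text, the table lt_nodes, and the lo_ variables are hypothetical names, not the actual program. It walks all nodes with an iXML iterator and stores the name and text of each element.

```abap
* Hypothetical target structure: one row per XML element
TYPES: BEGIN OF ty_node_text,
         name  TYPE string,
         value TYPE string,
       END OF ty_node_text.

DATA: lt_nodes    TYPE STANDARD TABLE OF ty_node_text WITH DEFAULT KEY,
      ls_node     TYPE ty_node_text,
      lo_iterator TYPE REF TO if_ixml_node_iterator,
      lo_node     TYPE REF TO if_ixml_node.

* l_document is assumed to be the parsed document from the code above
lo_iterator = l_document->create_iterator( ).
lo_node = lo_iterator->get_next( ).

WHILE lo_node IS NOT INITIAL.
* Only element nodes are stored; text nodes are reached through get_value( )
  IF lo_node->get_type( ) = if_ixml_node=>co_node_element.
    ls_node-name  = lo_node->get_name( ).
*   For an element, get_value( ) returns the text content below it
    ls_node-value = lo_node->get_value( ).
    APPEND ls_node TO lt_nodes.
  ENDIF.
  lo_node = lo_iterator->get_next( ).
ENDWHILE.
```

The '#' shows up in the value column of a table filled this way.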

Any help will be appreciated. This is urgent, as I don't see any other way to remove the junk characters.

Thanks

2 REPLIES

Former Member

Try to delete the # with this:

DATA: tab(1).
tab = cl_abap_char_utilities=>horizontal_tab.
REPLACE ALL OCCURRENCES OF tab IN ... WITH ' '.

Former Member

Try using code page 4120. It keeps most of the characters as they are in the file, but please check again.
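
One way to test the code page question separately from the parser is to decode the raw bytes explicitly. The following is only a sketch, assuming the file is on the application server and really is UTF-8 (SAP code page 4110). The names lv_xml, lv_text, lv_len and lo_conv are made up for illustration, and lv_filename is assumed to be filled elsewhere. The file is read straight into an xstring, so no string-to-xstring conversion is involved.

```abap
DATA: lv_filename TYPE string,   " assumed to be set elsewhere
      lv_xml      TYPE xstring,  " raw bytes of the file
      lv_text     TYPE string,   " decoded text
      lv_len      TYPE i,
      lo_conv     TYPE REF TO cl_abap_conv_in_ce.

* Read the whole file as bytes, without any character conversion
OPEN DATASET lv_filename FOR INPUT IN BINARY MODE.
IF sy-subrc = 0.
  READ DATASET lv_filename INTO lv_xml.
  CLOSE DATASET lv_filename.
ENDIF.

IF lv_xml IS INITIAL.
  WRITE: / 'File could not be read or is empty.'.
ELSE.
  TRY.
*     Decode the bytes as UTF-8 into the system's character representation
      lo_conv = cl_abap_conv_in_ce=>create( encoding = 'UTF-8'
                                            input    = lv_xml ).
      lo_conv->read( IMPORTING data = lv_text ).
      lv_len = strlen( lv_text ).
      WRITE: / 'Decoded characters:', lv_len.
    CATCH cx_sy_conversion_codepage.
*     Raised when a character cannot be converted
      WRITE: / 'A character could not be converted.'.
    CATCH cx_sy_codepage_converter_init
          cx_parameter_invalid_type
          cx_parameter_invalid_range.
      WRITE: / 'The converter could not be set up.'.
  ENDTRY.
ENDIF.
```

If the conversion fails or ™ still comes out as '#', it is worth checking whether the system is non-Unicode. A Latin-1 system code page contains ® but not ™, which would match the symptom, and in that case changing the encoding on the stream is unlikely to help. The same lv_xml can also be passed directly to create_istream_xstring, so that the parser works on the original bytes.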