‎2020 Jan 06 5:51 PM
Information: This question is a follow-up question to my previously answered question here: https://answers.sap.com/questions/12921343/simple-transformation-problems-with-encoding-conve.html
I found it could be handled most efficiently as a separate question.
---------------
Using fragments, I implemented the dynamic generation of XML tags without losing the opening and closing brackets. However, if such a dynamic tag contains special symbols, they do not seem to be properly translated into the output byte string.
Example:
The field "DKTXT" is a description field containing the string "Ä & < >".
I want to generate the following output XML (ignore the whitespaces after &):
<?xml version="1.0" encoding="utf-8"?>
<Paket>
<Metadaten>
<DKTXT>& Auml; & amp; & lt; & gt;</DKTXT>
</Metadaten>
</Paket>
using this ABAP code for transformation:
DATA: lv_xml_out TYPE xsdany.
CALL TRANSFORMATION zpd_st_nscale_xml_abap
SOURCE paket = ms_xml_data-paket
RESULT XML lv_xml_out. "Using variable of type rawstring I get an UTF-8-XML
When I obtain the bytes from lv_xml_out and translate them via an online hex <--> utf-8 converter (https://sites.google.com/site/nathanlexwww/tools/utf8-convert), I get the following output:
Formatted XML (ignore the whitespace after &):
<?xml version="1.0" encoding="utf-8"?>
<Paket>
<Metadaten>
<DKTXT>Ä & amp; & lt; & gt;</DKTXT>
</Metadaten>
</Paket>
Raw bytes:
3C3F786D6C2076657273696F6E3D22312E302220656E636F64696E673D227574662D38223F3E0A3C50616B65743E0A20203C4D657461646174656E3E0A202020203C444B5458543EC3842026616D703B20266C743B202667743B3C2F444B5458543E0A20203C2F4D657461646174656E3E0A3C2F50616B65743E
The result: our processing software parsing the output XML complains about the 'Ä' and claims that the UTF-8 file is erroneous.
Why did it translate <, >, & but not Ä? I would expect all symbols to be translated (i.e. Ä --> & Auml;).
And how can I force translation of all special symbols into HTML character entities?
‎2020 Jan 06 6:28 PM
The XML is completely normal and the program which complains is wrong (or maybe the file is not correctly sent).
Ä is represented as C384, which in UTF-8 represents the Unicode character U+00C4 which is Ä. In your XML header, it is clearly stated that the encoding after the header is UTF-8 so the XML is technically fine.
In XML, the only characters which need to be escaped are < and &, as explained in the XML standard:
The ampersand character (&) and the left angle bracket (<) must not appear in
their literal form, except when used as markup delimiters, or within a comment,
a processing instruction, or a CDATA section. If they are needed elsewhere, they
must be escaped using either numeric character references or the strings "&amp;"
and "&lt;" respectively.
Other characters don't need to be represented by their character entity references.
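For anyone who wants to check this outside of SAP, here is a minimal Python sketch (Python is my choice for illustration, not something used in the thread). It takes the raw bytes posted in the question, decodes them strictly as UTF-8, and parses them with the standard library XML parser. The last step is only an assumption about what a pure-ASCII consumer might accept: it replaces non-ASCII characters with numeric character references, which are the form the XML standard mentions. Note that & Auml; (without the space) is an HTML entity and not one of the predefined XML entities, so a plain XML parser would normally reject it unless a DTD declares it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Raw bytes of the transformation result, copied from the question.
RAW_HEX = (
    "3C3F786D6C2076657273696F6E3D22312E302220656E636F64696E673D227574662D38223F3E0A"
    "3C50616B65743E0A20203C4D657461646174656E3E0A"
    "202020203C444B5458543EC3842026616D703B20266C743B202667743B3C2F444B5458543E0A"
    "20203C2F4D657461646174656E3E0A3C2F50616B65743E"
)

xml_bytes = bytes.fromhex(RAW_HEX)

# Strict decoding raises UnicodeDecodeError if the bytes are not valid UTF-8.
xml_text = xml_bytes.decode("utf-8")

# Parsing raises ParseError if the document is not well-formed XML.
root = ET.fromstring(xml_bytes)
dktxt = root.findtext("Metadaten/DKTXT")
print(repr(dktxt))

# Assumed ASCII-only variant: escape the markup characters first, then turn
# every remaining non-ASCII character into a numeric character reference.
ascii_only = escape(dktxt).encode("ascii", "xmlcharrefreplace")
print(ascii_only)
```

The first print shows 'Ä & < >', i.e. the parser recovers the original field content, and the second shows b'&#196; &amp; &lt; &gt;'.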
‎2020 Jan 06 6:19 PM
To help people answer, here is the display side by side of text and UTF-8 hexadecimal:
<?xml version="1.0"     3C3F786D6C2076657273696F6E3D22312E3022
encoding="utf-8"?>      20656E636F64696E673D227574662D38223F3E0A
<Paket>                 3C50616B65743E0A
<Metadaten>             20203C4D657461646174656E3E0A
<DKTXT>                 202020203C444B5458543E
Ä & amp; & lt;          C3842026616D703B20266C743B
& gt;</DKTXT>           202667743B3C2F444B5458543E0A
</Metadaten>            20203C2F4D657461646174656E3E0A
</Paket>                3C2F50616B65743E
As we can see, Ä is represented as C384, which in UTF-8 represents the Unicode character U+00C4 which is Ä.
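A side-by-side display like this can be produced with a few lines of Python. This is only a sketch: the helper name hex_dump_lines and the column width are my own choices, not something from the thread.

```python
def hex_dump_lines(text, encoding="utf-8"):
    """Return (line, hex) pairs, one per line of text, line break included in the hex."""
    rows = []
    for line in text.splitlines(keepends=True):
        # Show the line without its trailing newline, but keep the newline
        # byte in the hex column so nothing is hidden.
        rows.append((line.rstrip("\n"), line.encode(encoding).hex().upper()))
    return rows


sample = "<Paket>\n<DKTXT>Ä</DKTXT>\n"
for shown, hexed in hex_dump_lines(sample):
    print(f"{shown:<24}{hexed}")
```

Passing a different encoding name as the second argument shows how the same text would look in that encoding, which is handy when comparing what was generated with what ended up in a file.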
‎2020 Jan 07 11:40 AM
Hi Sandra, thank you for your clarification. It made me understand the whole topic a bit better. We store the XML file on the SAP file system and found out that it is saved in ANSI, i.e. 'Ä' is encoded as E4 and there is no UTF-8 file header.
Hence the transformation is correct, but the step that stores the file currently converts the encoding.
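The effect can be reproduced outside of SAP with a short Python sketch. The assumptions here are mine: "ANSI" is taken to mean Windows-1252, and the file name paket.xml is hypothetical. The sketch shows that the same text is valid UTF-8 only as long as the bytes are stored unchanged, which is why writing the byte string in binary form (rather than letting a text-mode write re-encode it) should keep the file consistent with its XML header. In ABAP, the analogous choice is presumably between opening the dataset in binary mode and opening it in text mode with an explicit UTF-8 encoding.

```python
import os
import tempfile


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data can be decoded strictly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


xml_text = '<?xml version="1.0" encoding="utf-8"?><DKTXT>Ä</DKTXT>'

utf8_bytes = xml_text.encode("utf-8")   # what the transformation produced
ansi_bytes = xml_text.encode("cp1252")  # assumption: "ANSI" means Windows-1252

print(is_valid_utf8(utf8_bytes))  # header and content agree
print(is_valid_utf8(ansi_bytes))  # header still says utf-8, content does not

# Writing the bytes in binary mode stores them exactly as generated.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "paket.xml")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(utf8_bytes)
    with open(path, "rb") as f:
        stored = f.read()

print(stored == utf8_bytes)
```

The three prints show True, False and True: the single-byte version of Ä is not a valid UTF-8 sequence, so a strict UTF-8 parser is right to reject the converted file.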
‎2020 Jan 06 6:30 PM
By the way, why do you use an online converter to display the hexadecimal values instead of the ABAP debugger?
‎2020 Jan 07 10:01 AM
I wanted to use an SAP-independent tool to make sure that I don't run into misconceptions in the SAP ABAP context. Additionally, I can now prove that it is not an SAP-related issue.