Re: Conversion error in Unicode system

bajusz79 · ‎2022 Jun 24

Dear Experts,

I read an Adobe PDF from the archive, convert it to a string and send it through an interface. This worked with SAPscript output (stored as PDF in the archive), but stopped working with Adobe PDF forms.

The root cause is that the binary data can't be converted to a string.

The PDF file contains a lot of special characters, which are substituted with: #

We have a Unicode system.

Part of the original file (when I manually download it from the archive):

hÞ”Ymo7þ+û±ýd’ÃW

And this is how it looks after the binary -> string conversion:

h#ФYmo←7##+##¤dТ├W

I use this fm for the conversion:

CALL FUNCTION 'CACS_CONVERT_HEX_TO_STRING'
  EXPORTING
    xstring = l_xstring
  IMPORTING
    cstring = l_string.

But I tried this too, and it does not work either:

lv_encoding = '4102'. " also tried NON-UNICODE, UTF-8, 4110, etc.
lv_ignore = abap_true.

cl_abap_conv_in_ce=>create(
  EXPORTING
    encoding    = lv_encoding
    ignore_cerr = lv_ignore
    "endian = lv_endian
  RECEIVING
    conv        = lr_conv ).

LOOP AT lt_bin_tab ASSIGNING <ls_bin>.
  l_xstring = <ls_bin>-line.
  TRY.
      lr_conv->convert(
        EXPORTING
          input = l_xstring
        IMPORTING
          data  = l_string ).
    CATCH cx_sy_conversion_codepage .
    CATCH cx_root.
  ENDTRY.

  l_string_pdf = l_string_pdf && l_string.
ENDLOOP.

If you have any idea what could cause the problem, let me know.

Thanks

Peter

Sandra_Rossi · ‎2022 Jun 24

I guess the main problem is not in the code you have posted, but the fact that you want a STRING for storing the PDF "file". PDF is made of bytes, not characters. Please provide the rest of the code how you generate the PDF.

bajusz79 · ‎2022 Jun 24

I don't generate the PDF, I just send it via a Web service. And as I mentioned, it was working with SAPscript.

I read it from the archive with fm: SCMS_AO_TABLE_GET

Then I convert the binary data to a string (which is not working properly) and send l_string_pdf with the Web service.

Sandra_Rossi · ‎2022 Jun 24

OK. How do you pass it in the Web service?

bajusz79 · ‎2022 Jun 24

As an export parameter (table type):

ET_PDF_DATA TYPE RSBO_T_CHARLINE

And this is working fine, it is just that the special characters are substituted during the conversion. 😞

Sandra_Rossi · ‎2022 Jun 24

I mean, how is the PDF encoded in the Web service? Base64? Possibly zipped in some way? I can't imagine it's transported via "characters", that would be nonsense. Don't forget that in Non-Unicode ABAP systems, it was accepted to pass bytes as characters, because technically one character was stored as one byte (omitting all DBCS stuff), but it was incorrect to use characters instead of bytes, semantically/theoretically speaking.

Go directly from Xstring to base64, it's exactly one ABAP line of code (could even be implicit during XML serialization, if the PDF is transmitted inside XML). No need to encode in UTF-8 or whatever.
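
A minimal sketch of that xstring-to-Base64 step, assuming the PDF bytes are already held in an xstring. The variable names and the eight sample bytes are illustrative only and do not come from the interface discussed here; CL_HTTP_UTILITY is a standard class.

```abap
DATA lv_pdf     TYPE xstring.
DATA lv_base64  TYPE string.
DATA lv_decoded TYPE xstring.

" Illustrative sample: the 8 bytes of the text %PDF-1.4
lv_pdf = '255044462D312E34'.

" Bytes -> Base64 text; no character encoding is involved
lv_base64 = cl_http_utility=>encode_x_base64( unencoded = lv_pdf ).
" lv_base64 is now JVBERi0xLjQ=

" The consumer reverses the step and gets the identical bytes back
lv_decoded = cl_http_utility=>decode_x_base64( encoded = lv_base64 ).
ASSERT lv_decoded = lv_pdf.
```

Because Base64 only uses plain ASCII characters, the value survives XML serialization without any substitution characters or character references.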

bajusz79 · ‎2022 Jun 27

Believe it or not, it is not encoded. 🙂 The character stream of the file is just passed as is. I tried it in SOAPUI: call the service, save the content in Notepad, and it can be opened as a PDF. It is a mature application that uses this, so it is not easy to change the code only in SAP, because the other side needs to be adjusted too.

Of course, if I had to rewrite it from scratch, I would use Base64 encoding with fm: SSFC_BASE64_CODE

As we do in other interfaces. That is the background, but for now they expect a character stream.

What I found out is that it works perfectly for simple SAPscript forms, but not for PDF forms. If the SAPscript form contains pictures, some characters are substituted too, but the PDF still displays, just without the pictures.

Sandra_Rossi · ‎2022 Jun 27

I'm pretty sure it's impossible that it's not encoded, MIME doesn't support that. Or is it a very simple HTTP request with the PDF directly as the body?

I think it would be interesting to share how you did it in SOAPUI (share the HTTP request body containing the PDF), then we'll know for sure and be able to propose a solution.

bajusz79 · ‎2022 Jun 27

Hi Sandra,

Thanks for taking the time to check it. Is it possible to send you the full response somehow?

It is a simple SOAP response with the PDF in the ET_PDF_DATA.

I don't want to post the response here as it is a real invoice from live system.

Part of the working case (an old SAPscript invoice from the archive as PDF):

This is part of the "not working" case from SOAPUI:

With a lot of substitution characters: #

Sandra_Rossi · ‎2022 Jun 27

Thanks, it's sufficient. So, it's encoded inside XML, using Character or Entity References. In your screenshot I see that you are using Character References, starting with 2 characters & and # (possibly 3rd character is X for hexa) followed by byte value and ending with semicolon. I can see hex 12, hex C, decimal 8, decimal 1, hex 1C.

Starting from binary, you could convert to characters, but I think you can't treat it as UTF-8, because UTF-8 decodes some multi-byte sequences into a single character (and arbitrary PDF bytes are not necessarily valid UTF-8). But it should be fine with iso-8859-1 for instance (code page 1100), which maps each byte to one character. After that I guess it would be fine.

PS: it's really just to solve your specific issue, but you should re-think the whole solution, and pass binary as one base64 value, instead of text...

bajusz79 · ‎2022 Jun 28

lv_encoding = '1100'.
lv_ignore = abap_true.

cl_abap_conv_in_ce=>create(
  EXPORTING
    encoding    = lv_encoding
    ignore_cerr = lv_ignore
    "endian = lv_endian
  RECEIVING
    conv        = lr_conv ).

LOOP AT lt_otf_tab ASSIGNING <ls_otf>.
  l_xstring = <ls_otf>-line.
  TRY.
      lr_conv->convert(
        EXPORTING
          input = l_xstring
        IMPORTING
          data  = l_string ).
    CATCH cx_sy_conversion_codepage .
    CATCH cx_root.
  ENDTRY.

  l_string_pdf = l_string_pdf && l_string.

ENDLOOP.

I tried this code in the past too, but without success.

The convert method returns the same substitution characters with code page 1100. 😞

Sandra_Rossi · ‎2022 Jun 28

No, it's fine, you can see the first "#" is hex 0D. When it's stored inside XML, these "#" characters will be converted into XML Character References, like sequence of these characters for hex 0D : & # x D ; for hex or & # 13 ; for decimal, which are equivalent.

bajusz79 · ‎2022 Jun 30

Thanks for the help.

Now it is a bit better, but it still does not seem to work fully:

At the top you see the WS response, and below it the working PDF.

Sandra_Rossi · ‎2022 Jun 30

I don't see any difference. The sequence & # x 1 B ; means hex 1B which is a "non-printable character". Notepad displays it as a square (like SAP often displays it as a #).

Sandra_Rossi · ‎2022 Jun 30

Okay, I see something else: the character before Ymo.

Possibly it's the software you use which renders it incorrectly, or you don't pass the right "encoding" value in the XML header.

Sandra_Rossi · ‎2022 Jun 30

If you can create a small program which reproduces the issue, and provide it along a small PDF and the full generated HTTP request, that would be simpler to tell you what the issue is. Till now I can advise only based on incomplete pieces.

bajusz79 · ‎2022 Jul 01

This is an example interface I created now:

FUNCTION zsd_pdf_test2.
*"--------------------------------------------------------------------
*"*"Local Interface:
*" IMPORTING
*" VALUE(I_VBELN_VF) TYPE VBELN_VF OPTIONAL
*" VALUE(I_XBLNR) TYPE XBLNR OPTIONAL
*" VALUE(I_BLDAT) TYPE BLDAT OPTIONAL
*" VALUE(I_KUNNR) TYPE KUNNR OPTIONAL
*" EXPORTING
*" VALUE(E_FILE_MIME_TYPE) TYPE ZFILE_MIME_TYPE
*" VALUE(E_FILE_MIME_LENGTH) TYPE ZFILE_MIME_LENGHT
*" VALUE(E_BAPIRET1) TYPE BAPIRET1
*" VALUE(ET_PDF_DATA) TYPE RSBO_T_CHARLINE
*"--------------------------------------------------------------------
*Invoice number is given as input.
*Find the latest attachment of the invoice PDF stored in easy archive.
*Download it into memory and convert it into binary.
* Return the RFC call with the binary data of the file.

DATA:
l_vbeln TYPE vbeln_vf,
l_xstring TYPE xstring,
l_string TYPE string,
ls_toav0 TYPE toav0,
lt_connections TYPE TABLE OF toav0,
l_otf_length TYPE i,
lt_otf_tab TYPE TABLE OF tbl1024,
l_string_pdf TYPE string,
ls_rsbo_t_charline TYPE char2048,
lt_split TYPE TABLE OF string,
l_x_length TYPE i.

DATA: BEGIN OF st_tab_x ,
line(1024) TYPE x,
END OF st_tab_x.
DATA lt_tab_x LIKE STANDARD TABLE OF st_tab_x.
DATA: lv_pdf TYPE fpcontent.

FIELD-SYMBOLS:
<ls_otf> TYPE tbl1024,
<fs_string> TYPE string.

CALL FUNCTION 'FPCOMP_CREATE_PDF_FROM_SPOOL'
EXPORTING
i_spoolid = '25102'
i_partnum = 1
IMPORTING
e_pdf = lv_pdf
"e_pdf_file = lv_file
EXCEPTIONS
OTHERS = 1.

DATA: lr_conv TYPE REF TO cl_abap_conv_in_ce.
DATA: lv_encoding TYPE abap_encoding.

lv_encoding = '1100'.

CALL METHOD cl_abap_conv_in_ce=>create
EXPORTING
encoding = lv_encoding
"endian = ' '
replacement = '#'
ignore_cerr = abap_true
"input = bin_messagetext
RECEIVING
conv = lr_conv.

lr_conv->convert(
EXPORTING
input = lv_pdf
IMPORTING
data = l_string ).

SPLIT l_string AT cl_abap_char_utilities=>cr_lf INTO
TABLE lt_split.

LOOP AT lt_split ASSIGNING <fs_string>.
ls_rsbo_t_charline = <fs_string>.
APPEND ls_rsbo_t_charline TO et_pdf_data.
IF <fs_string> EQ '%%EOF'.
EXIT.
ENDIF.
ENDLOOP.

e_file_mime_type = ls_toav0-reserve.
e_file_mime_length = l_otf_length.
EXIT.

ENDFUNCTION.

For simplicity I read the PDF from the spool and process it. The result is the same as in PROD.

sap-soap-response.txt

Attached are the WS response (SAP_SOAP_RESPONSE.txt) and the working PDF (SAP_WORKING_PDF.txt); please rename the latter to .pdf and it will open.

Thanks for checking.

Peter

Sandra_Rossi · ‎2022 Jul 01

Thank you. I'm checking...

Sandra_Rossi · ‎2022 Jul 01

What I can first see, is that each line of ET_PDF_DATA is limited at 2048 characters, so it will truncate all bytes/1100-encoded characters beyond 2048th -> corrupt PDF.

Why do you limit at 2048, why don't you use lines of any length? (ET_PDF_DATA type string_table for instance)

bajusz79 · ‎2022 Jul 01

Good point 🙂 Why didn't I realize this.

It is a mature interface (no idea who made it), and as far as I can see, the SAPscript invoices used in the past had lines which fit into 2000 characters. Anyway, I changed the interface and the WS. The conversion problem is not solved:

Attached the new response: sap-soap-response2.txt

Sandra_Rossi · ‎2022 Jul 01

Be careful to use a generic PDF file for the demo = anything you can generate yourself.

Not something with private info. Please delete your attachment.

and use this one instead this-is-a-dummy-pdf.txt

bajusz79 · ‎2022 Jul 01

Done, thx 🙂

Sandra_Rossi · ‎2022 Jul 01

In fact, you still work with your own PDF, the SOAP response that you generate from the PDF might contain it (although seems still buggy), so I recommend to switch completely to the PDF I have provided, and attach the SOAP response which corresponds to it.

Sandra_Rossi · ‎2022 Jul 01

Also, how did you change ET_PDF_DATA in your Web service definition, to go beyond the original limit of 2048 bytes/characters?

bajusz79 · ‎2022 Jul 01

I uploaded your PDF to the spool, read it and converted it.

Part of the WS response is here, compared with the PDF string:

Attached the full response: sap-soap-response3.txt

I changed ET_PDF_DATA to type VBC_T_STRING, which is a table type for strings.

Sandra_Rossi · ‎2022 Jul 01

Concerning your screenshots, it's not valid comparison because you are using different tools, they display "non-printable" characters differently. Also maybe they select specific encodings.

The only way to be sure what's going on is to check the byte values (hex for instance), and know which encoding is used.

Concerning the SOAP response, I can find the exact differences at byte level, could you display in debug:

LV_PDF+1500

Hex value should be:

72005B006600380026001C006200

in your SOAP response, it's wrong at FFF8:

72005B00660038002600FFF86200

Also, your logic stops at %%EOF, it's a little bit risky as there can be several ones, you'd better just remove completely that logic + remove LT_SPLIT, and just do:

SPLIT l_string AT cl_abap_char_utilities=>cr_lf INTO TABLE et_pdf_data.

NB: concerning VBC_T_STRING, I recommend that you switch to more known STRING_TABLE, to avoid the risk that VBC_T_STRING is removed by SAP in the future.

Sandra_Rossi · ‎2022 Jul 01

NB: it takes a lot of time to investigate, so I would recommend to change the solution to work with bytes/base64 only and adapt the Web service consumer(s).

bajusz79 · ‎2022 Jul 01

Thanks for the advice.

When I debug:

LV_PDF+1500 is in hex:

0072005B006600380026001C006200

"The only way to be sure what's going on is to check the byte values (hex for instance), and know which encoding is used."

Sorry for the dumb question, I am not familiar with this encoding topic: how do I do that? What do you recommend as next steps?

Sandra_Rossi · ‎2022 Jul 01

Concerning the fact that you have 0072005B006600380026001C0062, but I have 72005B006600380026001C006200, it's the same 7 characters, the difference is about the "endianness" of the operating system/processor, the order how it encodes the characters in memory, "big endian" for yours, "little endian" for mine. It's fine, nothing wrong here.

Now, if you want to continue the investigation, search when it goes from 1C00 to FFF8, I mean 001C to F8FF in your case (big endian). I guess it's not due to the character encoding conversion of code page 1100 (iso-8859-1) because I tested it in my system, but who knows.

If it continues to be difficult to investigate, maybe you should refactor the solution, it's not good to play with characters when it's about bytes, because of all encoding stuff.

NB: "encoding" is the rule that defines how a character is represented as one or more bytes and which values those bytes take, or vice-versa. In UCS-2 (what SAP is using) and Little Endian, the character "r" is encoded in two bytes with hexadecimal values 72 and 00; in Big Endian the order is 00 and 72.
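
A small sketch of that byte-order difference, assuming the SAP code page numbers 4102 (UTF-16 big endian) and 4103 (UTF-16 little endian); the variable names are illustrative only.

```abap
DATA lv_text TYPE string VALUE `r`.
DATA lv_be   TYPE xstring.
DATA lv_le   TYPE xstring.

" Same character, two byte orders
cl_abap_conv_out_ce=>create( encoding = '4102' )->convert(
  EXPORTING data = lv_text IMPORTING buffer = lv_be ).
cl_abap_conv_out_ce=>create( encoding = '4103' )->convert(
  EXPORTING data = lv_text IMPORTING buffer = lv_le ).

" Big endian puts the high byte first (0072),
" little endian puts the low byte first (7200)
WRITE: / 'UTF-16BE:', lv_be,
       / 'UTF-16LE:', lv_le.
```

This is why the same seven characters show up as 0072005B... on one system and 72005B00... on another when the memory of a string is displayed in hex.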

bajusz79 · ‎2022 Jul 04

Hi Sandra.

Thanks for your effort about this issue 🙂

I had an idea and tried it: after converting to a string in code page 1100, I convert it back to binary.

And what I found out is that the two variables are different.

The fm:

FUNCTION zsd_pdf_test2.
*"----------------------------------------------------------------------
*"*"Local Interface:
*" IMPORTING
*" VALUE(I_VBELN_VF) TYPE VBELN_VF OPTIONAL
*" VALUE(I_XBLNR) TYPE XBLNR OPTIONAL
*" VALUE(I_BLDAT) TYPE BLDAT OPTIONAL
*" VALUE(I_KUNNR) TYPE KUNNR OPTIONAL
*" EXPORTING
*" VALUE(E_FILE_MIME_TYPE) TYPE ZFILE_MIME_TYPE
*" VALUE(E_FILE_MIME_LENGTH) TYPE ZFILE_MIME_LENGHT
*" VALUE(E_BAPIRET1) TYPE BAPIRET1
*" VALUE(ET_PDF_DATA) TYPE VBC_T_STRING
*"----------------------------------------------------------------------

DATA:
l_vbeln TYPE vbeln_vf,
l_xstring TYPE xstring,
l_string TYPE string,
ls_toav0 TYPE toav0,
lt_connections TYPE TABLE OF toav0,
l_otf_length TYPE i,
lt_otf_tab TYPE TABLE OF tbl1024,
l_string_pdf TYPE string,
ls_rsbo_t_charline TYPE string,
lt_split TYPE TABLE OF string,
l_x_length TYPE i.

DATA: BEGIN OF st_tab_x ,
line(1024) TYPE x,
END OF st_tab_x.
DATA lt_tab_x LIKE STANDARD TABLE OF st_tab_x.
DATA: lv_pdf TYPE fpcontent.

FIELD-SYMBOLS:
<ls_otf> TYPE tbl1024,
<fs_string> TYPE string.

CALL FUNCTION 'ADS_RETURN_SPOOLJOB'
EXPORTING
rqident = 6630
IMPORTING
pdf = lv_pdf.

DATA: lr_conv TYPE REF TO cl_abap_conv_in_ce.
DATA: lv_encoding TYPE abap_encoding.

lv_encoding = '1100'.

CALL METHOD cl_abap_conv_in_ce=>create
EXPORTING
encoding = lv_encoding
"endian = ' '
replacement = '#'
ignore_cerr = abap_true
"input = bin_messagetext
RECEIVING
conv = lr_conv.

lr_conv->convert(
EXPORTING
input = lv_pdf
IMPORTING
data = l_string ).

DATA: l_xstring2 TYPE xstring.

CALL FUNCTION 'SCMS_STRING_TO_XSTRING'
EXPORTING
text = l_string
* MIMETYPE = ' '
* ENCODING =
IMPORTING
buffer = l_xstring2.

"!!!!! l_xstring2 and lv_pdf are not the same.

SPLIT l_string AT cl_abap_char_utilities=>cr_lf INTO
TABLE lt_split.

LOOP AT lt_split ASSIGNING <fs_string>.
ls_rsbo_t_charline = <fs_string>.
APPEND ls_rsbo_t_charline TO et_pdf_data.
IF <fs_string> EQ '%%EOF'.
EXIT.
ENDIF.
ENDLOOP.

e_file_mime_type = ls_toav0-reserve.
e_file_mime_length = l_otf_length.
EXIT.

ENDFUNCTION.

And the result:

Do you think it makes sense to try different code pages? Or to write a program which loops over all code pages and compares the two binary strings?

Sandra_Rossi · ‎2022 Jul 04

In fact, you don't understand very well what code page, character encoding, binary and string mean 😉

What you did with SCMS_STRING_TO_XSTRING is to encode the string into UTF-8 (the default when you don't pass the parameters MIMETYPE and ENCODING).

It makes no sense to re-encode the string. This string was a trick to convert bytes into characters, to simulate the behavior that you had before.

You should realize that you lose time in playing with character encoding. Currently, you don't know where it goes wrong (I even don't know because I don't have all information I need, and I should play a little bit with it to be sure what's wrong; I also don't know what encoding is expected by the software which calls the Web service). So, if you want to continue to patch the current solution, you should definitely learn the concept of character encoding. But it's a pity because all this is just a trick, and the definitive solution should be to use only bytes and base64.

bajusz79 · ‎2022 Jul 04

No, I am absolutely not familiar with these encoding things. 🙂

"What you did with SCMS_STRING_TO_XSTRING is to encode the string into UTF-8 (the default when you don't pass the parameters MIMETYPE and ENCODING)."

I wanted to convert back with code page 1100 and compare whether I get the same value in binary. I just forgot to pass the encoding. After passing it, the two strings are the same. Just curiosity.

Anyway, don't waste your time on this; I will investigate and post a solution if I find one.
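
For reference, a minimal sketch of that round-trip check with the encoding passed explicitly. It assumes lv_pdf already holds the PDF bytes; the variable names are illustrative and this is not the exact code that was run.

```abap
DATA lv_pdf  TYPE xstring.   " original PDF bytes, assumed to be filled already
DATA lv_text TYPE string.
DATA lv_back TYPE xstring.

" Bytes -> characters, one character per byte (code page 1100 = iso-8859-1)
cl_abap_conv_in_ce=>create( encoding = '1100' ignore_cerr = abap_true )->convert(
  EXPORTING input = lv_pdf IMPORTING data = lv_text ).

" Characters -> bytes with the SAME code page; without ENCODING the
" function module encodes to UTF-8 and the comparison below fails
CALL FUNCTION 'SCMS_STRING_TO_XSTRING'
  EXPORTING
    text     = lv_text
    encoding = '1100'
  IMPORTING
    buffer   = lv_back
  EXCEPTIONS
    failed   = 1
    OTHERS   = 2.

IF sy-subrc = 0 AND lv_back = lv_pdf.
  WRITE / 'Round trip gives back the original bytes'.
ELSE.
  WRITE / 'Bytes changed during conversion'.
ENDIF.
```

A matching round trip only shows that the conversion inside ABAP keeps the bytes; it says nothing about what happens later, when the string is serialized into the SOAP response.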

bajusz79 · ‎2022 Aug 04

As I couldn't find a solution in the short term, we modified the code to send the data Base64-encoded.
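
The final code was not posted. As a hedged sketch only, a bytes-only variant could look like this, assuming the archive read delivers the document as a table of 1024-byte rows (lt_bin_tab) together with its real length in bytes (lv_length); both names are placeholders.

```abap
" Assumption: both variables are filled by the archive read beforehand
DATA lt_bin_tab TYPE STANDARD TABLE OF tbl1024.
DATA lv_length  TYPE i.
DATA lv_pdf     TYPE xstring.
DATA lv_base64  TYPE string.

" Join the rows into one xstring; INPUT_LENGTH cuts off the unused
" padding bytes at the end of the last 1024-byte row
CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
  EXPORTING
    input_length = lv_length
  IMPORTING
    buffer       = lv_pdf
  TABLES
    binary_tab   = lt_bin_tab
  EXCEPTIONS
    failed       = 1
    OTHERS       = 2.
IF sy-subrc <> 0.
  RETURN.
ENDIF.

" Bytes -> Base64 text, with no code page conversion at any point
lv_base64 = cl_http_utility=>encode_x_base64( unencoded = lv_pdf ).
```

The Web service would then return lv_base64 in a single string parameter instead of a table of character lines, and the consumer decodes it back to bytes.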
