2019 Sep 04 10:22 AM
Hello,
I'm trying to export a data file in ISO-8859-15 encoding. This encoding supports characters like €, Š, š, Ž, ž, Œ, œ and Ÿ.
I've looked into the ABAP tools available for this purpose, especially through this great blog by sandra.rossi.
Here you can find my testing code:
*&---------------------------------------------------------------------*
*& Report ZBC_FILE_ENCODING
*&---------------------------------------------------------------------*
*& Encoding test report
*&---------------------------------------------------------------------*
REPORT zbc_file_encoding.

DATA: gv_file              TYPE text255,
      gt_file              LIKE STANDARD TABLE OF gv_file,
      gv_filename          TYPE string,
      gv_path              TYPE string,
      gv_fullpath          TYPE string,
      gv_bin_filesize      TYPE i,
      gt_bin_data          TYPE solix_tab,
      gv_xfile             TYPE xstring,
      gv_sfile             TYPE string,
      gv_sap_codepage      TYPE cpcodepage,
      gv_default_file_name TYPE string,
      gv_external_name     TYPE tcp00a-cpattr,
      go_abap_conv_obj     TYPE REF TO cl_abap_conv_obj,
      gv_incode            TYPE cpcodepage,
      gv_outcode           TYPE cpcodepage.

gv_file = 'A;B;C;é;€;Š;š;Ž;ž;Œ;œ;Ÿ'.
APPEND gv_file TO gt_file.
gv_file = 'D;E;F;é;€;Š;š;Ž;ž;Œ;œ;Ÿ'.
APPEND gv_file TO gt_file.

LOOP AT gt_file INTO gv_file.
  CONCATENATE gv_sfile gv_file cl_abap_char_utilities=>cr_lf INTO gv_sfile.
ENDLOOP.

gv_external_name = 'ISO-8859-15'.

CALL FUNCTION 'SCP_CODEPAGE_BY_EXTERNAL_NAME'
  EXPORTING
    external_name = gv_external_name
  IMPORTING
    sap_codepage  = gv_sap_codepage
  EXCEPTIONS
    OTHERS        = 1.
IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.

CALL FUNCTION 'SCP_GET_CODEPAGE_NUMBER'
  EXPORTING
    database_also = ' '
  IMPORTING
    appl_codepage = gv_incode
  EXCEPTIONS
    OTHERS        = 1.
IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.

gv_outcode = gv_sap_codepage.

CREATE OBJECT go_abap_conv_obj
  EXPORTING
    incode   = gv_incode
    outcode  = gv_outcode
    miss     = 'S'
    ctrlcode = '.'
  EXCEPTIONS
    OTHERS   = 1.
IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.

CALL METHOD go_abap_conv_obj->convert
  EXPORTING
    inbuff    = gv_sfile
    outbufflg = gv_bin_filesize
  IMPORTING
    outbuff   = gv_xfile
  EXCEPTIONS
    OTHERS    = 1.
IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.

CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    buffer        = gv_xfile
  IMPORTING
    output_length = gv_bin_filesize
  TABLES
    binary_tab    = gt_bin_data.

CONCATENATE 'test_encoding_' gv_external_name '.csv' INTO gv_default_file_name.

CALL METHOD cl_gui_frontend_services=>file_save_dialog
  EXPORTING
    default_file_name = gv_default_file_name
  CHANGING
    filename          = gv_filename
    path              = gv_path
    fullpath          = gv_fullpath
  EXCEPTIONS
    OTHERS            = 1.
IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.

CALL METHOD cl_gui_frontend_services=>gui_download
  EXPORTING
    bin_filesize         = gv_bin_filesize
    filename             = gv_fullpath
    filetype             = 'BIN'
    show_transfer_status = ' '
  CHANGING
    data_tab             = gt_bin_data
  EXCEPTIONS
    OTHERS               = 1.
IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.
Unfortunately, at the end, I don't get what I'm expecting:
Notepad++ says that the format is Windows-1252, the special characters are messed up and even the carriage return and line feed are not recognized.
Any idea of what I'm doing wrong and how to achieve my goal?
Thanks in advance for your help.
Best regards,
Marco Silva
2019 Sep 04 1:38 PM
It's just a Notepad++ question. Notepad++ cannot reliably guess the actual code page because that information is not stored in the file, so it uses a very simple algorithm that sniffs the first bytes and guesses, with a high risk of false positives.
Manually force Notepad++ to treat the file as ISO-8859-15 via the menu, and it will display the characters corresponding to the ISO-8859-15 character set.
PS: thanks for the minimal reproducible example! (no time lost by people who try to answer)
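One way to check the bytes independently of any editor's guess is to convert a single character and look at the result. The following is only a minimal sketch, assuming that CL_ABAP_CODEPAGE accepts the external name `ISO-8859-15` on the system; the report name and the constant name are hypothetical. ISO-8859-15 encodes the euro sign as byte A4, whereas Windows-1252 uses byte 80.

```abap
REPORT zbc_check_euro_byte. " hypothetical report name

" Byte that ISO-8859-15 assigns to the euro sign.
CONSTANTS lc_euro_iso TYPE x LENGTH 1 VALUE 'A4'.

DATA lv_bytes TYPE xstring.

TRY.
    " Convert the euro sign on its own to ISO-8859-15.
    lv_bytes = cl_abap_codepage=>convert_to( source   = `€`
                                             codepage = `ISO-8859-15` ).
  CATCH cx_root.
    WRITE / 'Conversion failed'.
    RETURN.
ENDTRY.

IF lv_bytes = lc_euro_iso.
  WRITE / 'Euro sign encoded as A4, consistent with ISO-8859-15'.
ELSE.
  WRITE: / 'Unexpected bytes:', lv_bytes.
ENDIF.
```

The same comparison can be done on the downloaded file with any hex viewer, which avoids relying on the editor's encoding detection.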
2019 Sep 04 1:29 PM
Hi Marco,
does it have to be ISO-8859-15, or is this sufficient as well?
2019 Sep 04 1:38 PM
Hello,
I have to produce a file that legally should be in the ISO-8859-15 encoding. But I guess the main point is to allow the Euro symbol (€).
Anyway, since I can find the ISO-8859-15 attribute in SAP table TCP00A for code page 1164, shouldn't the system be capable of creating the file in the right encoding?
Thanks.
Marco
2019 Sep 04 1:54 PM
I guess you will still have the issue of the file itself using the correct encoding, but in N++ at least you can see a correct representation using the encoding.
Update:
This is done without adding the CRLF to the table lines and without converting the table lines. Just supply the table (gt_file) and call gui_download with code page 1164.
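The approach described above could look roughly like the sketch below. It assumes the text download mode (filetype 'ASC') together with the codepage parameter of gui_download, and it takes code page 1164 from this thread; the report name and the target path are hypothetical.

```abap
REPORT zbc_file_encoding_asc. " hypothetical report name

DATA: lv_line TYPE text255,
      lt_file LIKE STANDARD TABLE OF lv_line.

" Plain text lines: no CR/LF appended and no manual conversion.
lv_line = 'A;B;C;é;€;Š;š;Ž;ž;Œ;œ;Ÿ'.
APPEND lv_line TO lt_file.
lv_line = 'D;E;F;é;€;Š;š;Ž;ž;Œ;œ;Ÿ'.
APPEND lv_line TO lt_file.

" Let gui_download write the line ends and convert to the target code page.
cl_gui_frontend_services=>gui_download(
  EXPORTING
    filename = `C:\temp\test_encoding_1164.csv` " hypothetical path
    filetype = 'ASC'
    codepage = '1164'                           " code page named in this thread
  CHANGING
    data_tab = lt_file
  EXCEPTIONS
    OTHERS   = 1 ).
IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.
```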
2019 Sep 04 1:46 PM
Thank you Sandra.
So I guess my file is created correctly (although the line breaks are not interpreted correctly).
2019 Sep 04 1:59 PM
Hmm, the # does not look good: the file contains # where CR and LF should be. Is that because of MISS='S' and CTRLCODE='.' (control characters are substituted by SUBSTC, which is # by default)? Try CTRLCODE='T'.
Anyway, why not use the recommended class CL_ABAP_CODEPAGE?
2019 Sep 04 2:23 PM
You're right. I started with class CL_ABAP_CODEPAGE, but since I thought I wasn't getting the expected result, I tried others and ended up with CL_ABAP_CONV_OBJ. I'll go back to CL_ABAP_CODEPAGE, as it seems to perform the task correctly.
Thanks a lot for your help!
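For readers who want to see the CL_ABAP_CODEPAGE variant, here is a minimal sketch of how the conversion step from the original report might be replaced. It is not the poster's final code: the report name and the target path are hypothetical, and it assumes the external name `ISO-8859-15` is known to the system.

```abap
REPORT zbc_file_encoding_cp. " hypothetical report name

DATA: lv_text     TYPE string,
      lv_xfile    TYPE xstring,
      lv_filesize TYPE i,
      lt_bin_data TYPE solix_tab.

" CR/LF is part of the string and is converted like any other character.
CONCATENATE 'A;B;C;é;€;Š;š;Ž;ž;Œ;œ;Ÿ' cl_abap_char_utilities=>cr_lf
            'D;E;F;é;€;Š;š;Ž;ž;Œ;œ;Ÿ' cl_abap_char_utilities=>cr_lf
       INTO lv_text.

TRY.
    " Convert the system code page string to ISO-8859-15 bytes.
    lv_xfile = cl_abap_codepage=>convert_to( source   = lv_text
                                             codepage = `ISO-8859-15` ).
  CATCH cx_root.
    MESSAGE 'Conversion to ISO-8859-15 failed' TYPE 'E'.
ENDTRY.

" Split the byte string into table lines for the binary download.
CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    buffer        = lv_xfile
  IMPORTING
    output_length = lv_filesize
  TABLES
    binary_tab    = lt_bin_data.

cl_gui_frontend_services=>gui_download(
  EXPORTING
    bin_filesize = lv_filesize
    filename     = `C:\temp\test_encoding_iso.csv` " hypothetical path
    filetype     = 'BIN'
  CHANGING
    data_tab     = lt_bin_data
  EXCEPTIONS
    OTHERS       = 1 ).
IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.
```

Passing bin_filesize matters in binary mode; otherwise the last table line may be written with padding bytes.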
2019 Sep 04 2:46 PM
AFAIK, Notepad++ uses the uchardet project to identify the code page. Check https://gitlab.freedesktop.org/uchardet/ if you want to raise an issue 🙂
2019 Sep 04 3:38 PM
Raymond Giuseppi I guess that the only way to identify the code page/character set is to count the number of occurrences of each character and compare them to a model of statistical character counts per language/character set. With the given characters it can't work, but with this text in Estonian it should work (I hope): Caron, tuntud ka kui hachek, kiil, tšekk, ümberpööratud ümbermõõt, ümberpööratud müts, on diakriitik.
UPDATE: hmmm, no, it doesn't work. Maybe statistics are not available for Estonian...
2019 Sep 04 3:48 PM
Marco SILVA by the way, why ISO-8859-15 and not UTF-8? The first 3 bytes of a UTF-8 file can contain a BOM that identifies it as UTF-8, which is much more practical than ISO.
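If UTF-8 were acceptable, the BOM mentioned above could be prepended to the converted bytes before the binary download. This is only a sketch under that assumption; the report name and the sample text are hypothetical.

```abap
REPORT zbc_file_encoding_utf8. " hypothetical report name

DATA: lv_text TYPE string,
      lv_utf8 TYPE xstring.

CONCATENATE 'A;B;C;é;€;Š;š;Ž;ž;Œ;œ;Ÿ' cl_abap_char_utilities=>cr_lf
       INTO lv_text.

TRY.
    " Convert the text to UTF-8 bytes.
    lv_utf8 = cl_abap_codepage=>convert_to( source   = lv_text
                                            codepage = `UTF-8` ).
  CATCH cx_root.
    MESSAGE 'Conversion to UTF-8 failed' TYPE 'E'.
ENDTRY.

" Prepend the 3-byte UTF-8 byte order mark (EF BB BF).
CONCATENATE cl_abap_char_utilities=>byte_order_mark_utf8 lv_utf8
       INTO lv_utf8 IN BYTE MODE.

" lv_utf8 can now be passed to SCMS_XSTRING_TO_BINARY and gui_download
" with filetype 'BIN', as in the original report.
```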