How to process non-printable characters in SAP Data Services Application?

SAPSupport
Employee

Hello,

Is there a way to exclude non-printable characters or replace them with an empty string?

"The first 32 characters in the ASCII-table are unprintable control codes and are used to control peripherals such as printers." Some of these characters appear in my source data.

Thank you,



Accepted Solutions (1)


SAPSupport
Employee

Hello,

Yes, this is possible with the regex_replace function.

Example mapping for a column that might contain non-printable characters:

  • regex_replace(<input string or column>, '[\x00-\x09\x0B-\x0C\x0E-\x1F]', '')

The character class covers the control codes 0x00 to 0x1F except LF (0x0A) and CR (0x0D), so line breaks are preserved.

For more information: regex_replace | SAP Help Portal
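
To check what the character class in the mapping above matches before using it in a job, the following minimal Python sketch applies the same class to a sample string. It only illustrates the pattern: Python's re module is not the regex engine used by Data Services, and the names CONTROL_CHARS and strip_control_chars are hypothetical.

```python
import re

# Same character class as in the regex_replace mapping: all control codes
# 0x00-0x1F except LF (0x0A) and CR (0x0D). Note that TAB (0x09) is matched.
CONTROL_CHARS = re.compile(r"[\x00-\x09\x0B-\x0C\x0E-\x1F]")


def strip_control_chars(text: str) -> str:
    """Replace every matched control character with an empty string."""
    return CONTROL_CHARS.sub("", text)


# BEL, TAB and US are removed; the CR LF line break is kept.
sample = "AB\x07C\tD\x1fE\r\nnext line"
print(repr(strip_control_chars(sample)))
```

Because TAB falls inside the first range, tab-separated content in a column would lose its tabs; narrowing the first range to \x00-\x08 would be one way to keep them.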

Best Regards,
SAP Support

Answers (2)


EmelyModena
Product and Topic Expert

Example job:

Let's use a file as the source. It contains non-printable characters generated with Windows PowerShell:

[Screenshot: EmelyModena_0-1738070185953.png]

Commands used (Windows PowerShell):

[char]0x07 | Set-Clipboard
[char]0x010 | Set-Clipboard
[char]0x013 | Set-Clipboard
[char]0x016 | Set-Clipboard
[char]0x018 | Set-Clipboard

Note: You can ignore "CR LF", since it is the line break.
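
As an alternative to pasting each character from the clipboard, a test file can be generated in one step. The sketch below is an assumption about how such a file might be built, not the method used for the screenshots; the file layout and the names build_sample_lines and write_sample_file are hypothetical. The code points are the ones from the PowerShell commands above.

```python
from pathlib import Path

# Code points from the PowerShell commands above:
# 0x07 (BEL), 0x10 (DLE), 0x13 (DC3), 0x16 (SYN), 0x18 (CAN).
CODE_POINTS = [0x07, 0x10, 0x13, 0x16, 0x18]


def build_sample_lines(code_points):
    """Return one text line per code point, with the character embedded."""
    return [
        f"row{i}|before{chr(cp)}after"
        for i, cp in enumerate(code_points, start=1)
    ]


def write_sample_file(path, code_points=CODE_POINTS):
    """Write the lines with CR LF line endings and return the written text."""
    data = "\r\n".join(build_sample_lines(code_points)) + "\r\n"
    Path(path).write_bytes(data.encode("ascii"))
    return data
```

Calling write_sample_file with a path of your choice produces a five-line file that can be opened in Notepad++ to confirm the control characters are present.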

This is how you enable the non-printable characters view in Notepad++:

In this job, the previously mentioned file is the source and a temp table is the target.
It is a simple job.

[Screenshot: EmelyModena_1-1738070459790.png]

In the Query, we use the regex_replace function in the mapping of the desired column:

regex_replace(<input_string>, <regular expression pattern string>, <replacement string>, <regular expression processing flags>)

Regular expression pattern used for the non-printable characters: [\x00-\x09\x0B-\x0C\x0E-\x1F]

[Screenshot: EmelyModena_2-1738070769879.png]

This is how you identify the non-printable characters in Notepad++:

[Screenshot: EmelyModena_3-1738071412153.png]

Thank you,
SAP Support

radinator
Participant

You can also use this ABAP code:

form remove_invalid_control_chars
  changing
    text_to_strip.

  types:
    begin of control_char_list_struct,
      hex  type c,
      code type c length 3,
    end of control_char_list_struct.

  data:
    control_char_list       type table of control_char_list_struct,
    control_char_list_entry like line of control_char_list,
    test_string type string,
    white_space type string.

  " A string literal keeps the blank; assigning a type c blank to a string drops the trailing blank.
  white_space = ` `.
  control_char_list = value #(
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0000' ) code = 'NUL' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0001' ) code = 'SOH' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0002' ) code = 'STX' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0003' ) code = 'ETX' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0004' ) code = 'EOT' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0005' ) code = 'ENQ' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0006' ) code = 'ACK' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0007' ) code = 'BEL' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0008' ) code = 'BS' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0009' ) code = 'HT' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '000A' ) code = 'LF' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '000B' ) code = 'VT' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '000C' ) code = 'FF' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '000D' ) code = 'CR' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '000E' ) code = 'SO' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '000F' ) code = 'SI' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0010' ) code = 'DLE' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0011' ) code = 'DC1' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0012' ) code = 'DC2' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0013' ) code = 'DC3' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0014' ) code = 'DC4' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0015' ) code = 'NAK' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0016' ) code = 'SYN' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0017' ) code = 'ETB' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0018' ) code = 'CAN' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0019' ) code = 'EM' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '001A' ) code = 'SUB' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '001B' ) code = 'ESC' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '001C' ) code = 'FS' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '001D' ) code = 'GS' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '001E' ) code = 'RS' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '001F' ) code = 'US' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '007F' ) code = 'DEL' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0080' ) code = 'PAD' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0081' ) code = 'HOP' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0082' ) code = 'BPH' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0083' ) code = 'NBH' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0084' ) code = 'IND' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0085' ) code = 'NEL' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0086' ) code = 'SSA' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0087' ) code = 'ESA' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0088' ) code = 'HTS' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0089' ) code = 'HTJ' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '008A' ) code = 'VTS' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '008B' ) code = 'PLD' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '008C' ) code = 'PLU' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '008D' ) code = 'RI' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '008E' ) code = 'SS2' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '008F' ) code = 'SS3' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0090' ) code = 'DCS' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0091' ) code = 'PU1' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0092' ) code = 'PU2' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0093' ) code = 'STS' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0094' ) code = 'CCH' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0095' ) code = 'MW' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0096' ) code = 'SPA' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0097' ) code = 'EPA' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0098' ) code = 'SOS' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '0099' ) code = 'SGCI' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '009A' ) code = 'SCI' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '009B' ) code = 'CSI' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '009C' ) code = 'ST' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '009D' ) code = 'OSC' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '009E' ) code = 'PM' )
    ( hex = CL_ABAP_CONV_IN_CE=>uccp( '009F' ) code = 'APC' )
  ).

  loop at control_char_list into control_char_list_entry.
    replace all occurrences of control_char_list_entry-hex in text_to_strip with white_space.
  endloop.
endform.

Call the form with a text and the control characters are replaced with a white space.

If you need to keep \r and \n in your text, comment out the LF and CR entries.
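
For comparison outside ABAP, the same idea can be sketched in Python: map every code point from the list above (U+0000 to U+001F and U+007F to U+009F) to a space, with an option to leave LF and CR untouched. This is a minimal sketch, not a port of the form; the function names and the keep_line_breaks parameter are hypothetical.

```python
def build_control_char_table(keep_line_breaks: bool = False) -> dict:
    """Map C0 controls, DEL and C1 controls to a single space."""
    code_points = list(range(0x00, 0x20)) + list(range(0x7F, 0xA0))
    if keep_line_breaks:
        # Equivalent to commenting out the LF and CR entries in the list.
        code_points = [cp for cp in code_points if cp not in (0x0A, 0x0D)]
    return {cp: " " for cp in code_points}


def replace_control_chars(text: str, keep_line_breaks: bool = False) -> str:
    """Return text with each listed control character replaced by a space."""
    return text.translate(build_control_char_table(keep_line_breaks))


print(repr(replace_control_chars("a\x00b\x7fc\x85d\r\n", keep_line_breaks=True)))
```

Building the table from ranges avoids maintaining one entry per character, at the cost of losing the readable code names such as NUL or BEL.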

vnovozhilov
Product and Topic Expert
This question in particular was regarding the approach to be taken in SAP Data Services.