Application Development and Automation Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.

Truncate string based on byte


Hi Experts,

Looking for help.

I have a requirement where I need to truncate a string based on byte count.

Some Chinese and other-language characters are multibyte, and I need to cut off part of such a string based on the number of bytes.

I know that normally we do this with STRING+offset(length), but that counts in characters.

I have to extract the substring based on the number of bytes instead.

I hope I am clear about the requirement.

Regards,

Digvijay Singh

1 ACCEPTED SOLUTION
Former Member

Hi Digvijay,

Not sure if I'm missing the simple solution (or if I understand the problem, because usually you actually want to handle characters and not bytes), but here's a fairly convoluted one for truncating a string at a specific byte count:


  CONSTANTS:
    co_input  TYPE string VALUE '繁体字'.
  DATA:
    l_codepage TYPE cpcodepage,
    l_encoding TYPE abap_encoding,
    l_conv_out TYPE REF TO cl_abap_conv_out_ce,
    l_conv_obj TYPE REF TO cl_abap_conv_obj,
    l_length   TYPE i,
    l_string   TYPE string,
    l_xstring  TYPE xstring.

  " Determine the code page that belongs to the logon language
  CALL FUNCTION 'SCP_CODEPAGE_FOR_LANGUAGE'
    EXPORTING
      language = sy-langu
    IMPORTING
      codepage = l_codepage
    EXCEPTIONS
      OTHERS   = 0.

  " Convert the character string into its byte representation
  l_encoding = l_codepage.
  l_conv_out = cl_abap_conv_out_ce=>create( encoding = l_encoding ).
  l_conv_out->convert( EXPORTING data   = co_input
                       IMPORTING buffer = l_xstring ).

  " Cut off the last three bytes (local variable instead of misusing sy-fdpos)
  l_length = xstrlen( l_xstring ) - 3.
  l_xstring = l_xstring(l_length).

  " Convert the bytes back; an incomplete trailing character is dropped
  CREATE OBJECT l_conv_obj.
  l_conv_obj->convert( EXPORTING inbuff    = l_xstring
                                 outbufflg = 0
                       IMPORTING outbuff   = l_string ).

  WRITE: / 'Before:', co_input, ';  After:', l_string.

When I run this example in our system (codepage 4103, which corresponds to UTF-16 little endian), I get the following output:

Before: 繁体字 ; After: 繁

The convoluted coding is, in my opinion, necessary to avoid splitting multi-byte characters in half. In the example I take off three bytes, but this actually removes the last two characters: the three characters occupy six bytes, so three bytes remain, which is one and a half characters, and the incomplete half is dropped when converting back. I'd hope that somebody has a shorter and more elegant solution, let's see...

If you knew that your strings never contain surrogate pairs (http://unicode.org/faq/utf_bom.html#utf16-2), it would be much simpler, because then one character corresponds to exactly two bytes on the application server (since a Unicode application server always uses UTF-16).
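
That simpler case could look roughly like the following. This is only a minimal sketch, assuming a Unicode system and input without surrogate pairs; the report name and all variable names are made up for illustration.

```abap
REPORT z_truncate_by_bytes_sketch.

" Hypothetical names throughout; assumes a Unicode system (UTF-16 in memory)
" and input that contains no surrogate pairs.
DATA: l_input     TYPE string,
      l_max_bytes TYPE i VALUE 3,
      l_max_chars TYPE i,
      l_result    TYPE string.

l_input = '繁体字'.

" Two bytes per character, so integer division rounds down to whole characters
l_max_chars = l_max_bytes DIV 2.

" Never ask for more characters than the string holds
IF l_max_chars > strlen( l_input ).
  l_max_chars = strlen( l_input ).
ENDIF.

" Guard against a zero or negative limit before the offset/length access
IF l_max_chars > 0.
  l_result = l_input(l_max_chars).
ELSE.
  CLEAR l_result.
ENDIF.

WRITE: / 'Before:', l_input, ';  After:', l_result.
```

With a limit of three bytes this keeps one whole character. If surrogate pairs can occur, this shortcut could split a pair, so the converter-based approach would be the safer choice.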

Cheers, harald

3 REPLIES 3
Former Member

Hi,

I think that if you are working on an ECC release, which is Unicode, you can simply keep using the old offset/length approach.

Cheers,

Edited by: NI SHILIANG on Feb 28, 2010 9:55 AM

Sandra_Rossi
Active Contributor

I guess you are asking because you are on a non-Unicode system. As explained in the ABAP documentation, "Conversion Table for Source Field Type c" (http://help.sap.com/abapdocu_70/en/ABENCONVERSION_TYPE_C.htm), you can call the method CL_SCP_LINEBREAK_UTIL=>STRING_SPLIT_AT_POSITION, which is intended especially for non-Unicode double-byte characters.