Solved: Truncate string based on byte

former_member227603 · 2010 Feb 28

Hi Experts,

Looking for help.

I have a requirement where I need to truncate a string based on a byte count.

Some Chinese and other-language characters are multibyte, so I need to cut off part of the string based on a number of bytes.

I know that normally we do this with STRING+offset(length), but that counts characters, not bytes.
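To make the difference between characters and bytes concrete, here is a minimal illustration in Python (not ABAP; Python is used only because it makes encoded bytes easy to inspect). It compares the character count of a string with its byte count in two example encodings, UTF-8 and UTF-16 little endian; the helper name `lengths` is hypothetical.

```python
def lengths(text: str) -> dict:
    """Return the character count and the byte count in two example encodings."""
    return {
        "characters": len(text),
        "utf-8 bytes": len(text.encode("utf-8")),
        "utf-16-le bytes": len(text.encode("utf-16-le")),
    }


# Two ASCII letters followed by three CJK characters (U+7E41, U+4F53, U+5B57).
print(lengths("ab繁体字"))
# {'characters': 5, 'utf-8 bytes': 11, 'utf-16-le bytes': 10}
```

An offset/length cut works on the first number; a byte limit refers to one of the others, and which one depends on the target encoding.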

Instead, I need to extract a substring based on a number of bytes.

I hope the requirement is clear.

Regards,

Digvijay Singh

Former Member · 2010 Mar 01

Hi Digvijay,

I'm not sure whether I'm missing a simple solution (or whether I understand the problem, because usually you want to handle characters, not bytes), but here's a fairly convoluted one for truncating a string at a specific byte count:


  CONSTANTS:
    co_input  TYPE string VALUE '繁体字'.   "U+7E41 U+4F53 U+5B57
  DATA:
    l_codepage TYPE cpcodepage,
    l_encoding TYPE abap_encoding,
    l_conv_out TYPE REF TO cl_abap_conv_out_ce,
    l_conv_obj TYPE REF TO cl_abap_conv_obj,
    l_len      TYPE i,
    l_string   TYPE string,
    l_xstring  TYPE xstring.

  " Determine the SAP codepage that belongs to the logon language.
  CALL FUNCTION 'SCP_CODEPAGE_FOR_LANGUAGE'
    EXPORTING
      language = sy-langu
    IMPORTING
      codepage = l_codepage
    EXCEPTIONS
      OTHERS   = 0.

  " Convert the string into its byte representation in that codepage.
  l_encoding = l_codepage.
  l_conv_out = cl_abap_conv_out_ce=>create( encoding = l_encoding ).
  l_conv_out->convert( EXPORTING data   = co_input
                       IMPORTING buffer = l_xstring ).

  " Cut off the last three bytes (a local variable instead of sy-fdpos).
  l_len = xstrlen( l_xstring ) - 3.
  IF l_len > 0.
    l_xstring = l_xstring(l_len).
  ELSE.
    CLEAR l_xstring.
  ENDIF.

  " Convert the remaining bytes back into a string.
  CREATE OBJECT l_conv_obj.
  l_conv_obj->convert( EXPORTING inbuff    = l_xstring
                                 outbufflg = 0
                       IMPORTING outbuff   = l_string ).

  WRITE: / 'Before:', co_input, ';  After:', l_string.

When I run this example on our system (codepage 4103, which corresponds to UTF-16 little endian), I get the following output:


Before: 繁体字 ;  After: 繁

In my opinion the convoluted code is necessary to avoid splitting multibyte characters in half. In the example I take off three bytes, but this actually drops the last two characters: the remaining three bytes hold one and a half characters, and the incomplete half is discarded when converting back. I hope somebody has a shorter and more elegant solution, let's see...
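The same round trip (encode, cut the byte buffer, decode and drop an incomplete trailing character) can be sketched in Python. This illustrates the technique only; it is not a translation of the ABAP conversion classes, the function name `truncate_to_bytes` is hypothetical, and UTF-16 little endian is assumed as the default to mirror codepage 4103.

```python
def truncate_to_bytes(text: str, max_bytes: int, encoding: str = "utf-16-le") -> str:
    """Return the longest prefix of text whose encoded form fits in max_bytes.

    Hypothetical helper for stateless encodings such as UTF-8 and UTF-16:
    encode, cut the byte buffer, decode again. errors="ignore" drops a
    character that was cut in half at the end; since the bytes came from
    encoding valid text, the incomplete tail is the only part that can fail.
    """
    if max_bytes < 0:
        raise ValueError("max_bytes must not be negative")
    data = text.encode(encoding)
    return data[:max_bytes].decode(encoding, errors="ignore")


raw = "繁体字".encode("utf-16-le")
print(raw.hex())                                  # 417e534f575b (2 bytes per character)
print(truncate_to_bytes("繁体字", len(raw) - 3))  # 繁
```

Pass an endian-specific codec such as `utf-16-le`: plain `utf-16` prepends a byte order mark, which would count against the limit.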

If you know that your strings never contain [surrogate pairs|http://unicode.org/faq/utf_bom.html#utf16-2], it's much simpler, because then one character corresponds to exactly two bytes on the application server (a Unicode application server always uses UTF-16).
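The shortcut described above (two bytes per character, unless surrogate pairs occur) can be sketched in Python as follows; `truncate_utf16` is a hypothetical name. One assumption to flag: Python indexes strings by code point, so characters above U+FFFF are detected by their code point value, whereas in ABAP, as I understand it, such a character occupies two 2-byte positions and you would instead check whether the last kept position is a high surrogate (U+D800 to U+DBFF).

```python
def truncate_utf16(text: str, max_bytes: int) -> str:
    """Cut text so its UTF-16 form fits in max_bytes, never splitting a surrogate pair."""
    if max_bytes < 0:
        raise ValueError("max_bytes must not be negative")
    if all(ord(ch) <= 0xFFFF for ch in text):
        # No surrogate pairs: every character is exactly one 2-byte code unit.
        return text[: max_bytes // 2]
    # Characters above U+FFFF need a surrogate pair (4 bytes) and must stay whole.
    used = 0
    for i, ch in enumerate(text):
        size = 4 if ord(ch) > 0xFFFF else 2
        if used + size > max_bytes:
            return text[:i]
        used += size
    return text


print(truncate_utf16("繁体字", 3))          # 繁
print(truncate_utf16("a\U00020000b", 4))    # a
```

The fast path is a plain offset/length cut; only strings with supplementary characters pay for the per-character loop.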

Cheers, harald

Former Member · 2010 Feb 28

Hi,

I think if you are working on ECC, which is a Unicode system, you can simply keep using the old way (STRING+offset(length)).

Cheers,

Edited by: NI SHILIANG on Feb 28, 2010 9:55 AM

Sandra_Rossi · 2010 Mar 01

I guess you are asking because you are on a non-Unicode system. As explained in [ABAP documentation - Conversion Table for Source Field Type c|http://help.sap.com/abapdocu_70/en/ABENCONVERSION_TYPE_C.htm], you can call the method CL_SCP_LINEBREAK_UTIL=>STRING_SPLIT_AT_POSITION, which is intended especially for double-byte characters on non-Unicode systems.
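The parameters of the method mentioned above are not given in this thread, so the following Python sketch only illustrates the underlying idea of splitting at a byte position in a double-byte codepage: add up each character's encoded length until the next character would exceed the limit. The function name `split_at_byte_position` is hypothetical, and GBK is an assumed example codepage.

```python
from typing import Tuple


def split_at_byte_position(text: str, max_bytes: int, encoding: str = "gbk") -> Tuple[str, str]:
    """Split text so the head's encoded form fits in max_bytes without cutting a character.

    Hypothetical helper: each character's encoded length (1 byte for ASCII,
    2 bytes for Chinese characters in GBK) is added up until the next
    character would exceed the limit. Characters the codepage cannot
    represent raise UnicodeEncodeError.
    """
    if max_bytes < 0:
        raise ValueError("max_bytes must not be negative")
    used = 0
    for i, ch in enumerate(text):
        size = len(ch.encode(encoding))
        if used + size > max_bytes:
            return text[:i], text[i:]
        used += size
    return text, ""


print(split_at_byte_position("ab繁体字", 5))  # ('ab繁', '体字')
```

Returning both parts makes it easy to continue with the remainder, for example when wrapping text into fixed-width byte fields.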
