2010 Feb 28 5:24 AM
Hi Experts,
Looking for help.
I have a requirement where I need to truncate a string based on its byte count.
Since Chinese and some other languages use multi-byte characters, I need to cut off part of the string based on the number of bytes.
I know that normally we do this as STRING+offset(length), but that counts in characters.
I have to extract a substring based on the number of bytes instead.
I hope the requirement is clear.
Regards,
Digvijay Singh
2010 Mar 01 8:09 AM
Hi Digvijay,
Not sure if I'm missing the simple solution (and if I understand the problem, because usually you actually want to handle characters and not bytes), but here's a fairly convoluted one for truncating a string at a specific byte count:
CONSTANTS:
  co_input TYPE string VALUE '繁体字'.
DATA:
  l_codepage TYPE cpcodepage,
  l_encoding TYPE abap_encoding,
  l_conv_out TYPE REF TO cl_abap_conv_out_ce,
  l_conv_obj TYPE REF TO cl_abap_conv_obj,
  l_length   TYPE i,
  l_string   TYPE string,
  l_xstring  TYPE xstring.
* Determine the code page that belongs to the logon language
CALL FUNCTION 'SCP_CODEPAGE_FOR_LANGUAGE'
  EXPORTING
    language = sy-langu
  IMPORTING
    codepage = l_codepage
  EXCEPTIONS
    OTHERS   = 0.
l_encoding = l_codepage.
* Convert the string into its byte representation
l_conv_out = cl_abap_conv_out_ce=>create( encoding = l_encoding ).
l_conv_out->convert( EXPORTING data   = co_input
                     IMPORTING buffer = l_xstring ).
* Cut off the last three bytes (local variable instead of sy-fdpos)
l_length = xstrlen( l_xstring ) - 3.
l_xstring = l_xstring(l_length).
* Convert the bytes back into a string
CREATE OBJECT l_conv_obj.
l_conv_obj->convert( EXPORTING inbuff    = l_xstring
                               outbufflg = 0
                     IMPORTING outbuff   = l_string ).
WRITE: / 'Before:', co_input, '; After:', l_string.
When I run this example in our system (code page 4103, which corresponds to UTF-16 little-endian), I get the following output:
Before: 繁体字 ; After: 繁
In my opinion the convoluted coding is necessary to avoid splitting multi-byte characters in half. In the example I take off three bytes, yet the result loses the last two characters: three bytes are one and a half characters, so the half-cut character is dropped as well. I hope somebody has a shorter and more elegant solution, let's see...
If you knew that your strings never contain surrogate pairs (http://unicode.org/faq/utf_bom.html#utf16-2), it would be much simpler, because then one character corresponds to exactly two bytes on the application server (since a Unicode application server always uses UTF-16).
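A minimal sketch of that simpler case, assuming a Unicode system and input without surrogate pairs, so that every character occupies exactly two bytes. The names l_input, l_max_bytes, l_max_chars and l_result are made up for illustration, and the byte limit of 3 is chosen only to mirror the example above:

```abap
DATA: l_input     TYPE string VALUE '繁体字',
      l_max_bytes TYPE i VALUE 3,
      l_max_chars TYPE i,
      l_result    TYPE string.

* Assumption: two bytes per character (UTF-16, no surrogate pairs),
* so a byte limit becomes a character limit by integer division.
l_max_chars = l_max_bytes DIV 2.

IF l_max_chars <= 0.
  " Not even one whole character fits into the byte limit
  CLEAR l_result.
ELSEIF strlen( l_input ) > l_max_chars.
  " Offset/length access counts characters, not bytes
  l_result = l_input(l_max_chars).
ELSE.
  " The string already fits, keep it unchanged
  l_result = l_input.
ENDIF.

WRITE: / 'Truncated:', l_result.
```

With a limit of three bytes only one whole character fits, so l_result keeps the first character. This shortcut breaks as soon as the data can contain surrogate pairs or the target encoding is not UTF-16, in which case the conversion approach above is the safer route.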
Cheers, harald
2010 Feb 28 8:54 AM
Hi,
I think that if you are working on an ECC system, which is Unicode, you can simply keep using the old way (offset and length).
Cheers,
Edited by: NI SHILIANG on Feb 28, 2010 9:55 AM
2010 Mar 01 12:45 PM
I guess you are asking because you are on a non-Unicode system. As explained in the ABAP documentation, Conversion Table for Source Field Type c (http://help.sap.com/abapdocu_70/en/ABENCONVERSION_TYPE_C.htm), you can call method CL_SCP_LINEBREAK_UTIL=>STRING_SPLIT_AT_POSITION, which is intended especially for non-Unicode double-byte characters.