‎2017 Aug 28 5:31 PM
Hi,
In NW 7.51 SAP converted the "classic" structure of data cluster tables from multiple rows of datatype LRAW to a single row of the "new" datatype RAWSTRING (ABAP News for Release 7.51 – New Structure for Export/Import Tables).
Now the remaining question is: what would be the best (most size-efficient) solution for saving huge datasets in the DB in a legacy system (NW versions earlier than 7.51):
using the old (multiple-rows) data cluster table, or a simple DB table with a single RAWSTRING field?
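For orientation, the two storage layouts under comparison can be sketched as follows. The field names are modeled on the standard INDX-type layout and are illustrative, not copied from an actual system:

```abap
" Classic export/import table: one cluster is split over several rows.
"   RELID  TYPE c LENGTH 2     " area id                       (key)
"   ID     TYPE c LENGTH 30    " cluster id                    (key)
"   SRTF2  TYPE i              " row counter                   (key)
"   CLUSTR TYPE int2           " used length of CLUSTD in this row
"   CLUSTD TYPE x LENGTH 2886  " LRAW chunk of the data cluster

" New 7.51-style structure: the whole cluster sits in one row.
"   RELID  TYPE c LENGTH 2     " area id                       (key)
"   ID     TYPE c LENGTH 30    " cluster id                    (key)
"   CLUSTD TYPE xstring        " RAWSTRING holding the complete cluster
```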
I have done some tests and the results were quite confusing.
I’ve saved an internal table of MARA with 5000 records.
For conversion to XSTRING I used the EXPORT TO DATA BUFFER (COMPRESSION ON) command.
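The test setup as described can be sketched like this (a minimal reconstruction; the variable names are assumptions):

```abap
" Read 5000 MARA records and compress them into an xstring.
SELECT * FROM mara
  INTO TABLE @DATA(itab)
  UP TO 5000 ROWS.

DATA data_buffer TYPE xstring.
EXPORT itab = itab TO DATA BUFFER data_buffer COMPRESSION ON.
```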
The results were:
Is there any explanation for it?
‎2017 Aug 29 8:22 AM
According to the following program, the length of the data clusters is exactly the same for an internal table stored in a data buffer or in the CLUSTD fields of the two possible export/import table structures, both with compression on and off.
How did you count? Don't forget that besides the cluster itself, there are also the administrative columns in the export/import table structures. Of course, this overhead is higher for the old tabular storage.
DATA(rnd) = cl_abap_random_int=>create(
seed = CONV i( sy-uzeit )
min = 1
max = 1000 ).
TYPES itab TYPE TABLE OF i WITH EMPTY KEY.
DATA(itab) = VALUE itab( FOR i = 1 UNTIL i > 100000
( rnd->get_next( ) ) ).
DATA(out) = cl_demo_output=>new( ).
out->next_section( 'Compression off' ).
DATA data_buffer TYPE xstring.
EXPORT itab = itab TO DATA BUFFER data_buffer
COMPRESSION OFF.
EXPORT itab = itab TO DATABASE demo_indx_blob(rn)
ID 'RandomNumbers' COMPRESSION OFF.
SELECT SINGLE clustd
FROM demo_indx_blob
WHERE relid = 'RN' AND
id = 'RandomNumbers'
INTO @DATA(indx_blob_clustd).
EXPORT itab = itab TO DATABASE demo_indx_table(rn)
ID 'RandomNumbers' COMPRESSION OFF.
SELECT *
FROM demo_indx_table
WHERE relid = 'RN' AND
id = 'RandomNumbers'
ORDER BY srtf2
INTO TABLE @DATA(indx_table).
DATA(indx_table_clustd) = VALUE xstring( ).
LOOP AT indx_table INTO DATA(indx_table_wa).
CONCATENATE indx_table_clustd
indx_table_wa-clustd(indx_table_wa-clustr)
INTO indx_table_clustd
IN BYTE MODE.
ENDLOOP.
out->write( xstrlen( data_buffer )
)->write( xstrlen( indx_blob_clustd )
)->write( xstrlen( indx_table_clustd ) ).
CLEAR data_buffer.
DELETE FROM DATABASE demo_indx_table(rn) ID 'RandomNumbers'.
DELETE FROM DATABASE demo_indx_blob(rn) ID 'RandomNumbers'.
out->next_section( 'Compression on' ).
EXPORT itab = itab TO DATA BUFFER data_buffer
COMPRESSION ON.
EXPORT itab = itab TO DATABASE demo_indx_blob(rn)
ID 'RandomNumbers' COMPRESSION ON.
SELECT SINGLE clustd
FROM demo_indx_blob
WHERE relid = 'RN' AND
id = 'RandomNumbers'
INTO @indx_blob_clustd.
EXPORT itab = itab TO DATABASE demo_indx_table(rn)
ID 'RandomNumbers' COMPRESSION ON.
SELECT *
FROM demo_indx_table
WHERE relid = 'RN' AND
id = 'RandomNumbers'
ORDER BY srtf2
INTO TABLE @indx_table.
indx_table_clustd = VALUE xstring( ).
LOOP AT indx_table INTO indx_table_wa.
CONCATENATE indx_table_clustd
indx_table_wa-clustd(indx_table_wa-clustr)
INTO indx_table_clustd
IN BYTE MODE.
ENDLOOP.
out->write( xstrlen( data_buffer )
)->write( xstrlen( indx_blob_clustd )
)->write( xstrlen( indx_table_clustd ) ).
DELETE FROM DATABASE demo_indx_table(rn) ID 'RandomNumbers'.
DELETE FROM DATABASE demo_indx_blob(rn) ID 'RandomNumbers'.
out->display( ).
‎2017 Aug 29 9:38 AM
Hi Horst,
I've been waiting for your answer 🙂
Regarding the measurement:
1. Since there isn't any API/command to query the record size in the DB (at least none I'm aware of),
I created 3 new DB tables and checked each table's size in the database.
2. I'm not sure I understood the purpose of your test program.
As far as I understand, all 3 variables should contain exactly the same value (regardless of the way they are stored in the DB). Hence, all 3 must be exactly the same size, mustn't they?
My question was regarding the DB size.
‎2017 Aug 29 9:49 AM
The test program proves that all three clusters have the same size.
So in fact, I don't understand your question. Where is the mystery?
‎2017 Aug 29 10:01 AM
Why is the size of the data cluster table ("classic" multiple rows of LRAW) smaller than that of a simple DB table with a single RAWSTRING field (when both store exactly the same XSTRING value)?
‎2017 Aug 29 10:16 AM
Is it? Your numbers above don't say that, do they? In fact, from the description it is not that clear what you've done.
‎2017 Aug 29 2:52 PM
Well, so it seems (unless I've missed something):
2. Data cluster table ("classic" multiple rows of LRAW) for the converted XSTRING – 272KB.
3. Simple DB table with RAWSTRING (for the same converted XSTRING) – 536KB.
‎2017 Aug 31 7:47 AM
Just to be complete:
"all 3 variables should contain exactly the same value (regardless of the way they are stored in the DB)"
Not really. As a rule, the size is about the same. But for optimization reasons, the binary content of the data cluster can differ between clusters stored in one string and those split over several lines.
Therefore, you must always use the appropriate IMPORT.
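The point about the appropriate IMPORT can be sketched as follows; the identifiers follow the demo programs in this thread, and each IMPORT must match the storage form used by the corresponding EXPORT:

```abap
DATA itab TYPE TABLE OF i WITH EMPTY KEY.
DATA data_buffer TYPE xstring.

" counterpart of EXPORT ... TO DATA BUFFER:
IMPORT itab = itab FROM DATA BUFFER data_buffer.

" counterpart of EXPORT ... TO DATABASE for the blob-based table:
IMPORT itab = itab FROM DATABASE demo_indx_blob(rn) ID 'RandomNumbers'.

" counterpart of EXPORT ... TO DATABASE for the classic tabular table:
IMPORT itab = itab FROM DATABASE demo_indx_table(rn) ID 'RandomNumbers'.
```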
‎2017 Aug 30 6:49 AM
To summarize:
You claim that storing one and the same binary data in one RAWSTRING uses more space on the database than splitting it into several LRAW fields. If that is true, the finding should be independent of data clusters. You should be able to reproduce it by writing any xstring to a RAWSTRING or to several LRAW fields. Then it is a question for the database. If you can't reproduce it, use Open SQL to examine your data clusters in order to find the difference.
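Such a reproduction, independent of the data cluster mechanism, could look roughly like this. The tables ZTEST_BLOB (key ID plus a RAWSTRING field DATA) and ZTEST_LRAW (keys ID and SRTF2 plus an INT2 field CLUSTR and an LRAW field DATA of length 2886) are hypothetical and would have to be created first:

```abap
DATA xstr TYPE xstring. " filled with the binary data to compare

" One row holding the whole xstring:
INSERT ztest_blob FROM @( VALUE #( id = 'X' data = xstr ) ).

" The same xstring split over several LRAW rows:
CONSTANTS chunk TYPE i VALUE 2886.
DATA(off)   = 0.
DATA(srtf2) = 0.
WHILE off < xstrlen( xstr ).
  DATA(len) = nmin( val1 = chunk val2 = xstrlen( xstr ) - off ).
  INSERT ztest_lraw FROM @( VALUE #( id     = 'X'
                                     srtf2  = srtf2
                                     clustr = len
                                     data   = xstr+off(len) ) ).
  off   = off + len.
  srtf2 = srtf2 + 1.
ENDWHILE.
" Then compare the space used by both tables, e.g. in the DBA cockpit.
```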
‎2017 Aug 30 9:59 AM
(Almost) correct.
Sorry about the confusion between data cluster (export/import) tables and cluster tables.
The question is whether it is really a database-related question (for any RAWSTRING vs. multiple-LRAW-rows table) or something in the data cluster export/compression mechanism.
I had to abandon my assumption regarding the XSTRING length and re-checked it in my report.
My actual program consists of two steps:
1. EXPORT itab = itab TO DATA BUFFER data_buffer COMPRESSION ON.
2. EXPORT data_buffer2 = data_buffer TO DATABASE (COMPRESSION ON, which is the default).
I was surprised to find out the following results:
i.e. there is additional compression after the initial compression from itab to XSTRING.
(That made me wonder whether this compression could be repeated endlessly 🙂, but here I wasn't surprised again: I found out that from the third compression on, the data buffer size increases every time.)
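The "endless compression" experiment can be sketched like this (assuming itab is filled as in the demo programs in this thread); each round feeds the previous buffer into another EXPORT with COMPRESSION ON and prints the resulting length:

```abap
DATA buffer TYPE xstring.
EXPORT itab = itab TO DATA BUFFER buffer COMPRESSION ON.

DO 5 TIMES.
  cl_demo_output=>write( xstrlen( buffer ) ).
  DATA next TYPE xstring.
  EXPORT buffer = buffer TO DATA BUFFER next COMPRESSION ON.
  buffer = next.
ENDDO.
cl_demo_output=>display( ).
```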
P.S.
After understanding that the difference lies in the additional compression, I added an additional
EXPORT TO DATA BUFFER COMPRESSION ON command and rechecked the results.
The size of the simple DB table with RAWSTRING was now smaller than that of the data cluster table (240KB), as expected.
‎2017 Aug 30 11:38 AM
I also suspected something like this and did the same (double compression for the data buffer) in my example, but didn't see any difference.
‎2017 Aug 30 12:12 PM
Here is my program, which doesn't show any "double compression":
DATA(rnd) = cl_abap_random_int=>create(
seed = CONV i( sy-uzeit )
min = 1
max = 1000 ).
TYPES itab TYPE TABLE OF i WITH EMPTY KEY.
DATA(itab) = VALUE itab( FOR i = 1 UNTIL i > 100000
( rnd->get_next( ) ) ).
DATA(out) = cl_demo_output=>new( ).
DATA data_buffer TYPE xstring.
EXPORT itab = itab TO DATA BUFFER data_buffer
COMPRESSION ON.
out->write( xstrlen( data_buffer ) ).
DATA data_buffer_again TYPE xstring.
EXPORT buffer = data_buffer TO DATA BUFFER data_buffer_again
COMPRESSION ON.
out->write( xstrlen( data_buffer_again ) ).
EXPORT buffer = data_buffer TO DATABASE demo_indx_blob(rn)
ID 'RandomNumbers' COMPRESSION ON.
SELECT SINGLE clustd
FROM demo_indx_blob
WHERE relid = 'RN' AND
id = 'RandomNumbers'
INTO @DATA(indx_blob_clustd).
out->write( xstrlen( indx_blob_clustd ) ).
EXPORT buffer = data_buffer TO DATABASE demo_indx_table(rn)
ID 'RandomNumbers' COMPRESSION ON.
SELECT *
FROM demo_indx_table
WHERE relid = 'RN' AND
id = 'RandomNumbers'
ORDER BY srtf2
INTO TABLE @DATA(indx_table).
DATA(indx_table_clustd) = VALUE xstring( ).
LOOP AT indx_table INTO DATA(indx_table_wa).
CONCATENATE indx_table_clustd
indx_table_wa-clustd(indx_table_wa-clustr)
INTO indx_table_clustd
IN BYTE MODE.
ENDLOOP.
out->write( xstrlen( indx_table_clustd ) ).
out->display( ).
‎2017 Aug 30 12:26 PM
At least in my test system (NW 7.4 SP9), I do see a difference:
Anyway, I guess it depends on your dataset.
In my original example, I used the contents of DB table MARA for itab.
‎2017 Aug 30 1:33 PM
‎2017 Aug 30 1:44 PM
Please check with a larger dataset (the MARA table, for example) or, even simpler, just add a dummy field (a CHAR250 with empty values, for example) to the ITAB structure.
‎2017 Aug 30 2:24 PM
Now, I see it too.
DATA(rnd) = cl_abap_random_int=>create(
seed = CONV i( sy-uzeit )
min = 1
max = 1000 ).
TYPES:
  char255 TYPE c LENGTH 255,
  BEGIN OF struct,
    col1 TYPE i,
    col2 TYPE char255,
  END OF struct,
  itab TYPE TABLE OF struct WITH EMPTY KEY.
DATA(itab) = VALUE itab( FOR i = 1 UNTIL i > 100000
( col1 = rnd->get_next( ) ) ).
DATA(out) = cl_demo_output=>new( ).
DATA data_buffer TYPE xstring.
EXPORT itab = itab TO DATA BUFFER data_buffer
COMPRESSION ON.
out->write( xstrlen( data_buffer ) ).
DATA data_buffer_again TYPE xstring.
EXPORT buffer = data_buffer TO DATA BUFFER data_buffer_again
COMPRESSION ON.
out->write( xstrlen( data_buffer_again ) ).
Looks strange.
-> Forwarded to Development
‎2017 Nov 29 12:53 PM
‎2017 Nov 30 7:38 AM
No, and I'm afraid there won't.
Any changes would be incompatible.
‎2017 Nov 30 11:18 AM
OK.
I guess we'll have to live with the double compression, then 🙂
We might record it as a mysterious known ABAP bug, like the well-known Excel date bug.