Inconsistency in table data extract between batches vs full extract

Vedette
Explorer

We are using RFC_READ_TABLE to extract SAP data in batches, but the data extracted via smaller batches differs from that of a full table extract.

As an example of the problem, when we read table UST04 we specify only the columns to be retrieved and the batch size. When we extract the data in batches of 100 000 rows, the data returned differs from when the table is extracted in full in one go.

The batch extract contains duplicate records that the full extract does not, and some records present in the full extract are missing from the batch extract. The total number of records in both extracts is the same, but the batch extract has both duplicates and gaps.

Could this be caused by table data buffering when a batch size is specified, versus when the full table is extracted? Or could the next batch be starting at the wrong position, extracting duplicate records, and then stopping once the full table count is reached but before all records have been returned?

Has anyone experienced something similar to this before?

matt
Active Contributor

See the comment on this blog: https://blogs.sap.com/2017/05/21/perl-module-sapnwrfc-to-retrieve-data-from-a-huge-sap-table-with-rf...

Downloading a huge table often takes too much time, because the ROWSKIPS parameter does not prevent the module from reading the data; it just does not add the skipped rows to the returned data table DATA. So with each consecutive read you read more and more data: if you chunk by 500 records, the first invocation reads 500 records, the next reads 1,000 but returns only 500, the third reads 1,500 and again returns only 500, and so on.
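To make the cost concrete, here is a small Python sketch (plain arithmetic, no SAP connectivity) of how many rows the server ends up scanning when paging with ROWSKIPS, given the re-read-from-the-start behaviour described above:

```python
def rows_scanned(total_rows, chunk_size):
    """Total rows the server reads when paging with ROWSKIPS.

    Each call re-reads from the first row up to skip + chunk_size,
    so the total work grows quadratically with the number of chunks.
    """
    scanned = 0
    skip = 0
    while skip < total_rows:
        # this call scans everything up to the end of the current chunk
        scanned += min(skip + chunk_size, total_rows)
        skip += chunk_size
    return scanned

# 1,500 rows in chunks of 500: the server scans 500 + 1000 + 1500 = 3000 rows
print(rows_scanned(1500, 500))   # 3000
# a single full extract scans each row exactly once
print(rows_scanned(1500, 1500))  # 1500
```

For a table with millions of rows and a chunk size of 100 000, the scanned-row total balloons far beyond the table size, which is why chunked ROWSKIPS extraction is so slow.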

Also, note that the SELECT imposes no ordering, so the actual rows returned for a given chunk are not well defined, at least in theory. In practice the database will most likely pick one execution plan and stick to it, so all table rows should come back, but there is no guarantee.

Now this is the relevant code from the FM.

  SELECT * FROM (QUERY_TABLE) INTO <WA> WHERE (OPTIONS).
    IF SY-DBCNT GT ROWSKIPS.
*     copy all relevant fields into DATA (output) table
      ...
*     end of loop at FIELDS_INT
      APPEND DATA.
      IF ROWCOUNT > 0 AND SY-DBCNT GE ROWCOUNT.
        EXIT.
      ENDIF.
    ENDIF.
  ENDSELECT.

So what harald.boeing2 wrote is correct. It's slow, and since there's no ORDER BY on the SELECT, the database guarantees no particular row order, so you may well get different results between runs.

You can't use ROWCOUNT/ROWSKIPS to partition the data reliably. If you know the key values, you can partition on those instead. E.g. for UST04, first read all the BNAMEs beginning with A, then B, and so on.
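A minimal sketch of that key-based partitioning, in Python: the function below just builds one WHERE clause per leading letter of BNAME, suitable for passing as RFC_READ_TABLE's OPTIONS lines. Because each clause is a complete, non-overlapping condition on the key, every partition is stable regardless of row order. (The pyrfc usage at the bottom is a hypothetical illustration, not run here.)

```python
import string

def bname_partitions():
    """Build one RFC_READ_TABLE OPTIONS clause per leading letter of BNAME.

    Each clause is a self-contained WHERE condition, so the partitions
    don't depend on any database row ordering. Note: user names starting
    with a digit or other non-letter would need an extra catch-all clause.
    """
    return [f"BNAME LIKE '{letter}%'" for letter in string.ascii_uppercase]

# Hypothetical usage with a pyrfc connection `conn` (assumed, not shown):
# for clause in bname_partitions():
#     result = conn.call("RFC_READ_TABLE",
#                        QUERY_TABLE="UST04",
#                        OPTIONS=[{"TEXT": clause}],
#                        FIELDS=[{"FIELDNAME": "MANDT"},
#                                {"FIELDNAME": "BNAME"}])

print(bname_partitions()[0])   # BNAME LIKE 'A%'
print(len(bname_partitions())) # 26
```

If one letter still yields too many rows for a single call, you can split further on the second character; the key point is that the split is defined by key values, not by row position.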