Inconsistency in table data extract between batches vs full extract

Vedette
Explorer

We are using RFC_READ_TABLE to extract SAP data in batches, but the data extracted in smaller batches differs from that of a full table extract.

As an example of the problem, when we read table UST04, we specify only the columns to be retrieved and the batch size. When we extract the data in batches of 100,000 records, the data returned is different from when the table is extracted in full at once.

The batch extract contains duplicate records that the full extract does not, and some of the data in the full extract does not exist in the batch extract. The total number of records in both extracts is the same, but the batch extract has duplicates as well as missing records.

Could this be caused by table data buffering behaving differently when a batch size is specified versus when the full table is extracted at once? Or could the next batch be starting at the wrong position, extracting duplicate records, with the extraction then stopping once the full table count is reached but before all records have been returned?

Has anyone experienced something similar to this before?

Accepted Solutions (1)

matt
Active Contributor

See the comment on this blog: https://blogs.sap.com/2017/05/21/perl-module-sapnwrfc-to-retrieve-data-from-a-huge-sap-table-with-rf...

Downloading a huge table often takes too much time, as the ROWSKIPS parameter does not prevent the module from reading the data; it just does not add it to the returned data table DATA. So with each consecutive read you will read more and more data (i.e. if you chunk it by 500 records, the first invocation will read 500 records, the next will read 1,000 records but only return 500, then we read 1,500 records and again only return 500, etc.).
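That quadratic read cost can be sketched with a small simulation (plain Python, not SAP code; the function name is my own). Because ROWSKIPS only suppresses output rather than skipping reads, call i scans all rows from the start of the table up to the end of its window:

```python
def rows_scanned(total_rows, chunk):
    """Total rows the database reads when paging with ROWSKIPS/ROWCOUNT.

    Each call re-reads from row 1 up to the end of its window, because
    ROWSKIPS only suppresses output; it does not skip the reading.
    """
    scanned = 0
    skipped = 0
    while skipped < total_rows:
        window_end = min(skipped + chunk, total_rows)
        scanned += window_end  # this call reads rows 1..window_end
        skipped = window_end
    return scanned

# 2,000 rows in chunks of 500: 500 + 1,000 + 1,500 + 2,000 = 5,000 rows read
print(rows_scanned(2000, 500))   # 5000
print(rows_scanned(2000, 2000))  # 2000 - a single full read scans each row once
```

For an N-row table this grows roughly as N²/(2·chunk), which is why chunked extraction of a huge table gets slower with every batch.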

Also, note that the extraction does not impose any ordering, so the actual data returned is not well defined - at least in theory. In practice the database will most likely pick one execution plan and stick to it, so we should get all table rows.

Now this is the relevant code from the FM.

  SELECT * FROM (QUERY_TABLE) INTO <WA> WHERE (OPTIONS).
    IF SY-DBCNT GT ROWSKIPS.
*   copy all relevant fields into DATA (output) table
    ...
*   end of loop at FIELDS_INT
      APPEND DATA.
      IF ROWCOUNT > 0 AND SY-DBCNT GE ROWCOUNT. 
        EXIT.
      ENDIF.
    ENDIF.
  ENDSELECT.
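
The failure mode this code produces can be demonstrated outside ABAP with a small simulation (illustrative Python, names my own): if each SELECT may return rows in a different order, paging with ROWSKIPS/ROWCOUNT yields the right total count but with duplicates and missing rows, exactly as described in the question. Here the "plan change" is modelled as the order simply alternating between calls:

```python
def db_order(rows, call_no):
    """Stand-in for 'no ORDER BY': the row order happens to differ per call."""
    return sorted(rows, reverse=(call_no % 2 == 1))

def paged_extract(table, chunk):
    """Simulate RFC_READ_TABLE paging: each batch is a fresh SELECT."""
    out, skipped, call_no = [], 0, 0
    while skipped < len(table):
        rows = db_order(table, call_no)   # fresh SELECT, new order
        out.extend(rows[skipped:skipped + chunk])
        skipped += chunk
        call_no += 1
    return out

table = list(range(10))                   # ten distinct "records"
batched = paged_extract(table, 3)

print(len(batched) == len(table))         # True: same total count as full extract
print(sorted(set(batched)) == table)      # False: duplicates crowd out records
print(sorted(set(table) - set(batched)))  # [3, 9]: records never extracted
```

Note that the total record count matches the full extract even though records 3 and 9 were never returned - which matches the symptom reported above.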

So what harald.boeing2 wrote is correct. It's slow, and since there's no ORDER BY on the SELECT, there is no guaranteed order from the database, so you may well get different results between calls.

You can't use ROWCOUNT/ROWSKIPS to partition the data. If you know the key values, you could use those instead. E.g. for UST04, first read all BNAMEs beginning with A, then B, and so on.
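
One way to implement that key-based partitioning from the client side is a sketch like the following (hypothetical helper, names my own): generate one WHERE clause per leading character of BNAME and issue one independent RFC_READ_TABLE call per partition, so no ROWSKIPS is ever needed.

```python
import string

def bname_partitions(chars=string.ascii_uppercase + string.digits):
    """Yield one OPTIONS WHERE clause per leading BNAME character.

    Each clause defines a self-contained partition of UST04, so every
    RFC_READ_TABLE call is independent and no paging is required.
    """
    for c in chars:
        yield f"BNAME LIKE '{c}%'"

clauses = list(bname_partitions())
print(clauses[0])    # BNAME LIKE 'A%'
print(len(clauses))  # 36 partitions (A-Z plus 0-9)

# Assumed usage with an RFC client such as pyrfc (not runnable without
# an SAP system; parameter shapes are the documented RFC_READ_TABLE ones):
#
# for clause in bname_partitions():
#     result = conn.call("RFC_READ_TABLE",
#                        QUERY_TABLE="UST04",
#                        OPTIONS=[{"TEXT": clause}],
#                        FIELDS=[{"FIELDNAME": "BNAME"},
#                                {"FIELDNAME": "PROFILE"}])
```

Whether 36 single-character buckets are balanced enough depends on your naming conventions; uneven partitions can be split further on the second character.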

Answers (1)

Ulrich_Schmidt
Product and Topic Expert

To make a long story short:

Don't use RFC_READ_TABLE.

It's not a solution, it's a security vulnerability... In fact, SAP recommends disabling this FM in productive systems, or at least limiting the tables it is allowed to read (see SAP Note 2246160). It's a relic from very old times, when there was no internet and hackers did not yet target SAP systems, which ran well protected and isolated somewhere in the basement... Unfortunately, SAP has not been able to delete this FM, as "everybody" uses it in mission-critical productive scenarios, because it is just "so convenient"...

But the shortcomings are obvious:

  • It allows generic access to any database table, bypassing the application layer's authorization checks! So a user with authorization for RFC_READ_TABLE might be able to read data that they shouldn't be allowed to see... There are a few workarounds available to limit this danger, but it remains patchwork.
  • It has a generic WHERE-clause, which, like all generic WHERE-clauses, is prone to SQL injection attacks. And such an attack will then run under the SAP system's database user, which usually has admin privileges on the database...
  • The output is char-based only and does not properly support datatypes such as INT and FLOAT. Even if your data is purely char-based, you will run into problems when trying to read non-ISO-Latin-1 data via a Unicode RFC connection. E.g. if a table contains European and Chinese texts, it is not easy to interpret the "big char blob" that you get and find the correct offsets and lengths where one field ends and the next starts... One subtle mistake, and you end up with "data salad"...
  • Performance is bad, and the output is "unpredictable" when trying to read in batches (as we see here). What works fine on Oracle may lead to duplicates on Microsoft SQL Server, or vice versa...
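
To illustrate the char-blob point: RFC_READ_TABLE returns each row as one fixed-width WA string that the caller must slice using the OFFSET/LENGTH values from the FIELDS table. A minimal sketch of that slicing, with invented-for-illustration field metadata loosely modelled on UST04, assuming the offsets line up with the string positions (which is exactly what breaks with mixed encodings):

```python
# Hypothetical FIELDS metadata, in the shape RFC_READ_TABLE returns it:
# each entry gives a column's offset and length inside the WA row string.
FIELDS = [
    {"FIELDNAME": "BNAME",   "OFFSET": 0,  "LENGTH": 12},
    {"FIELDNAME": "PROFILE", "OFFSET": 12, "LENGTH": 12},
]

def split_row(wa, fields):
    """Slice one fixed-width WA row into named columns."""
    return {
        f["FIELDNAME"]: wa[f["OFFSET"]:f["OFFSET"] + f["LENGTH"]].rstrip()
        for f in fields
    }

wa = "JDOE        SAP_ALL     "
print(split_row(wa, FIELDS))  # {'BNAME': 'JDOE', 'PROFILE': 'SAP_ALL'}
```

The fragility is that every consumer has to reimplement this slicing; if the declared lengths and the actual string positions disagree (e.g. because of multi-byte characters), every field after the first mismatch is cut at the wrong place.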

If you have dozens or even hundreds of tables you need to read, then RFC_READ_TABLE is probably still the "least effort" option. But if there are only 2-3 tables you need, then go the extra mile and check for each table:

  1. If there is a standard SAP BAPI that returns the data of that table, then use that.
  2. If not, then write your own RFM, which reads precisely this one table in the format you need. This FM then
    a) can include a proper authorization check for the data,
    b) doesn't allow potential hackers to read other tables that you don't want to expose,
    c) doesn't need a generic WHERE-clause and therefore doesn't allow SQL injection (if you do your job right...), and
    d) returns the output in a well-defined structure, so you don't need to parse and interpret table metadata/field definitions and break a big blob down into the original columns.