
Recently we have been handling a customer complaint about a performance issue in their production system. We are investigating both the ABAP side and the DB side to find potential improvements. On the DB side, we are working together with the customer on possible DB index optimization.

We also observed that the database tables causing the performance pain have a huge number of records (between 50 and 200+ million). We do see the possibility of archiving some historical data; however, we are not sure whether the customer would actually gain performance after archiving.

So I ran a small experiment to get a rough idea of the relationship between the number of table records and performance.

Test environment and test data preparation

Database: HANA DB, release 1.00.74.00.390550. The test was done on our internal system. It would make more sense to test in the customer's system, since the customer is currently on a non-HANA DB, but unfortunately I could not: I had to write several reports to generate the massive test data, and my request for a user in their sandbox system was not approved. I hope I will eventually get the chance to repeat this experiment on the customer's sandbox system.

The table causing the performance issue is PPFTTRIGG, a standard table in SAP CRM that stores transactional data of action framework processing details.

In the customer system it has 75 million records, and in ST12 traces its DB time always ranks first.

For the table itself, the color coding of the fields (in the screenshot) is:

Red: key field
Blue: index field
Yellow: non-index field

I copied the standard table into ten new Z tables with exactly the same technical settings, and filled the Z tables with 10 million to 100 million records each.

For the fields os_guid, appl_oid, and medium_oid, I used function module GUID_CREATE to generate new GUIDs and assigned them.

For other fields like applctn, context, and ttype, I read all possible values from their value tables and distributed them evenly across the table records, as sketched below.
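As a rough sketch, the fill logic for one Z table could look like the following. The report name, package size, and loop structure are my own illustrative assumptions; only the GUID_CREATE call is taken from the actual test, and the GUID-typed fields are assumed to be RAW 16.

REPORT zfill_ppfttrigg1.

" Illustrative sketch of the test data generation; names and package
" size are assumptions, not the original report.
DATA: lt_triggers TYPE STANDARD TABLE OF zppfttrigg1,
      ls_trigger  TYPE zppfttrigg1.

DO 10000000 TIMES.
  " Generate fresh GUIDs for the GUID-typed fields.
  CALL FUNCTION 'GUID_CREATE'
    IMPORTING
      ev_guid_16 = ls_trigger-os_guid.
  CALL FUNCTION 'GUID_CREATE'
    IMPORTING
      ev_guid_16 = ls_trigger-appl_oid.
  CALL FUNCTION 'GUID_CREATE'
    IMPORTING
      ev_guid_16 = ls_trigger-medium_oid.

  " applctn, context, ttype etc. would be filled here by cycling
  " evenly through the values read from their value tables.
  APPEND ls_trigger TO lt_triggers.

  " Insert in packages to keep memory consumption bounded.
  IF lines( lt_triggers ) = 100000.
    INSERT zppfttrigg1 FROM TABLE lt_triggers.
    COMMIT WORK.
    CLEAR lt_triggers.
  ENDIF.
ENDDO.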

Table name     Table records
ZPPFTTRIGG1    10 million
ZPPFTTRIGG2    20 million
ZPPFTTRIGG3    30 million
ZPPFTTRIGG4    40 million
ZPPFTTRIGG5    50 million
ZPPFTTRIGG6    60 million
ZPPFTTRIGG7    70 million
ZPPFTTRIGG8    80 million
ZPPFTTRIGG9    90 million
ZPPFTTRIGG10   100 million

The following measurements were done against the ten Z tables. All times recorded in this blog are measured in milliseconds unless stated otherwise.

Read via Index field, single record returned for each table

I used the index field appl_oid, which ensures that exactly one record is returned from each Z table. The time is measured in microseconds.
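For one table, the measurement is essentially the following minimal sketch. The variable lv_appl_oid is assumed to hold a GUID that exists in the table; GET RUN TIME FIELD returns microseconds, which is why this series is reported in microseconds.

DATA: lv_appl_oid TYPE zppfttrigg1-appl_oid, " assumed to exist in the table
      ls_trigger  TYPE zppfttrigg1,
      lv_t1       TYPE i,
      lv_t2       TYPE i.

GET RUN TIME FIELD lv_t1.
SELECT SINGLE * FROM zppfttrigg1
  INTO ls_trigger
  WHERE appl_oid = lv_appl_oid.   " secondary index field
GET RUN TIME FIELD lv_t2.

WRITE: / |Read via index field: { lv_t2 - lv_t1 } microseconds|.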

I tested it repeatedly, and the result consistently shows that the read time does not increase in proportion to the number of table records.

Read via table key field OS_GUID

The time is measured in microseconds.

The result shows that, on average, reading via the table key field is slightly faster than reading via the index field.

Read via Non-index fields, one record returned

In this case I query the ten tables via the non-index field MEDIUM_OID. Each query returns only one record per Z table. The time is measured in microseconds.

All three kinds of read operation ensure that a unique record is returned by the SQL statement. Sorted by efficiency: read via key > read via index field > read via non-index field.
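Spelled out, the three access paths differ only in the WHERE clause. The selection variables are assumed to identify exactly one record each:

DATA: ls_trigger    TYPE zppfttrigg1,
      lv_os_guid    TYPE zppfttrigg1-os_guid,
      lv_appl_oid   TYPE zppfttrigg1-appl_oid,
      lv_medium_oid TYPE zppfttrigg1-medium_oid.

" Fastest: read via the primary key field.
SELECT SINGLE * FROM zppfttrigg1 INTO ls_trigger
  WHERE os_guid = lv_os_guid.

" Slower: read via a secondary index field.
SELECT SINGLE * FROM zppfttrigg1 INTO ls_trigger
  WHERE appl_oid = lv_appl_oid.

" Slowest: read via a non-index field.
SELECT SINGLE * FROM zppfttrigg1 INTO ls_trigger
  WHERE medium_oid = lv_medium_oid.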

Read via Index fields, multiple and different number of records returned for each table

The read is performed on the index fields applctn and context.

The X axis represents the number of records returned via SELECT * INTO TABLE <record set> FROM <ztable> WHERE applctn = XX AND context = XXX for each Z table.
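Written out for one Z table, the read looks as follows; lv_applctn and lv_context stand in for the concrete test values, which are not listed here.

DATA: lt_triggers TYPE STANDARD TABLE OF zppfttrigg1,
      lv_applctn  TYPE zppfttrigg1-applctn,
      lv_context  TYPE zppfttrigg1-context.

SELECT * FROM zppfttrigg1
  INTO TABLE lt_triggers
  WHERE applctn = lv_applctn
    AND context = lv_context.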

Read via Index fields, multiple and fixed number of records returned

Similar to the test above, but with UP TO 6000 ROWS added so that the SQL on each Z table always returns a fixed number of records.
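Reusing the declarations from the previous sketch, the only change is the UP TO addition:

SELECT * FROM zppfttrigg1
  INTO TABLE lt_triggers
  UP TO 6000 ROWS
  WHERE applctn = lv_applctn
    AND context = lv_context.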

Read via Non-index fields, multiple and fixed number of records returned

I perform the read to retrieve, for each table, all records that have the flag is_changed set to abap_true. Before testing, I manually changed the table entries of each Z table to ensure that all ten tables contain exactly the same number of records with is_changed set to abap_true.
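The read itself is a full selection on the flag (abap_true is the standard character constant 'X'):

DATA lt_changed TYPE STANDARD TABLE OF zppfttrigg1.

SELECT * FROM zppfttrigg1
  INTO TABLE lt_changed
  WHERE is_changed = abap_true.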

Table entry mass update

The update operation is done on the non-index field is_changed. First I use SELECT ... UP TO XX ROWS to retrieve a given record set with is_changed = abap_true, and then use UPDATE <Z table> FROM TABLE <record set> to update the table. The execution time of the statement UPDATE <Z table> FROM TABLE <record set> is measured; a minimal sketch of the measurement is shown below.
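In this sketch, the new flag value and the millisecond conversion are my assumptions; GET RUN TIME works in microseconds, and only the UPDATE statement itself is timed.

DATA: lt_update TYPE STANDARD TABLE OF zppfttrigg1,
      lv_t1     TYPE i,
      lv_t2     TYPE i.

FIELD-SYMBOLS <fs_rec> TYPE zppfttrigg1.

" Retrieve the record set to be updated (10,000 entries in this test run).
SELECT * FROM zppfttrigg1
  INTO TABLE lt_update
  UP TO 10000 ROWS
  WHERE is_changed = abap_true.

" Change the records; resetting the flag is an assumed example change.
LOOP AT lt_update ASSIGNING <fs_rec>.
  <fs_rec>-is_changed = abap_false.
ENDLOOP.

" Only the UPDATE statement itself is measured.
GET RUN TIME FIELD lv_t1.
UPDATE zppfttrigg1 FROM TABLE lt_update.
GET RUN TIME FIELD lv_t2.
COMMIT WORK.

WRITE: / |Update time: { ( lv_t2 - lv_t1 ) / 1000 } ms|.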

When doing mass change on the ten tables one by one in the same session, <record set> contains 10,000 entries to be updated:

Before testing, my assumption was that the time consumed by the update would increase linearly with the number of table records.

However, according to this test result, the number of table records does not degrade update performance, at least not in HANA DB.

Summary

Based on these test results, it seems that in HANA DB there is no significant performance improvement for read operations that return a single unique record, even if the number of table records is reduced dramatically (from 100 million to 10 million). The same holds true for update operations via UPDATE <DB table> FROM TABLE <record set>.

For read operations that return multiple records, whether index or non-index fields are used and whether a fixed or varying number of records is returned, the time consumed always increases almost linearly with the number of table entries. In this case the customer could still benefit from table archiving.
