Data Load Performance

Former Member

Hi

We're having trouble getting large volumes of data into some of our cubes.

One of our cubes (Listing) takes around 01h45 to load about 20 million records.

Another cube (Vendor) takes around 00h30 to load 36 million records.

In both scenarios, most of the time is taken up by SID generation. Is there a way to speed this up?

In the Listing scenario, the load to the PSA has been split into about 15 InfoPackages. Running all of those InfoPackages (in 3 parallel streams) takes about 15 minutes in total.

In the Vendor scenario, a single InfoPackage is used to load to the PSA, and it takes around 2 hours.

The following picture shows one of the DTP requests for the Listing scenario:

In this example, data package 8 took 3:16, of which 2:29 was spent in the SID step.

I'm not sure why this takes so long, or whether there's anything we can do to speed it up.
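To put the data package 8 timings in perspective, here is a quick calculation (Python used purely as a calculator; the only inputs are the two times quoted above). It shows the SID step accounts for roughly three quarters of that package's runtime.

```python
# Timings for data package 8 of the Listing DTP, as quoted in the post (m:ss).
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

total = to_seconds("3:16")  # whole data package
sid = to_seconds("2:29")    # SID generation step only

sid_share = sid / total
print(f"SID step: {sid}s of {total}s = {sid_share:.0%}")
```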

A similar screenshot of the Vendor DTP:

The performance looks similar to the Listing DTP. However, when I look at the later requests in the Vendor DTP, they suddenly get a lot quicker towards the end:

I guess my questions are:

1. How do I speed up the SID generation?

2. Why does SID generation get quicker as more and more records are processed, and why would that not apply in the Listing scenario, where there are multiple PSA requests to load?

3. Why would an InfoPackage take so much longer when running as a single InfoPackage (36 million records / 1.5 hours) than as smaller packages (20 million records / 15 minutes in parallel)? I can sort of understand that multiple smaller packages are better, but that then seems to harm the DTP performance.
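For question 3, a rough throughput comparison of the two PSA loads, using only the figures stated in that question (note that the Vendor load is described elsewhere in the post as taking about 2 hours rather than 1.5, which would make the gap even larger). This is simple arithmetic, not a measurement.

```python
# PSA load figures from question 3: (records, minutes of wall-clock time).
loads = {
    "Listing": (20_000_000, 15),  # ~15 InfoPackages in 3 parallel streams
    "Vendor": (36_000_000, 90),   # single InfoPackage, 1.5 hours
}

# Records loaded per minute of wall-clock time.
throughput = {name: records / minutes for name, (records, minutes) in loads.items()}

for name, rate in throughput.items():
    print(f"{name}: {rate:,.0f} records/minute")

ratio = throughput["Listing"] / throughput["Vendor"]
print(f"The Listing PSA load moves about {ratio:.1f}x more records per minute")
```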

Cheers,

Andrew

Accepted Solutions (0)

Answers (8)

abdullahqureshi
Contributor

Hi Andrew,

1. In the DTP settings for batch processing, increase the number of parallel processes and change the job priority to 'A'.
For the number of parallel processes, see this thread: http://scn.sap.com/thread/1654725

2. Load the data in parts using filter restrictions in the DTP.

3. Delete the cube indexes before loading and recreate them once the data load has completed.
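As a rough, best-case illustration of why more parallel processes help, the sketch below (Python, purely illustrative) treats data packages as being processed in "waves" of N at a time. The package count of 100 and the function name are hypothetical; 196 s is the package time quoted in the question. Real loads will take longer than this estimate, because parallel processes compete for number ranges, table locks and index maintenance.

```python
import math

def estimated_runtime_minutes(n_packages: int, seconds_per_package: float,
                              parallel_processes: int) -> float:
    """Best-case wall-clock estimate: packages run in waves of
    parallel_processes at a time. Ignores scheduling overhead, lock waits
    and uneven package sizes, so real runtimes will be longer."""
    waves = math.ceil(n_packages / parallel_processes)
    return waves * seconds_per_package / 60

# Hypothetical: 100 data packages of ~196 s each.
for processes in (3, 6, 9):
    minutes = estimated_runtime_minutes(100, 196, processes)
    print(f"{processes} processes: ~{minutes:.0f} min")
```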

Regards,
Abdullah

Former Member

Hi Andrew,

In addition to the above posts, change the settings in the DSO and split the loads using selective loading.

Thanks,

Purushotham.

KamalMehta
Product and Topic Expert

Firstly, to speed up SID generation you need to use number range buffering.

Secondly, to identify the reasons behind the extraction time, you should take into consideration the background processing settings (RSBATCH) and the DTP settings for parallel processing, data package size, etc.

Thanks

Kamal

Former Member

Hi Andrew,

From what I can see, you have a huge amount of data in your dimensions. So why not try setting the "Line Item Dimension" flag, and maybe "High Cardinality" too, on the dimensions that hold a large amount of data?

Try it and please keep me informed.
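To picture what the line item flag saves: in BW's extended star schema, a normal dimension table maps each new combination of characteristic SIDs to a DIM ID, which must be drawn from a number range and inserted. A line item dimension holds a single characteristic, and its SID is stored directly in the fact table, so no DIM ID is needed. The toy model below (plain Python, not SAP code; all names and the record data are hypothetical) counts how many DIM IDs each variant has to create for document-level data where almost every record has its own SID.

```python
from itertools import count

class Dimension:
    """Toy dimension table: maps a tuple of characteristic SIDs to a DIM ID."""
    def __init__(self):
        self.table = {}           # (sid, ...) -> dim_id
        self._next_id = count(1)  # stands in for the DIM ID number range
        self.new_ids = 0          # how many DIM IDs had to be created

    def dim_id(self, sids):
        if sids not in self.table:
            self.table[sids] = next(self._next_id)
            self.new_ids += 1
        return self.table[sids]

def load(records, line_item):
    """Build fact rows; return them with the number of DIM IDs created."""
    dim = Dimension()
    facts = []
    for sid, amount in records:
        # Line item dimension: the SID goes straight into the fact row.
        key = sid if line_item else dim.dim_id((sid,))
        facts.append((key, amount))
    return facts, dim.new_ids

# Hypothetical document-level data: every record has a distinct SID.
records = [(sid, 10.0) for sid in range(1, 100_001)]
_, drawn_normal = load(records, line_item=False)
_, drawn_line_item = load(records, line_item=True)
print(drawn_normal, drawn_line_item)  # 100000 0
```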

Regards

Salah

siddharth_wagh
Discoverer

Hi Andrew,

During a transactional data load, each record goes to the database to pick up a new DIM ID.

With a huge amount of data, load performance decreases, because every record goes to the database table to get a new DIM ID number.

To rectify this, we need to use buffered numbers rather than hitting the database every time. Follow the steps below:

1) Go to SE37 and enter the function module RSD_CUBE_GET to find the object name of a dimension that is likely to have a high number of data sets.

2) Press F8 and enter the following in the function module parameters:

      • I_INFOCUBE = 'Info Cube Name'

      • I_OBJVERS = 'A'

      • I_BYPASS_BUFFER = 'X'

      • Execute

3) In the result screen, the dimensions are contained in table 'E_T_DIME'. Double-click on it to see them.

4) In column “NOBJECT” you will find the relevant number range objects (for example BID0002145).

5) Use transaction SNRO to display the number range for a dimension used in BI: enter the number range object and click the 'Change' button. This takes you to the Number Range Object Maintenance screen.

6) Now choose Edit -> Set-up buffering -> Main memory, and define the 'No. of numbers in buffer' on the Number Range Object Change screen. Set this value to 500, for example. The size depends on the expected data volume in the initial and future (delta) uploads.

7) Test your data load. If it has not improved, try increasing the buffer size and re-test.
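The idea behind main memory buffering can be pictured with the toy model below (plain Python, not SAP code; the class and attribute names are hypothetical). Each refill of the buffer stands in for one round trip to the database table holding the current number level, so handing out n numbers costs roughly ceil(n / buffer size) round trips instead of n. It assumes a single application server and no numbers lost from the buffer; in a real system, unused buffered numbers can be lost, leaving gaps.

```python
class NumberRange:
    """Toy number range object with an in-memory buffer of numbers."""
    def __init__(self, buffer_size: int = 1):
        self.buffer_size = buffer_size
        self.db_level = 0        # highest number reserved in the 'database'
        self.db_round_trips = 0  # how often the database had to be hit
        self._next = 1           # next number to hand out from the buffer
        self._last = 0           # last number available in the buffer

    def next_number(self) -> int:
        if self._next > self._last:  # buffer exhausted: refill it
            self._fetch_block()
        number = self._next
        self._next += 1
        return number

    def _fetch_block(self) -> None:
        """Reserve buffer_size numbers with a single database round trip."""
        self.db_round_trips += 1
        self._next = self.db_level + 1
        self.db_level += self.buffer_size
        self._last = self.db_level

# Hand out 100,000 new DIM IDs with different buffer sizes.
for size in (1, 500, 10_000):
    nr = NumberRange(buffer_size=size)
    for _ in range(100_000):
        nr.next_number()
    print(f"buffer {size:>6,}: {nr.db_round_trips:,} database round trips")
```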

Siddharth

Former Member

Hi,

3 tips:

1. Increase the DTP setting for the number of batch processes; you can increase it up to 9.

2. Check whether you can reduce the amount of data to be loaded by putting relevant filters in the DTP. Sometimes we load more than what the business actually requires.

3. Data package: reduce the data package size, and also try splitting your DTP, for example by fiscal year or fiscal period. When a lot of ABAP code is involved in the start and end routines, you can experiment with the data package size to find one that suits your scenario. When a lot of ABAP code is involved, I have seen data package sizes of 1,000 or less.
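To see what the package size tip means in practice, the short calculation below (Python, illustrative only) shows how many data packages the 20 million Listing records turn into at a few candidate sizes. It assumes every package is filled to the configured size; semantic grouping can make real packages uneven. Since the start and end routines run once per data package, smaller packages mean more, but lighter, routine calls.

```python
import math

RECORDS = 20_000_000  # Listing volume from the question

def package_count(records: int, package_size: int) -> int:
    """Number of DTP data packages, assuming each is filled to package_size."""
    return math.ceil(records / package_size)

for size in (50_000, 10_000, 1_000):
    n = package_count(RECORDS, size)
    print(f"package size {size:>6,}: {n:>6,} packages")
```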

Former Member

Try increasing the number of parallel processes in RSBATCH.

Former Member

Hi Andrew,

In addition to Suman's proposals, don't forget to delete your InfoCube indexes before loading data into it.

Also, if you are using a lot of routines, lookups and formulas in your transformation, ask your ABAP consultants to optimize the code.

Amine

former_member182470
Active Contributor

1. How do I speed up the SID generation?

For the reasons why SID generation can take a long time, see this thread: http://scn.sap.com/thread/3373545

2. Why does SID generation get quicker as more and more records are processed, and why would that not apply in the Listing scenario, where there are multiple PSA requests to load?

A PSA with 15 requests is processed by splitting the data again into several data packages, based on your parallel background (BGD) process settings, package size and semantic keys in the DTP definition. Consider the difference between picking data from multiple PSA requests and re-splitting it into packages according to these settings, versus picking data from a single PSA request (with a large volume of records) and processing it in the DTP. The former will definitely take more time.
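One way to picture the re-splitting step: when semantic keys are set in the DTP, records with the same key values are kept together in the same data package, so records from different PSA requests have to be brought together by key before packages can be cut. The sketch below (plain Python, not SAP's actual implementation; the record layout, function name and sample data are hypothetical) sorts the combined records by key and fills packages without splitting a key group.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

Record = Tuple[str, int]  # (semantic key, value), hypothetical layout

def build_packages(records: Iterable[Record], package_size: int) -> List[List[Record]]:
    """Cut records into packages of about package_size, never splitting
    records that share a semantic key across two packages."""
    packages: List[List[Record]] = []
    current: List[Record] = []
    ordered = sorted(records, key=lambda r: r[0])  # bring equal keys together
    for _, group in groupby(ordered, key=lambda r: r[0]):
        members = list(group)
        if current and len(current) + len(members) > package_size:
            packages.append(current)
            current = []
        current.extend(members)  # one key group may exceed package_size alone
    if current:
        packages.append(current)
    return packages

# Two hypothetical PSA requests that share some semantic keys.
request_1 = [("V1", 1), ("V2", 2), ("V3", 3)]
request_2 = [("V1", 4), ("V3", 5)]
for package in build_packages(request_1 + request_2, package_size=2):
    print(package)
```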

3. Why would an InfoPackage take so much longer when running as a single InfoPackage (36 million records / 1.5 hours) than as smaller packages (20 million records / 15 minutes in parallel)?

Splitting into multiple InfoPackages will improve your extraction time, but the data may take longer to process internally via the DTP. If your InfoCube or DSO design is sound, the DTP processing time can be optimized too.

Hope I have answered your queries.

Regards,

Suman