on 2013 Jun 25 8:56 AM
Hi
We're having trouble with getting large volumes of data into some of our cubes.
1 of our cubes (Listing) takes around 01h45 to load about 20 million records.
Another cube (Vendor) takes around 00h30 to load 36 million records.
Most of the time, in both scenarios, is taken up with SID generation ... is there a way to speed this up?
In the Listing scenario, the load to PSA has been split into about 15 infoPackages. The total time taken to run all of those infopackages (in 3 parallel streams) is about 15 minutes.
In the Vendor scenario, a single infopackage is used to load to the PSA, and takes around 2 hours to load into the PSA
The following picture shows one of the DTP requests for the Listing scenario:
For this example 3:16 for datapackage 8, of which 2:29 were taken up for the SID step.
Not sure why this takes so long, and if there's anything to do to speed it up.
A similar screenshot of the Vendor DTP:
That looks like similar performance to the listing DTP. However, when I look at the later requests in the Vendor DTP, they suddenly go a whole lot quicker towards the end:
I guess my questions are:
1. How do I speed up the SID generation
2. Why does SID generation get quicker when more & more records are processed, and why would that not apply in the Listing scenario when there are multiple PSA requests to load
3. Why would an infopackage take so much longer (36 million / 1.5 hours) when running as single infopackage compared to smaller packages (15 minutes in parallel / 20 million records) --> This one I can sort of understand, that the multiple smaller packets are better, but it then seems to harm the DTP performance.
Cheers,
Andrew
Request clarification before answering.
Hi Andrew,
1. In DTP setting for batch manager increase your no of parallel process and change its pririoty to 'A'.
About no. of parallel process check this thread: http://scn.sap.com/thread/1654725
2. Load the data in parts by filter restrictions of DTP.
3. Delete index from Cube before loading and create index when data load completed.
Regards,
Abdullah
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Andrew,
In addition to the above posts change the settings in DSO and split the loads using selective loading.
Thanks,
Purushotham.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Firstly ,In order to speed up the SID Generation you need to go for Number Range Buffering .
Secondly you should take in to consideration the various settings related to Background(RSBATCH) , DTP settings in terms of Parallel Processing ,Data Packet Size etc. for identifying the various reasons related to extraction time.
Thanks
Kamal
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Andrew,
what I can recognise is that you have a huge amount of data under your dimensions. So why not to try using dimensions (for which we have a big amount of data) with flag "Line Item dim" and maybe "High cardinality" too??
Try it and let me informed please.
Regards
Salah
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Andrew,
During the transactional data load, each record goes to database and pick new DIM-ID.
Since we have huge amount of data, the performance of the loading will decrease. Because all
the records will go to database table and gets new DIM ID numbers every time.
So in order to rectify this problem, we need to use ‘Buffered Numbers’ rather than the hitting the database every time. Follow below steps:
1) Go to SE37 & Put the Function Module RSD_CUBE_GET to find the object name of a dimension
that is likely to have a high number of data sets.
2) Press F8 and enter the following in function module settings:
• I_INFOCUBE = 'Info Cube Name'
• I_OBJVERS = 'A'
• I_BYPASS_BUFFER = 'X'
• And Execute
3) In the result screen:
The number of dimensions are contained in table 'E_T_DIME'. Double click on it to see the
dimensions.
4) Go to Column “NOBJECT”, you will get all the relevant number ranges (for example BID0002145)
5) Use Transaction SNRO to display the number range for a dimension used in BI.
Go to SNRO t-code -> enter the number range object -> click on the „Change‟ Button.
Then you will get the Number Range Object Maintenance Screen.
6) Now Choose Edit -> Set-up buffering -> Main memory
Define the 'No. of numbers in buffer' in Number Range Object Change Screen .Set this value to
500, for example. The size depends on the expected data quantity in the initial and in future
(delta) uploads.
7) Test your data load. If not improved, try to increse the size of range buffer and re-test.
Siddharth
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi,
3 tips:
1. Increase DTP Setting for batch processing, you can increase it till 9.
2. Check if you can reduce amount of data to be loaded by putting relevant filters in DTP. Sometimes we load more then what is actually required by business.
3. Data Package: Reduce data package and also try splitting your DTP by say for example fiscal year, fiscal period.....when lot of ABAP Code is involved in start routine and end routine, you can play around with data package size and try to find out one suitable to your scenerio. When lot of ABAP Code is involved I have seen data package size 1000 or lesser then it.
Try to increase parallel process in RSBATCH.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Andrew,
Additionnaly to Suman proposals, don't gorget to delete your Infocube indexes before loading data into it.
Also if you are using a lot of routines, lookups and formulas in your transformation. Ask abap consultants to optimize your codes.
Amine
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
1. How do I speed up the SID generation
Check the reason for long taking SID generation in this link http://scn.sap.com/thread/3373545
2. Why does SID generation get quicker when more & more records are processed, and why would that not apply in the Listing scenario when there are multiple PSA requests to load
A PSA which has 15 requests will again process by splitting into several data packages by based on your parallel BGD settings , Package size and Semantic Keys in the DTP definition. Try to understand the difference in picking data from multiple PSA requests and again splitting into multiple packages by based on above mentioned settings. This process will definitely take time than picking data from a single PSA request( big volume of records) and process them in DTP.
3. Why would an infopackage take so much longer (36 million / 1.5 hours) when running as single infopackage compared to smaller packages (15 minutes in parallel / 20 million records) -
Splitting into multiple infopackages will improve your extraction time but may take some time to process internally via DTP. If your Infocube or DSO design is proper, then we can optimize DTP processing time too.
Hope I have answered your queries.
Regards,
Suman
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
| User | Count |
|---|---|
| 8 | |
| 6 | |
| 4 | |
| 3 | |
| 3 | |
| 3 | |
| 3 | |
| 2 | |
| 2 | |
| 2 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.