on 2025 Feb 26 4:25 PM
Hello All,
I want to extract approximately 20 billion records from an SAP table and land the files in GCS using SAP BODS. I have been exploring two approaches: one where I use normal dataflows that can be executed in parallel, and one where I use a single ABAP dataflow. Both currently work only for a small subset of the data.
1) With Parallel Dataflow:
- I need to manually define filters for the 10 dataflows I am using inside my workflow. This fails if any of the ranges receives more data than expected, so it is error prone. The manual effort of defining filters also makes the task tedious and open to mistakes.
2) With ABAP DF:
- I keep hitting an error because disk space at the given EAI location gets exhausted after a couple of hours. This again forces me to manually add filters so that only a specific subset of data is processed.
I observed that the ABAP DF (since it runs in the background) can definitely process more records than a normal dataflow with parallel execution (which errors out due to the RFC connection limit set by Basis), but both solutions require manual intervention.
How can I optimize this and make it less error prone?
1) How can I remove the manual value assignments on my global variables for a column whose values are not spread out evenly or proportionately, so that each filter ends up covering a different amount of data?
2) Is there a way to optimize the solution so that the job needs less manual intervention?
3) Can we create a job that combines both approaches, for example by creating multiple ABAP DFs, each paired with a normal DF, and replicating that pattern across the job?
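On question 1, one way to avoid hand-picking filter values for a skewed key column is to derive the range boundaries from the data itself: sort a sample of the key values and cut at quantiles, so each filter covers roughly the same row count even when the values cluster unevenly. Below is a minimal Python sketch of that idea (this runs outside BODS; `key_col` and the sample data are hypothetical stand-ins for a real key column):

```python
def range_filters(sorted_keys, n):
    """Cut a sorted list of key values into up to n WHERE-clause filters
    holding roughly equal row counts, even if the values are skewed.
    Heavy duplicates can collapse adjacent cut points, so fewer than n
    filters may come back."""
    total = len(sorted_keys)
    # Quantile cut points: the value at every total/n-th position.
    cuts = [sorted_keys[min(i * total // n, total - 1)] for i in range(n + 1)]
    # Drop consecutive duplicate cut points caused by repeated key values.
    uniq = [cuts[0]]
    for c in cuts[1:]:
        if c != uniq[-1]:
            uniq.append(c)
    filters = []
    for i, (lo, hi) in enumerate(zip(uniq, uniq[1:])):
        # Half-open ranges so no row lands in two filters; the last
        # range is closed to include the maximum key.
        op = "<=" if i == len(uniq) - 2 else "<"
        filters.append(f"key_col >= {lo} AND key_col {op} {hi}")
    return filters

# Skewed example: most keys cluster at 1 and 2.
keys = sorted([1] * 50 + [2] * 30 + list(range(3, 23)))
for f in range_filters(keys, 4):
    print(f)
```

In a BODS job the same boundaries could be computed in a script step (e.g. via `sql()` against the source) and written into the global variables that the parallel dataflows already use as filters, removing the manual assignment.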
Thanks,
Avinash
To avoid running out of disk space, you should try to design the job in such a way that you don't have to save the data locally on the Job Server machine.
Second, to avoid running out of RAM, you should also design the job in such a way that you don't have to load much of the data into memory.
Another idea I would recommend looking into is partitioning the source and target to allow parallel reads and loads, which increases performance and can help avoid the timeout issues.
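The partitioned-read idea amounts to giving each worker its own non-overlapping key range so every row is read exactly once and the reads can run concurrently. A minimal Python sketch of that pattern, with a hypothetical `fetch_range` standing in for whatever actually reads a slice of the table:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_range(lo, hi):
    """Hypothetical reader: in BODS each range would become a dataflow
    filter instead; here we just return the size of the slice [lo, hi)."""
    return hi - lo

# Non-overlapping key ranges, one per parallel worker/dataflow.
ranges = [(0, 250), (250, 500), (500, 750), (750, 1000)]

with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(lambda r: fetch_range(*r), ranges))

# Because the ranges partition the keyspace, the slices sum to the
# full table with no double-counting.
print(sum(counts))
```

The same property is what makes the parallel-dataflow approach safe: as long as the filters partition the key column, no dataflow duplicates or skips rows.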
Avoid doing a lot of transformations in your job.
Try increasing the RFC timeout to avoid RFC timeouts.
A good place to start is the Performance Optimization Guide.