Application Development and Automation Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.

Process files with huge amounts of data using parallel processing

Former Member

Hi,

My requirement is to process a file with a huge amount of data and update a database table (standard or Z-table). There are more than 5 million records in each file.

I think parallel processing is one of the best approaches, but my question is how to handle 5 million records at the internal table level. I want to split the data at the internal table/file level into chunks of 50-75k records each and process those chunks in parallel.

Please suggest the best approach.

1 ACCEPTED SOLUTION

Former Member

Keep your large files on the application server.

Provide fields on the load program's selection screen to specify the line number range. Let us say the fields are p_start and p_end.

In the program:


"p_file, p_start and p_end are selection-screen parameters; wa/itab match the file's row structure
num_lines = p_end - p_start + 1.        "inclusive range
OPEN DATASET p_file FOR INPUT IN TEXT MODE ENCODING DEFAULT.
DO p_start - 1 TIMES.
  READ DATASET p_file INTO wa.          "just skip these rows; do not append to the itab
ENDDO.
DO num_lines TIMES.
  READ DATASET p_file INTO wa.
  IF sy-subrc <> 0. EXIT. ENDIF.        "stop at end of file
  APPEND wa TO itab.
ENDDO.
CLOSE DATASET p_file.
INSERT ztable FROM TABLE itab.

Then run your load program in parallel by specifying the following values for p_start and p_end (both inclusive).

1 - 50000

50001 - 100000

100001 - 150000

and so on.
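
For instance, a small driver program could submit one background job per range. Here is a minimal sketch, assuming the load program is named ZLOAD_FILE and there are 10 slices of 50,000 lines (the program name and slice count are placeholders):

DATA: lv_jobname  TYPE btcjob,
      lv_jobcount TYPE btcjobcnt,
      lv_start    TYPE i VALUE 1,
      lv_end      TYPE i VALUE 50000.

DO 10 TIMES.                            "one background job per 50,000-line slice
  lv_jobname = |ZLOAD_{ sy-index }|.
  CALL FUNCTION 'JOB_OPEN'
    EXPORTING
      jobname  = lv_jobname
    IMPORTING
      jobcount = lv_jobcount.
  SUBMIT zload_file
    WITH p_start = lv_start
    WITH p_end   = lv_end
    VIA JOB lv_jobname NUMBER lv_jobcount
    AND RETURN.
  CALL FUNCTION 'JOB_CLOSE'
    EXPORTING
      jobcount  = lv_jobcount
      jobname   = lv_jobname
      strtimmed = 'X'.                  "start the job immediately
  lv_start = lv_end + 1.
  lv_end   = lv_end + 50000.
ENDDO.

Each job then reads only its own slice of the file and can run independently of the others.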

5 REPLIES

Former Member

It doesn't matter if you have 100 processes running in parallel; only one of them can hold a lock on the same DB table at a time, so the other processes will always be waiting.

I would create a main program which spawns 10 or 20 (or 50 or 100) sub-programs to process the data.

For instance, in your main program, upload all the records into an ITAB, divide up the records, and send them off to be processed.

We have a time-splitter program that will generate up to 50 jobs, BUT only 10 are processing at any one time; we've found that any more concurrent jobs cause a bottleneck in the system.
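
One standard way to send the chunks off for processing is an asynchronous RFC call per chunk. Here is a minimal sketch, assuming a remote-enabled function module ZPROCESS_CHUNK (a hypothetical name) that processes one line range of the file:

DATA: lv_task  TYPE c LENGTH 8,
      lv_start TYPE i VALUE 1,
      lv_end   TYPE i,
      gv_done  TYPE i.

DO 15 TIMES.                          "one task per 10,000-record chunk
  lv_task = sy-index.
  lv_end  = lv_start + 9999.
  CALL FUNCTION 'ZPROCESS_CHUNK'      "hypothetical RFC-enabled module
    STARTING NEW TASK lv_task
    DESTINATION IN GROUP DEFAULT
    PERFORMING chunk_done ON END OF TASK
    EXPORTING
      iv_start = lv_start
      iv_end   = lv_end.
  lv_start = lv_start + 10000.
ENDDO.

WAIT UNTIL gv_done >= 15.             "keep the main program alive until all tasks return

FORM chunk_done USING p_task TYPE clike.
  RECEIVE RESULTS FROM FUNCTION 'ZPROCESS_CHUNK'.
  gv_done = gv_done + 1.
ENDFORM.

In a real program you would also handle the RESOURCE_FAILURE exception and retry, which is one way to enforce the "only 10 at a time" throttling described above.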


Hi Robert,

Thank you very much for the quick response.

"For instance, in your main program, upload all the records into an ITAB, divide up the records, and send them off to be processed."

How do I divide up the records in the internal table?

Do you have any sample code related to this requirement?

It would be helpful if you could send me the related documents, including the time-splitter program.

You can find my ID in my business card, as I can't provide it here.

Regards,

Srini

"How do I divide up the records in the internal table?"

You would find out how many lines are in your itab by using the DESCRIBE TABLE statement.

Let's say there are 150,000 records and you want to split them up into 10,000-record batches.

You can:


DESCRIBE TABLE itab LINES num_recs.   "150,000 in this example

rec1 = 1.
recN = 10000.
DO 15 TIMES.                          "num_recs / 10,000 batches
  CLEAR itab2.
  APPEND LINES OF itab FROM rec1 TO recN TO itab2.  "copy the next 10,000-record slice
  "create & submit your spawn job for itab2 (see the sketch below)
  rec1 = recN + 1.
  recN = recN + 10000.
ENDDO.
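
For the "create & submit" step, one common way to hand the slice to a spawned job (a sketch of a general pattern, not the actual splitter code; the INDX area and parameter names are hypothetical) is the INDX cluster table:

DATA lv_key TYPE indx-srtfd.

"inside the DO loop above: store the slice where the spawned job can read it
lv_key = |CHUNK_{ sy-index }|.
EXPORT itab2 = itab2 TO DATABASE indx(zc) ID lv_key.
"then submit the job as usual (JOB_OPEN / SUBMIT ... VIA JOB / JOB_CLOSE),
"passing lv_key to it as a selection-screen parameter p_key

"inside the spawned job:
IMPORT itab2 = itab2 FROM DATABASE indx(zc) ID p_key.
DELETE FROM DATABASE indx(zc) ID p_key.   "clean up after processing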

I am not at liberty to give up our code but I am happy to share ideas.



Thanks Robert and Sudhir....