Application Development and Automation Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.

Process files with huge amounts of data using parallel processing

Former Member

Hi,

My requirement is to process a file with a huge amount of data and update a database table (standard or Z-table). There are more than 5 million records in each file.

I think parallel processing is one of the best approaches, but my question is how to handle 5 million records at the internal table level. I want to split the data at the internal table/file level into chunks of 50-75k records each and process those chunks in parallel.

Please suggest the best approach.

1 ACCEPTED SOLUTION

Former Member

Keep your large files on the application server.

Provide fields on the load program's selection screen to specify the line number range. Let us say the fields are p_start and p_end.

In the program:


"p_file, p_start and p_end are selection-screen parameters; wa/itab match the file's row structure
num_lines = p_end - p_start + 1.        "inclusive range
OPEN DATASET p_file FOR INPUT IN TEXT MODE ENCODING DEFAULT.
DO p_start - 1 TIMES.
  READ DATASET p_file INTO wa.          "just skip these rows; do not append to the itab
ENDDO.
DO num_lines TIMES.
  READ DATASET p_file INTO wa.
  IF sy-subrc <> 0. EXIT. ENDIF.        "stop at end of file
  APPEND wa TO itab.
ENDDO.
CLOSE DATASET p_file.
INSERT ztable FROM TABLE itab.

Then run your load program in parallel by specifying the following values for p_start and p_end (both inclusive).

1 - 50000

50001 - 100000

100001 - 150000

and so on.
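
For instance, a small driver program could submit one background job per range. Here is a minimal sketch, assuming the load program is named ZLOAD_FILE and there are 10 slices of 50,000 lines (the program name and slice count are placeholders):

DATA: lv_jobname  TYPE btcjob,
      lv_jobcount TYPE btcjobcnt,
      lv_start    TYPE i VALUE 1,
      lv_end      TYPE i VALUE 50000.

DO 10 TIMES.                            "one background job per 50,000-line slice
  lv_jobname = |ZLOAD_{ sy-index }|.
  CALL FUNCTION 'JOB_OPEN'
    EXPORTING
      jobname  = lv_jobname
    IMPORTING
      jobcount = lv_jobcount.
  SUBMIT zload_file
    WITH p_start = lv_start
    WITH p_end   = lv_end
    VIA JOB lv_jobname NUMBER lv_jobcount
    AND RETURN.
  CALL FUNCTION 'JOB_CLOSE'
    EXPORTING
      jobcount  = lv_jobcount
      jobname   = lv_jobname
      strtimmed = 'X'.                  "start the job immediately
  lv_start = lv_end + 1.
  lv_end   = lv_end + 50000.
ENDDO.

Each job then reads only its own slice of the file and can run independently of the others.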

5 REPLIES

Former Member

It doesn't matter if you have 100 processes running in parallel; only one of them can hold a lock on the same DB table at a time, so the other processes will always be waiting.

I would create a main program which spawns 10 or 20 (or 50 or 100) sub-programs to process the data.

For instance, in your main program, upload all the records into an ITAB, divide up the records, and send them off to be processed.

We have a time-splitter program that will generate up to 50 jobs, BUT only 10 are processing at any one time; we've found that any more concurrent jobs cause a bottleneck in the system.
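
One standard way to send the chunks off for processing is an asynchronous RFC call per chunk. Here is a minimal sketch, assuming a remote-enabled function module ZPROCESS_CHUNK (a hypothetical name) that processes one line range of the file:

DATA: lv_task  TYPE c LENGTH 8,
      lv_start TYPE i VALUE 1,
      lv_end   TYPE i,
      gv_done  TYPE i.

DO 15 TIMES.                          "one task per 10,000-record chunk
  lv_task = sy-index.
  lv_end  = lv_start + 9999.
  CALL FUNCTION 'ZPROCESS_CHUNK'      "hypothetical RFC-enabled module
    STARTING NEW TASK lv_task
    DESTINATION IN GROUP DEFAULT
    PERFORMING chunk_done ON END OF TASK
    EXPORTING
      iv_start = lv_start
      iv_end   = lv_end.
  lv_start = lv_start + 10000.
ENDDO.

WAIT UNTIL gv_done >= 15.             "keep the main program alive until all tasks return

FORM chunk_done USING p_task TYPE clike.
  RECEIVE RESULTS FROM FUNCTION 'ZPROCESS_CHUNK'.
  gv_done = gv_done + 1.
ENDFORM.

In a real program you would also handle the RESOURCE_FAILURE exception and retry, which is one way to enforce the "only 10 at a time" throttling described above.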


Hi Robert,

Thank you very much for the quick response.

"For instance, in your main program, upload all the records into an ITAB, divide up the records, and send them off to be processed."

How do I divide up the records in the internal table?

Do you have any sample code related to this requirement?

It would be helpful if you could send me the related documents, including the time-splitter program.

You can find my ID in my business card, as I can't provide it here.

Regards,

Srini

"How do I divide up the records in the internal table?"

You would find out how many lines are in your itab by using the DESCRIBE TABLE statement.

Let's say there are 150,000 records and you want to split them up into 10,000-record batches.

You can:


DESCRIBE TABLE itab LINES num_recs.   "150,000 in this example

rec1 = 1.
recN = 10000.
DO 15 TIMES.                          "num_recs / 10,000 batches
  CLEAR itab2.
  APPEND LINES OF itab FROM rec1 TO recN TO itab2.  "copy the next 10,000-record slice
  "create & submit your spawn job for itab2 (see the sketch below)
  rec1 = recN + 1.
  recN = recN + 10000.
ENDDO.
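
For the "create & submit" step, one common way to hand the slice to a spawned job (a sketch of a general pattern, not the actual splitter code; the INDX area and parameter names are hypothetical) is the INDX cluster table:

DATA lv_key TYPE indx-srtfd.

"inside the DO loop above: store the slice where the spawned job can read it
lv_key = |CHUNK_{ sy-index }|.
EXPORT itab2 = itab2 TO DATABASE indx(zc) ID lv_key.
"then submit the job as usual (JOB_OPEN / SUBMIT ... VIA JOB / JOB_CLOSE),
"passing lv_key to it as a selection-screen parameter p_key

"inside the spawned job:
IMPORT itab2 = itab2 FROM DATABASE indx(zc) ID p_key.
DELETE FROM DATABASE indx(zc) ID p_key.   "clean up after processing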

I am not at liberty to give up our code but I am happy to share ideas.



Thanks Robert and Sudhir....