2024 Mar 26 11:51 AM
Hello SAP Experts,
I have to compare around two million address records. At the moment I read the data into the local table it_adrc and prepare the comparison in a method beforehand. I wrote a method that separates the data into batches and then calls a different method (zif_data_analyzer~process_batch) to do the comparison. But now I have run into a problem: not all of the data is compared, and only part of it appears in the results table. This is because only 10,000 records are considered per batch. Does anyone have an idea how to solve this? Below is the data-processing part of my current code. I would be grateful for any recommendation on solving this problem.
" Copy the input table
lt_adrc_outer_loop[] = it_adrc[].
lt_adrc_inner_loop[] = it_adrc[].
" Sort by city
SORT lt_adrc_outer_loop BY city1.
SORT lt_adrc_inner_loop BY city1.
DATA: lt_batch_addresses   TYPE TABLE OF adress_struc,
      lt_batch_result      TYPE TABLE OF result_struc,
      lv_offset            TYPE sy-tabix,
      lv_batch_size        TYPE i,
      lv_total_records     TYPE i,
      lv_remaining_records TYPE i,
      lv_index             TYPE i.
" Batch size definition
CONSTANTS: c_batch_size TYPE i VALUE 10000. " Example: 10,000 records per batch
" Get the total number of records
lv_total_records = lines( it_adrc ).
" Initialize start index and remaining records
lv_offset = 1.
lv_remaining_records = lv_total_records.
" Loop over the batches
WHILE lv_remaining_records > 0.
  " Set batch size based on remaining records
  lv_batch_size = c_batch_size.
  IF lv_remaining_records < c_batch_size.
    lv_batch_size = lv_remaining_records.
  ENDIF.
  " Select data for the current batch
  CLEAR lt_batch_addresses.
  " Move the next lv_batch_size records from the front of lt_adrc_outer_loop
  " into lt_batch_addresses. Each DELETE shifts the remaining rows up,
  " so the next record to copy is always at index 1.
  DO lv_batch_size TIMES.
    READ TABLE lt_adrc_outer_loop INDEX 1 INTO ls_adr_data.
    IF sy-subrc <> 0.
      EXIT. " No records left
    ENDIF.
    APPEND ls_adr_data TO lt_batch_addresses.
    DELETE lt_adrc_outer_loop INDEX 1.
  ENDDO.
  " Call method to process the current batch
  lt_batch_result = me->zif_data_analyzer~process_batch(
    it_adrc = lt_batch_addresses ).
  " Write results of the current batch to rt_result
  APPEND LINES OF lt_batch_result TO rt_result.
  " Update offset and remaining count for the next batch
  lv_offset = lv_offset + lv_batch_size.
  lv_remaining_records = lv_remaining_records - lv_batch_size.
ENDWHILE.
2024 Mar 26 1:49 PM - edited 2024 Mar 26 2:52 PM
What do you mean by comparing addresses?
For example, if you're looking for duplicates, you can't work on isolated batches of 10,000 addresses. You must compare each batch either to all the addresses, or to itself and all the other batches. Warning: with a million records, each of the 100 batches of 10,000 addresses must be compared to the 99 other batches and to itself, i.e. a double loop over the batches with 100 x 100 = 10,000 batch-pair comparisons. You can reduce this by comparing each batch only to itself and to the batches that follow it, which brings the count down from 10,000 to 5,050 batch pairs (100 x 101 / 2).
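To illustrate the "itself and following batches" idea, here is a minimal sketch of the double loop over batches. It is not a finished solution: the row type ty_addr, the report name, the demo data and the small batch size are all assumptions made so the sketch is self-contained, and the actual pairwise comparison is left as a placeholder because the signature of process_batch in the question only accepts a single table. Batches are sliced by index with APPEND LINES OF ... FROM ... TO instead of deleting rows from the source table.

```abap
REPORT z_batch_pairs_sketch.

" Hypothetical row type standing in for the real address structure
TYPES: BEGIN OF ty_addr,
         id    TYPE i,
         city1 TYPE c LENGTH 40,
       END OF ty_addr,
       ty_addr_tab TYPE STANDARD TABLE OF ty_addr WITH EMPTY KEY.

" Small value so the sketch runs quickly; the thread uses 10,000
CONSTANTS c_batch_size TYPE i VALUE 10.

DATA: lt_all     TYPE ty_addr_tab,
      lt_batch_a TYPE ty_addr_tab,
      lt_batch_b TYPE ty_addr_tab,
      lv_total   TYPE i,
      lv_batches TYPE i,
      lv_a       TYPE i,
      lv_b       TYPE i,
      lv_from    TYPE i,
      lv_to      TYPE i,
      lv_pairs   TYPE i.

" Demo data: 25 rows, i.e. two full batches and one partial batch
DO 25 TIMES.
  APPEND VALUE #( id = sy-index city1 = 'DEMO' ) TO lt_all.
ENDDO.

lv_total = lines( lt_all ).
" Round up so that a partial last batch is included
lv_batches = ( lv_total + c_batch_size - 1 ) DIV c_batch_size.

lv_a = 1.
WHILE lv_a <= lv_batches.
  " Slice batch A by index; the source table is left unchanged
  lv_from = ( lv_a - 1 ) * c_batch_size + 1.
  lv_to   = nmin( val1 = lv_a * c_batch_size val2 = lv_total ).
  CLEAR lt_batch_a.
  APPEND LINES OF lt_all FROM lv_from TO lv_to TO lt_batch_a.

  " Start at lv_a: batch A is compared to itself and to later batches only
  lv_b = lv_a.
  WHILE lv_b <= lv_batches.
    lv_from = ( lv_b - 1 ) * c_batch_size + 1.
    lv_to   = nmin( val1 = lv_b * c_batch_size val2 = lv_total ).
    CLEAR lt_batch_b.
    APPEND LINES OF lt_all FROM lv_from TO lv_to TO lt_batch_b.

    " Placeholder: a hypothetical comparison of lt_batch_a against
    " lt_batch_b would be called here
    lv_pairs = lv_pairs + 1.

    lv_b = lv_b + 1.
  ENDWHILE.

  lv_a = lv_a + 1.
ENDWHILE.

WRITE: / 'Batches:', lv_batches, 'Batch pairs:', lv_pairs.
```

With n batches the inner block runs n x (n + 1) / 2 times, so for the two million records in the question (200 batches of 10,000) that would be 20,100 batch pairs instead of 40,000. Because the tables are sorted by city1, it may also be possible to skip pairs of batches whose city ranges do not overlap, but that depends on what "comparing" means here.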
2024 Mar 27 3:30 PM
2024 Mar 27 3:37 PM