Application Development Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.

Mass data processing to compare addresses

Stef3
Explorer

Hello SAP Experts,

I have to compare around two million address records. At the moment I read the data into the local table it_adrc and prepare the comparison in a preceding method. I created a method that separates the data into batches and then calls a different method ( zif_data_analyzer~process_batch ) to do the comparison. But now I have run into a problem: not all of the data is compared, and only part of it appears in the results table. This is because only 10,000 records are looked at per batch. Does anyone have an idea how to solve this? Below is the data processing part of my current code. I would be grateful for any recommendation on how to solve this problem.

 

    " Copy the input table
    lt_adrc_outer_loop[] = it_adrc[].
    lt_adrc_inner_loop[] = it_adrc[].

    " Sort by city
    SORT lt_adrc_outer_loop BY city1.
    SORT lt_adrc_inner_loop BY city1.

    DATA: lt_batch_addresses   TYPE TABLE OF adress_struc,
          lt_batch_result      TYPE TABLE OF result_struc,
          lv_offset            TYPE sy-tabix,
          lv_batch_size        TYPE i,
          lv_total_records     TYPE i,
          lv_remaining_records TYPE i,
          lv_index             TYPE i.

    " Batch-size definition
    CONSTANTS: c_batch_size TYPE i VALUE 10000. " Example: 10,000 records per package

    " Get amount of all data
    lv_total_records = lines( it_adrc ).

    " Initialise start index and remaining records
    lv_offset = 1.
    lv_remaining_records = lv_total_records.

    " Loop for processing packets
    WHILE lv_remaining_records > 0.
      " Set batch size based on remaining records
      lv_batch_size = c_batch_size.
      IF lv_remaining_records < c_batch_size.
        lv_batch_size = lv_remaining_records.
      ENDIF.

      " Select data for the current package
      CLEAR lt_batch_addresses.

      DATA: lv_index_act    TYPE i,
            lv_index_delete TYPE i.
      " Copy the first 10,000 records into lt_batch_addresses
      lv_index_act = 0.
      lv_index_delete = 0.
      DO 10000 TIMES.
        lv_index_act = lv_index_act - 1.
        ADD 1 TO lv_index_delete.
        IF lv_index_act < 10000.
          READ TABLE lt_adrc_outer_loop INDEX lv_index_delete INTO ls_adr_data.
          IF sy-subrc = 0.
            " The access was successful, the element exists
            APPEND lt_adrc_outer_loop[  lv_index_delete ] TO lt_batch_addresses.
            "lt_batch_addresses[ lv_index_delete ] = lt_adrc_outer_loop[ lv_index_delete ].
            DELETE lt_adrc_outer_loop INDEX lv_index_delete.
          ENDIF.


        ELSE.
          EXIT. "When all records have been copied, exit the loop
        ENDIF.
      ENDDO.


      " Call method to process the current package
      lt_batch_result = me->zif_data_analyzer~process_batch(
      it_adrc = lt_batch_addresses ).

      " Write results of current package to rt_result
      APPEND LINES OF lt_batch_result TO rt_result.

      " Update index for next package
      lv_offset = lv_offset + lv_batch_size.
      lv_remaining_records = lv_remaining_records - lv_batch_size.
    ENDWHILE.
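
A note on the copy loop above: each pass deletes row lv_index_delete and then increases that index, so the next READ lands one row further than intended. Every second row is skipped within a batch, and the last batches leave rows behind. In addition, lv_index_act only ever decreases, so the ELSE branch is never reached. Below is a minimal sketch of contiguous batching that leaves the source table untouched. It assumes a simplified, hypothetical row type ty_addr and demo data in place of it_adrc, and the call to zif_data_analyzer~process_batch is only indicated by a comment:

```abap
REPORT z_batch_slice_sketch.

" Hypothetical, simplified row type standing in for the address structure.
TYPES: BEGIN OF ty_addr,
         id    TYPE i,
         city1 TYPE c LENGTH 40,
       END OF ty_addr,
       ty_addr_tab TYPE STANDARD TABLE OF ty_addr WITH EMPTY KEY.

CONSTANTS c_batch_size TYPE i VALUE 10000.

DATA: lt_all   TYPE ty_addr_tab,
      lt_batch TYPE ty_addr_tab,
      lv_from  TYPE i VALUE 1,
      lv_to    TYPE i,
      lv_total TYPE i,
      lv_line  TYPE string.

" Demo data: 25,000 rows, which gives batches of 10,000, 10,000 and 5,000.
DO 25000 TIMES.
  APPEND VALUE #( id = sy-index city1 = 'X' ) TO lt_all.
ENDDO.

SORT lt_all BY city1.
lv_total = lines( lt_all ).

WHILE lv_from <= lv_total.
  " Last row of this batch, capped at the end of the table.
  lv_to = lv_from + c_batch_size - 1.
  IF lv_to > lv_total.
    lv_to = lv_total.
  ENDIF.

  " Copy rows lv_from..lv_to; the source table is not modified.
  CLEAR lt_batch.
  APPEND LINES OF lt_all FROM lv_from TO lv_to TO lt_batch.

  " In the original code, process_batch would be called with lt_batch here.
  lv_line = |Rows { lv_from } to { lv_to }: { lines( lt_batch ) } records|.
  WRITE / lv_line.

  lv_from = lv_to + 1.
ENDWHILE.
```

Because nothing is deleted from the source table, the row indexes stay stable and each record ends up in exactly one batch. This fixes only the slicing; it does not by itself compare addresses across different batches.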

 

 

3 REPLIES

raymond_giuseppi
Active Contributor

What do you mean by comparing addresses?

For example, if you're looking for duplicates, you can't work on isolated batches of 10,000 addresses. You must compare each batch either to all the addresses, or to itself and all the other batches. (Warning: with a million records, each of the 100 batches of 10,000 addresses must be compared to the 99 other batches and to itself, i.e. a double loop over the table with 100 x 100 = 10,000 batch-pair comparisons.) You can reduce this by comparing each batch only to itself and to the batches that follow it, which cuts the 10,000 batch pairs down to 5,050.
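
The pair counts in this reply can be checked with a small sketch. It only counts batch pairs; the actual address comparison would go where the counters are incremented. The report and variable names are made up for illustration:

```abap
REPORT z_batch_pairs_sketch.

" 1,000,000 records / 10,000 per batch = 100 batches.
CONSTANTS c_batches TYPE i VALUE 100.

DATA: lv_outer      TYPE i,
      lv_inner      TYPE i,
      lv_all_pairs  TYPE i,
      lv_half_pairs TYPE i,
      lv_line       TYPE string.

lv_outer = 1.
WHILE lv_outer <= c_batches.
  lv_inner = 1.
  WHILE lv_inner <= c_batches.
    " Full double loop: every batch against every batch.
    lv_all_pairs = lv_all_pairs + 1.

    " Reduced variant: a batch against itself and the following batches only.
    IF lv_inner >= lv_outer.
      lv_half_pairs = lv_half_pairs + 1.
    ENDIF.

    lv_inner = lv_inner + 1.
  ENDWHILE.
  lv_outer = lv_outer + 1.
ENDWHILE.

" Totals: 10000 for the full loop, 5050 for the reduced one.
lv_line = |All pairs: { lv_all_pairs }, itself and following: { lv_half_pairs }|.
WRITE / lv_line.
```

The reduced count is n * ( n + 1 ) / 2 for n batches, which works because comparing batch A to batch B already covers comparing B to A.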

0 Kudos
505

I have now found a solution: building packages per city.
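
The code for this solution is not shown in the thread. One possible way to build a package per city is sketched here with LOOP AT ... GROUP BY (available from ABAP 7.40 SP08), using a hypothetical, simplified row type and demo data:

```abap
REPORT z_city_package_sketch.

" Hypothetical, simplified row type standing in for the address structure.
TYPES: BEGIN OF ty_addr,
         id    TYPE i,
         city1 TYPE c LENGTH 40,
       END OF ty_addr,
       ty_addr_tab TYPE STANDARD TABLE OF ty_addr WITH EMPTY KEY.

DATA: lt_all     TYPE ty_addr_tab,
      lt_package TYPE ty_addr_tab,
      lv_line    TYPE string.

" Demo data only.
lt_all = VALUE #( ( id = 1 city1 = 'Berlin' )
                  ( id = 2 city1 = 'Hamburg' )
                  ( id = 3 city1 = 'Berlin' )
                  ( id = 4 city1 = 'Hamburg' )
                  ( id = 5 city1 = 'Bonn' ) ).

" One group per distinct city1 value; GROUP SIZE is the row count per group.
LOOP AT lt_all INTO DATA(ls_addr)
     GROUP BY ( city1 = ls_addr-city1
                size  = GROUP SIZE )
     ASCENDING
     INTO DATA(ls_city_group).

  " Collect the members of the current city into one package.
  CLEAR lt_package.
  LOOP AT GROUP ls_city_group INTO DATA(ls_member).
    APPEND ls_member TO lt_package.
  ENDLOOP.

  " The package for this city could be passed to process_batch here.
  lv_line = |City { ls_city_group-city1 }: { ls_city_group-size } addresses|.
  WRITE / lv_line.
ENDLOOP.
```

Packaging by city only finds matches among addresses whose city1 values are identical, so differently spelled city names end up in different packages. A very large city can also still produce a package bigger than the intended batch size.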

0 Kudos
501

Nice idea (country/city)