Hello Readers,
Purpose of this Blog :
In our last blog we went through some commonly used platform transforms in bods, you can check that out on below link
https://blogs.sap.com/2022/11/04/sap-bods-commonly-used-platform-transforms/
Today, in this blog i will cover the data integration transform Table comparison with Map operation in SAP BODS which is widely used to handle many data management scenarios.
What’s in it for you?
Table comparison and map operation together are extensively used to handle Slowly Changing Dimensions (SCD) scenarios, duplicate data issue etc
Lets deep dive into the table comparison and map operation -
General Desc-
Table comparison transform is used to compare 2 data sets based on some primary columns and generate opcodes (Insert, update or delete) based on comparison, These opcodes then are controlled using Map Operation to take the necessary/relevant action for those records based on user requirements.
Implementation-
To implement Table transform and map operation in BODS, use below instructions –
- Pull the source table with some relevant data
- Use query transform to pull the required fields
- Drag Table Comparison transform from data integration and map operation from platform transforms using local object library->transformations section.
- Create a permanent table as target table and connect as output to map transform.
Table Comparison and Map Operation DF Design
- Now double click on table comparison transform to setup different options
- First you need to provide comparison table in table name, you can select using your available datastores from dropdown window
- Select the comparison method best suited for your requirement, comparison method plays significance in performance tuning for this transform. below are some details about each method
- Row by Row Select - This comparison method follows linear search technique to compare records, it can be used with any number of records coming from source
- Cached Comparison - This comparison method uses cache memory to compare the records and this is suitable for smaller number of records coming from source as cache memory is generally small in size
- Sorted Input - This comparison method follows binary search technique to compare records, it can be used with any number of records coming from source. This method requires input data to be sorted based on columns you are using as a part of primary column
Note : Out of above comparison methods, most widely used is sorted input because of its performance, cached comparison should only be used when you have smaller comparison tables.
- Then provide input primary columns based on which record will be identified and provide compare columns which you want to compare values for.
The setup of table comparison transform to cover below cases would look like this-
Table Comparison Setup1
Table Comparison 2
I will cover below few basic scenarios to understand the working of these 2 transforms.
- Identifying and working with new records
- Identifying existing records and handling the same
Details :
- Identifying and working with new records -
Consider a below source data and target data.
Source Data
Target Data
In this scenario,
- Table comparison compares the source data with comparison table and identifies the records as insert as they are not present in the comparison table
- These records are considered as new record and opcode with letter "I" will be used to denote these
- In the next map operation can be used to handle this insert data scenario, we have 5 options to use from the list for opcode identified (see screenshot)
- Using map operation options for a generated opcode you can actually decide whether you want to insert, update or delete records. In our case we have used insert to insert so that it can insert new records
Map Operation Setup
If you want to see generated opcodes, you would need to execute your job in debug mode then you can actually see output of table comparison transform and how it generates opcodes. (see below screenshot after job execution)
Insert Scenario
After Execution - Source and Target data ouput
Now Lets deep dive into 2nd scenario where we will see how we can handle already existing records.
Here we will try to do two things -
a. We will update the existing records and
b.We will insert our updates as a new record
- We will keep the table comparison setup same as above case.
- We have updated 3rd row in our source data, we have changed emp_city and emp_country (see screenshot)
- Now 3rd row will be marked with opcode update "U" and we will use map operation to handle the update opcode
- First we will set update to update in map operation and check the ouput, refer below screenshots before and after execution of job.
Before Job execution source and Target data- We updated the 3rd row.
After Job execution- Source and Target data
Table Comaprison output for existing record
Source and Target data Update to update
If you can see in above screenshot, target data row 3 got updated with new values for emp_city and emp_country.
If we want, we can insert the updates as a new record. In that case, we would set update to insert in map operation. checkout before and after snaps of data below.
Before job execution -
We again updated the source table 3rd row to new values, now we will insert this update as a new record in target thus target will have now 4 records.
Source and Target data
After Job execution-
If you will notice now target has 4 records basically 3rd row is repeating with old and new values.
Source and Target data after update to insert
Note : Important thing to remember, If you have primary keys defined on your target then update to insert scenario will require new key to be generated to prevent unique key constraint violation. This can be achieved using Key generation transform.
Key points :
- Table comparison and map operation can together used to handle such data operations with variety of data scenarios
- These are widely used to implement slowly changing dimension concept (SCD) with the help of history preserving and key generation transforms
- Map operation can be very effective to handle data inserts, updates or delete
- As we went through insert and update scenario, no changes or deletion scenarios can also be handled in similar way.
Other options -
Input has duplicate keys - This is important option when you have possible duplicate data in source. This option works as below.
Update Scenario -
Lets assume your input data is duplicated and alrteady in target i.e. existing data scenario. In this case there are 2 possibilities -
- If this option is checked - The first incoming record will be identified as update (U) as data is already existing and as this option is checked the table comparison transform will automatically mark the duplicate records as update (U) so all records will be marked as update
- If option is not checked - Now what will happen if this option is not marked/checked? - Table comparison will identify first record as update (U) but as this option is not checked table comparison will not know other records are duplicate and it will mark them as Insert considering the new records. So first record will be marked as update and others as Insert
Now let's see what will happen if it is a Insert scenario with duplicate data -
Lets assume your input data is duplicated and data is not in target i.e. New data scenario. In this case there are 2 possibilities -
- If this option is checked - The first incoming record will be identified as INSERT(I) as data is not existing and as this option is checked the table comparison transform will automatically mark the duplicate records as update (U) as it already inserted first record from duplicate records so first record will be Insert and others will be update
- If option is not checked - Now what will happen if this option is not marked/checked? - Table comparison will identify first record as Insert(I) but as this option is not checked table comparison will not know other records are duplicate and it will mark them as Insert considering the new records. so all records will be Insert.
Detect deleted row(s) from comparison table : This option is used when you want to detect deleted rows, it comes with 2 options,
a.
Detect all rows
b.
Detect row with largest generated key value. - To use this option you would need to provide generated key column and column should be a numeric datatype column.
We have come to the end of this blog, I have tried to give you the good explanation on table comparison and map operation with their usage in different scenarios in bods.
I will be covering other data migration, data quality and data anlytics related stuff in my upcoming blogs so stay tuned!!
Please do provide your feedback which will help me in imporving my content and sharing more relevant knowledege on this community
Do follow and subscribe to learn and get insights in the space of Data Migration and Quality and their key topics. Please do share within your connections/networks.
Happy Learning Readers!! 😊