SAP Data Warehouse Cloud is a cloud-native data warehouse which specifically targets business users. It does so by focusing on superior ease of use, super-simple data collaboration within an organization, its ecosystem as well as third parties. It lets users seamlessly integrate data from both SAP and non-SAP sources and easily visualize their data with SAP Analytics Cloud.
SAP Data Warehouse Cloud's market place enables new use cases such as joining data internal to your organization with external data in a few clicks, such as joining your supplier inventory with your inventory, comparing your salary offerings with those typical in the job market for comparable roles, comparing own business performance with external economic indicators, enriching one’s business partner data with external ratings or economic performance and many others.
It is fully clear though that the challenge of data integration does not end once the data is brought from SAP Data Marketplace or custom source systems into your workspace.
Joining one’s own data with newly acquired data sources is a challenge in its own right: unless all involved data is clean, any straightforward join between two datasets that belong together “in principle” regularly suffers of some or even many exceptions – be it special characters, typos, duplicates, fuzziness in the data, leading zeros, spelling differences or many more. For these exceptions, decisions need to made by subject matter experts in the business (since they know the data) without the overhead of complex SQL code or long transformation in ETL logic.
Dealing with these exceptions often causes most of the effort and data integration costs.
This is exactly where the Intelligent Lookup operator in Data Warehouse Cloud comes in and provides users with a dedicated data matching environment
Interactive Environment for User Mappings
Users efficiently create mappings & mapping rules in an intuitive no-code UI.
Results are shown immediately. Data is taken as it is.
Mapping Automation through Rules
Rules create mappings automatically.
Users can add or rework mappings where required – down to record level
Mappings need to be created only once
All automated & manual mappings are persisted in a “pairing table”
No further user intervention is required ever again on rows where these mappings are applicable
Full integration with rest of SAP Data Warehouse Cloud
Harmonization result is a standard entity within DWC’s Data Builder that all modelling entities and consuming tools can leverage
Intelligent Lookup demo video in English
This blog by Jan Fetzer inspired me to try using Intelligent Lookup in my language, Japanese, to test its capabilities with a non-latin language.
Why traditional joins fail to solve my issue
My goal is to reproduce a situation which often happens in customer organizations. I want to create a report which lists the sales amountby product and by customer. However, my company operates over 2 separate channels :
In-store sales where customers and inventory are managed through our ERP.
E-Commerce website where customers are managed in a separate database with different ID
The customer and sales data are registered in two different databases : This means that I need to work with 2 disparate datasets which DO NOT share a common key to create my sales report.
The first dataset ERP_Customers holds the customer master data maintained within my central ERP system. It contains a CUSTOMERID field, as well as the name of my customers in kanji, their gender, age and contact information.
The second dataset EC_Customers holds the customer master data from my E-Commerce website.
The EC_ID doesNOT match with the ERP CUSTOMERID field.
I also have the name of my customers in kanji, their gender, age and contact information.
Traditional joins based on e-mail, customer name and other attributes will match some customers. However they will leave thousands of customers which need to be matched manually or cannot be matched at all.
Solving the problem with Intelligent Lookup
Intelligent Lookup solves the problem with an entirely new approach by iteratively building up a match table in the background that keeps track of entries in the input table matches to entries in the lookup table.
Add input & lookup tables
Intelligent Lookup is a standard operator in SAP Data Warehouse Cloud which creates an artefact that can subsequently be used for modelling, similarly to a Graphical or SQL View.
After creating a new Intelligent Lookup, users see a template canvas that they need to fill with life.
Firstly, the user inserts an input dataset and chooses a pairing column : the pairing column is a column of the input dataset which uniquely identifies a record in the lookup dataset.
For any value of the pairing column, there is exactly one object in the lookup entity that matches to that value (though finding which one is not straight-forward)
EC_Customers is the input table. EC_ID is the pairing column
Then the user inserts the lookup table and chooses which columns will be output.
The goal of the Intelligent Lookup is to match all entries in the input table with entries in the lookup table.
I inserted here my ERP_Customers dataset, and choose to add all 11 columns as return columns.
ERP_Customers is the lookup table. 11 columns are returned
Define your first rule
When the input and lookup tables are defined, the user can start adding matching rules : choose the match strategy (exact or fuzzy) and choose the participating columns.
I defined my first matching rule as an Exact match on E-mail. Then, I deployed and ran the operator. In the background, Intelligent Lookup deploys the pairing table, the respective code snippets that do the match logic as well as the output table which joins over input table, pairing table and lookup table.
After deployment, here are the results : 70% matched. The rule could identify 70% of my 5000 E-Commerce customers which are referenced in my ERP by their e-mail.
1499 Unmatched records, after exact matching on E-mail
Adding rule #2 to work through all records that did not pass rule #1
The intelligent lookup lets me run those 30% unmatched records through another match.This time I set up a fuzzy match on name(kanji), name(furigana) and gender.
When setting up a fuzzy match, the user can set the acceptance thresholds within the properties tab.
I set up automatic match when the score is above 95%, and record review when the score is above 88%.
23% of my 5000 E-Commerce customers are automatically matched with a customer in the ERP based on their names and gender.
1156 automatically matched records, after fuzzy matching on kanji, hiragana and gender
By switching to the review tab, you can review the 4% of records which need approval.
I could click on each record and decide whether or not to accept the suggested match.
217 records needing review, after fuzzy matching on kanji, hiragana and gender
Within the 'multiple matches' tab, I can manually match each record which has more than 1 match above the review threshold.
Kaoru Sakakibara is an interesting example, two people in my ERP customer list share that same name. Therefore they both have a match score of 100%, this shows the limit of matching customers based only on their names. Fortunately, they live in different prefectures and have a different phone number. So I could manually match them based on that information.
101 records have multiple matches. Some customers share the same name.
Finally, 25 records could not be matched based on their names and genders.
Add rules to match the remaining records
By taking the unmatched records from the previous step through two exact matches, on fixed phone and mobile phone, I managed to reduce the number of unmatched records to 2. These records lack any usable information to be matched and will need to be excluded from the analysis.
As for the "multiple matches" records, I ran them through another fuzzy match, adding the prefecture, landline phone number and mobile phone number.
Out of 101 records, 66 get automatically matched and 35 still have multiple matches, which need to be manually confirmed.
35 records still have multiple matches, which need to be confirmed manually
Inspecting the output
In a few minutes, I reduced the numbers of records which need to be manually processed from 5.000 to less than 100. We still need to work through some match candidates if we so choose, or we leave it at that. In the output view configuration, we can now decide how we continue from here:
The user can either include or exclude unmatched elements. If you choose to include unmatched records, the default value of their lookup columns is set to NULL. You can also assign them another constant value.
You can optionally add a column that contains True for all unmatched records and False otherwise. This allows later to distinguish between default values and “real” values.
Output of an Intelligent lookup
Building a view based on an Intelligent Lookup
Based on the results of the intelligent lookup ERP_EC_Match, I joined all sales orders & sales orders items from my two channels(In-store sales and E-commerce sales) in the graphical view ERP_EC_GraphView
Join sales orders with sales orders items for ERP.
Join the intelligent lookup to the previous result to add information about customers.
Repeat the same steps for E-Commerce. Join sales orders with sales orders items for EC.
Join the intelligent lookup to the previous result to add information about customers.
Union both data sets
I assigned the semantic type "Analytics dataset" to the view so that it is exposed to analytics.
Consuming results from Intelligent Lookup in SAP Analytics Cloud
I can finally achieve my first goal : show the highest sales by customers, and the highest sales by product through both channels !
Highest sales by customer, for ERP and EC
Highest sales by product, for ERP and EC
Intelligent Lookup is a new way of solving a painful problem in data integration: bringing data together that semantically fits, but doesn’t share a common key.
By allowing users to design matching rules that either automatically match the records or allow users to take a final match decision, Intelligent Lookup strikes a clever balance between automation and end-user-centric decision making.
Of course, when new data is loaded into the transaction tables, all these matches will automatically be reused without any further user intervention. This match will now be used automatically when data comes in, and update my data on SAP Analytics Cloud. This means that we just created a re-usable entity whose value grows over time. Benefit from subject matter experts decisions and don’t ask them again for the same match making.