This blog post gives an overview of the uniqueness data profiling technique in the SAP Information Steward data quality tool.
It walks through the procedure step by step so that you get a complete picture of how to use uniqueness profiling.
Let's begin with uniqueness profiling in detail.
Consider the data set below as an example:
Here are some key points to remember when you perform uniqueness profiling in SAP Information Steward:
Uniqueness profiling is useful for identifying duplicate data within the columns of tables/views.
This profiling feature shows duplicate and non-duplicate records in a pie chart.
Just as important, it gives you the duplicate count for the non-unique records.
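To make these key points concrete, here is a minimal Python sketch of the kind of summary a uniqueness check produces for a single column. It is only an illustration and not how SAP Information Steward works internally. The function name uniqueness_profile and the sample email values are hypothetical, and the sketch assumes that "non-unique records" means every row whose value appears more than once.

```python
from collections import Counter


def uniqueness_profile(values):
    """Summarize unique vs. non-unique records for one column (illustrative only)."""
    counts = Counter(values)
    # A record is unique when its value occurs exactly once in the column.
    unique_records = sum(1 for n in counts.values() if n == 1)
    # Assumption: every row that shares its value with another row is non-unique.
    non_unique_records = sum(n for n in counts.values() if n > 1)
    # Duplicate count for each non-unique value.
    duplicate_counts = {value: n for value, n in counts.items() if n > 1}
    return unique_records, non_unique_records, duplicate_counts


# Hypothetical sample column, not taken from the data set in this post.
emails = ["a@x.com", "b@x.com", "a@x.com", "c@x.com", "b@x.com", "a@x.com"]
unique, non_unique, dupes = uniqueness_profile(emails)
print(unique, non_unique, dupes)
```

The first two numbers correspond to the two slices of the pie chart, and the dictionary corresponds to the duplicate count listing.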
How to perform uniqueness profiling?
To perform uniqueness profiling on a table/view:
Select the view/table and click Uniqueness profiling in the profiling options in the workspace section of SAP Information Steward. The window shown in the screenshot will pop up. (See the screenshot)
Check the boxes next to the names of the columns whose uniqueness you want to check.
Click the Save and Run Now button to execute the uniqueness profiling.
Uniqueness Profiling Task Details
Important values to keep in mind:
Input Sampling Rate: How the records are chosen. For example, if you choose a Max input size of 1,000 records and enter a rate of 1, the first 1,000 records are profiled. If you enter a rate of 2, every second record in the table is profiled, up to 1,000 records, and so on.
Filter Condition: You can also add a filter condition while creating the task, using the filter condition option.
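The sampling behavior described for Input Sampling Rate can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the tool's actual implementation: the function sample_records and its parameter names are hypothetical, and the sketch assumes that sampling starts at the first record (the tool may start counting elsewhere).

```python
from itertools import islice


def sample_records(records, sampling_rate=1, max_input_size=1000):
    """Pick every Nth record, capped at max_input_size (illustrative only)."""
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be 1 or greater")
    # Assumption: sampling starts with the first record, then skips ahead by the rate.
    every_nth = islice(records, 0, None, sampling_rate)
    # Stop once the maximum input size has been reached.
    return list(islice(every_nth, max_input_size))


# Hypothetical table of 5,000 record numbers.
table = range(1, 5001)
rate_1 = sample_records(table, sampling_rate=1, max_input_size=1000)
rate_2 = sample_records(table, sampling_rate=2, max_input_size=1000)
print(rate_1[:3], rate_1[-1], len(rate_1))
print(rate_2[:3], rate_2[-1], len(rate_2))
```

With a rate of 1 the sample covers records 1 through 1,000. With a rate of 2 it still contains 1,000 records, but they are spread across the first 2,000 or so rows of the table.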
Important things to remember:
Uniqueness can be defined on an individual column or on a set of columns. If you select multiple columns in a uniqueness profiling task, they are treated as one set, and records are analyzed for uniqueness based on the combination of values in those columns.
If you want to profile columns individually, you need to select one column at a time and create a separate task for each column.
You can check the task status in the Task section of SAP Information Steward. Once the task is complete, the results can be viewed in the workspace section.
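The difference between profiling a set of columns and profiling columns individually is easy to see in a small Python sketch. The helper duplicate_counts, the column names, and the sample rows are all hypothetical and serve only to illustrate the idea. Selecting several columns in one task corresponds to one call with several column names, and profiling columns individually corresponds to one call per column.

```python
from collections import Counter


def duplicate_counts(rows, columns):
    """Count how often each combination of values in `columns` occurs more than once."""
    # The selected columns are treated as one composite key per row.
    keys = Counter(tuple(row[col] for col in columns) for row in rows)
    return {key: n for key, n in keys.items() if n > 1}


# Hypothetical rows, not taken from the data set in this post.
rows = [
    {"first_name": "Ann", "city": "Pune"},
    {"first_name": "Ann", "city": "Delhi"},
    {"first_name": "Raj", "city": "Pune"},
    {"first_name": "Ann", "city": "Pune"},
]

# One task with both columns selected: uniqueness of the combination.
combined = duplicate_counts(rows, ["first_name", "city"])
# Separate tasks, one column each: uniqueness of each column on its own.
by_name = duplicate_counts(rows, ["first_name"])
by_city = duplicate_counts(rows, ["city"])
print(combined, by_name, by_city)
```

Here the combination ("Ann", "Pune") occurs twice, while "Ann" alone and "Pune" alone each occur three times, so the same table gives different duplicate counts depending on which columns are selected together.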
Reading the results generated from uniqueness profiling:
To check the results of uniqueness profiling, follow the steps below:
Go to the workspace and select the view/table on which you performed uniqueness profiling.
In the top right-hand corner, click the Advanced tab.
Under the Advanced tab you will see different columns, each named after a profiling technique.
Under the uniqueness profiling section, you will see a green tick beside your view. Click it and you should be able to see the results as shown below.
Result 1 - Unique vs. non-unique records
Result 2 - Duplicate count for non-unique records
This completes the detailed explanation of the uniqueness data profiling technique in SAP Information Steward. I will cover the other profiling types in my next posts, so stay tuned.
Please share your feedback on this post in the comments section. It will help me improve my content and share more knowledge with this community.