Technology Blogs by Members
ashishpinjwani
This blog post gives an overview of the redundancy data profiling technique in SAP Information Steward, the SAP data quality tool. It walks through the procedure step by step so that you get a complete picture of how to use redundancy profiling.

Consider the data set below as an example:


Data Example


 

Here are some key points to remember when performing redundancy profiling in SAP Information Steward:

  • Redundancy profiling helps you identify relationships between columns of different tables or views by showing how much the data in those columns overlaps.

  • The results are shown as a Venn diagram of the overlap between the columns. The diagram also gives the count of distinct values for each column, and you can click a number to view the underlying records.
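To make the idea of overlap concrete, here is a minimal Python sketch of the comparison that the Venn diagram summarises. It is not how SAP Information Steward works internally. The function name `overlap_profile` and the City values are hypothetical and used only for illustration.

```python
def overlap_profile(main_values, comparison_values):
    """Split the distinct values of two columns into the three Venn regions."""
    main_distinct = set(main_values)
    comparison_distinct = set(comparison_values)
    return {
        "main_only": main_distinct - comparison_distinct,
        "overlap": main_distinct & comparison_distinct,
        "comparison_only": comparison_distinct - main_distinct,
    }


# Hypothetical City columns from two different tables.
customer_city = ["Pune", "Mumbai", "Delhi", "Pune"]
vendor_city = ["Delhi", "Chennai", "Mumbai"]

profile = overlap_profile(customer_city, vendor_city)
for region, values in profile.items():
    # Print the region name, its distinct-value count, and the values themselves.
    print(region, len(values), sorted(values))
```

If the "overlap" set is empty, the two columns share no values, which corresponds to two separate circles in the diagram.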


How to perform redundancy profiling?

To perform redundancy profiling on a table or view:

  • Select the view or table and choose redundancy profiling from the profiling options in the workspace section of SAP Information Steward. The window shown in the screenshot (Data Redundancy Profile) opens.

  • Select the columns from the main table and the comparison table for which you want to perform redundancy profiling.

  • Click the Save and Run Now button to execute the redundancy profiling.


Note: Redundancy profiling is performed on two tables/views, and a single column must be selected from each table for comparison.


Data Redundancy Profile


Important values to keep in mind:

Input Sampling Rate: How the records are chosen. For example, if you choose a Max Input Size of 1,000 records and enter a rate of 1, the first 1,000 records are profiled. If you enter a rate of 2, every second record in the table is profiled, up to 1,000 records, and so on.

Max Sample Data Size: The total number of profiled records that are available to view in the results.
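The sampling rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not the tool's actual implementation. The function `sample_records` and its parameter names are hypothetical, and the sketch assumes that sampling starts from the first record.

```python
from itertools import islice


def sample_records(records, sampling_rate=1, max_input_size=1000):
    """Take every `sampling_rate`-th record, up to `max_input_size` records."""
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be at least 1")
    # Assumption: sampling starts at the first record (1st, 3rd, 5th, ... for rate 2).
    every_nth = islice(records, 0, None, sampling_rate)
    return list(islice(every_nth, max_input_size))


# Hypothetical table of 5,000 records, represented by their row numbers.
table = range(1, 5001)

rate_1 = sample_records(table, sampling_rate=1, max_input_size=1000)
rate_2 = sample_records(table, sampling_rate=2, max_input_size=1000)

print(rate_1[:3], len(rate_1))  # first rows picked with rate 1, and how many
print(rate_2[:3], len(rate_2))  # first rows picked with rate 2, and how many
```

With a rate of 2, the same number of records is profiled, but they are spread over twice as much of the table.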

Interpreting the results:

  • The results are shown as a Venn diagram. The circles overlap if the columns have values in common. Otherwise, two separate circles are shown.

  • The diagram shows the count of records with overlapping values, and you can click the number to view those records.

    • For example, in the screenshot, the City column has values that overlap with the City column of a different table. Clicking the number shows those records.
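The drill-down from a count to its records can be pictured with the short Python sketch below. It only mimics the idea of viewing the rows behind an overlap count. The function `records_in_overlap`, the table rows, and the City values are all hypothetical.

```python
def records_in_overlap(rows, column, comparison_values):
    """Return the rows whose `column` value also occurs in the comparison column."""
    shared = set(comparison_values)
    return [row for row in rows if row[column] in shared]


# Hypothetical rows of a main table and City values of a comparison table.
customers = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Mumbai"},
    {"id": 3, "city": "Delhi"},
    {"id": 4, "city": "Mumbai"},
]
vendor_cities = ["Delhi", "Chennai", "Mumbai"]

matching = records_in_overlap(customers, "city", vendor_cities)
print(len(matching), [row["id"] for row in matching])
```

Note that the number of matching records can be larger than the number of distinct overlapping values, because several records may share the same value.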






Redundancy Profile Results


 

That completes the detailed explanation of the redundancy data profiling technique in SAP Information Steward.

Please share your feedback on this post in the comments section. It will help me improve my content and share more knowledge with this community.

Thanks and happy learning!