Introduction
Data must be complete and consistent if it is to provide accurate analytics. If inconsistencies existed in the data, it has always been a tough process to identify the exact erroneous records and to take action.
This new data validation framework provides initial validation rules to check the consistency of your data with the modelling metadata like the view key columns and hierarchy nodes and provides detailed output about the erroneous records in case of inconsistencies.
The goal is subsequently enrich the data validation framework with more validation rules. In the outlook section below, there is link to the roadmap items on the new rules that will get added to this framework. Please add comments on this blog post about other validations that you think might be beneficial for you.
Two Types of Validation Rules have been released.
Key Validation Rules
The following rules are available in the Keys category:
Rule | Description |
Unique Key Values | Each record must have unique primary key values. For example, if an object has primary key columns ID and Language and a record has primary key values of 1 and EN, then no other record may have this combination of values. |
No Null Key Values | No record may have null primary key values. For example, if an object has primary key columns ID and Language no record may have null values in these columns. |
Hierarchy Validation Rules
The following rules are available in the Hierarchies category:
Rule | Description |
No Multiple Parents | No hierarchy node may have multiple parents. For example, if a hierarchy node A has a parent node B, then it may not have any other parent. |
No Circular Hierarchies | No hierarchy node may have circular parent-child relationships. For example, if a hierarchy node A has a parent node B, then it may not also have node B as a direct child or other descendant. |
Running a Data Validation
Key validations are relevant in the realm of view modelling because there they are important during modelling (e.g. when creating associations), but unlike with tables the database does not check that the view data indeed have key properties.
It has always been a difficult process to identify what records cause the inconsistencies in the model. Additionally, not every modeller might have full access to the data. With this new approach CopySQL/ Copy SQLScript introduced as part of the data validation framework, any modeller can see the results of the validation even though they might not be allowed to see all the details of the induvidual rows.
Even users with limited restrictions are allowed to trigger the validations.
The Data Validation Status button in the view properties panel shows the status of the last data validation, and the Validated On field shows the date and time of the last run.
To validate data (or review the results of a previous validation):
2. If the panel is empty (or if you want to re-run the validation), click Validate as indicated above.
The time required for data validation depends on the number of records and the complexity of calculations required. You will receive a notification when validation is complete.
3. Review the validation results. There is one result for each rule as shown below,
Reviewing Rules with Errors:
2. In the Details dialog, click Copy SQL (or Copy SQLScript) to get the code necessary to generate the full list of records that are not consistent with the rule
Viewing Erroneous Records - 'Copy SQL'
Click on 'Copy SQL' and then Create a new SQL view, paste the code into it, and preview the data to see these failing records. Examples below,
a. Unique Key Values - Doing a data preview on the Copied SQL in the New SQL view, shows the records that violate the Unique Key Constraints along with the number of their occurrences.
b. No Null Values - Doing a data preview on the Copied SQL in the SQL view, shows all the 9 records where NULL values are present in the primary key column 'LeafNode'
Viewing Erroneous Records - 'Copy SQLScript '
Click on 'Copy SQLScript' and then Create a new SQL view, paste the code into it. Choose the SQLScript (TableFunction) in the view properties and add the columns as well as shown below. Then save and deploy the view and then click on Data Preview.
a. No Multiple Parents - Doing a data preview on the Copied and deployed SQLScript(Table Function) type SQL View, shows the records that have multiple parents.
You can see that the Tops Child has two Parents Men's Wear and Women's wear which is indicated in the above output.
b. No Circular Hierarchies - Doing a data preview on the Copied and deployed SQLScript(Table Function) type SQL View, shows the records that have a circular hierarchy relationship
You can see below the Circular Hierarchy relationship in the dataset as indicated in the above output
Current Limitations
Currently, some validation rules may be unavailable if:
However, these limitations will be eliminated in the future with updates to the validation framework.
Outlook
As mentioned above, we will be adding more rules/validations to this validation framework. For more details about additional validations coming in Q4 2024, refer to the roadmap item - Additional model validations
Summary
With the data validation framework, you should now be able to validate your metadata & data and correct errors easily based on the information available on the erroneous records.
I would like to thank @MohanambalDuraisamy for her collaboration in this blog.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
26 | |
14 | |
12 | |
11 | |
9 | |
9 | |
6 | |
6 | |
5 | |
5 |