Introduction
In this post, I would like to share how I worked with a nested JSON input and distributed it to two SAP HANA tables for further processing. This was a little bit tricky because I had to combine the unnesting of hierarchical data and the creation of a primary key. The key should also be used as a foreign key in the same flow. So let's get right into it!
Requirements & Data
In this section, I will introduce the structure of incoming data and the tables which will be the result of my Data Flow.
The input file
- nested JSON with a list of ChangeSets
- every object contains a list of Changes
[
{
"ChangeSetName": "A Bunch of Changes",
"Changes": [
{
"Name": "First Change"
},
{
"Name": "Second Change"
}
]
},
{
"ChangeSetName": "Some more Changes",
"Changes": [
{
"Name": "First Change"
},
{
"Name": "Second Change"
}
]
}
]
The use-case
- fill a table tracking the ChangeSets
- create a new unique ChangeSetId for every ChangeSet
- fill a table of Changes
- add the ChangeSetId to every Change
Tables to be created
ChangeSetId |
ChangeSetName |
A |
A Bunch of Changes |
B |
Some more Changes |
ChangeId |
ChangeSetId |
Name |
1 |
A |
First Change in Set A |
2 |
A |
Second Change in Set A |
3 |
B |
First Change in Set B |
4 |
B |
Second Change in Set B |
SAP Data Services
The Data Flow serves multiple purposes:
- reading the input file
- unnest the arrays
- fill a table of ChangeSets with a unique ChangeSetId
- fill a table of Changes with a corresponding ChangeSetId attached as a foreign key
Result: Through the ChangeSetId every
Change is linked to the
ChangeSet it came from.
The main Data Flow
You can see that this flow takes the input file and in the end, two tables are filled with data. The most interesting steps (Query Transforms) are marked and explained below.
1.) Creation of the ChangeSet
(Please right-click this image and select "Open in a new tab" to see details.)
In this transform there are two main steps:
- Creating a unique ChangeSetId
- This is done by calling a Database Sequence
sql('<DATA STORE>', 'select "<SCHEMA>"."<NAMESPACE>::<SEQUENCE>".NEXTVAL from dummy')
Code to call the Database Sequence for an SAP HANA Database.
2.) Unnesting
(Please right-click this image and select "Open image in a new tab" to see details.)
In this transform, a ChangeSetId is added to every Change (1). Also, the data nested in the arrays of the ChangeSets is unnested (flattened, unpacked however you would like to call it). This is needed to create a new database row for every Change.
This is done by right-clicking the nested data on the right and selecting "Unnest with sub-schemas". (2)
Important learning
Please note that the unique
ChangeSetId is already present before the unnesting. This is very important. If the ChangeSetId is added at a later point in the flow, a join is performed and data is written to the database multiple times which is undesirable.
My learning here is that the SQL command to create the unique key can be performed in ANY transform. I made the false assumption that it had to be right before the Database insert.
In the working flow above, the ChangeSetId is generated in the beginning and then passed through the flow.
Conclusion
In this post, I showed how to work with a nested JSON file in SAP Data Services and to transfer its contents to a database structure with Primary and Foreign Key relationship.
Please let me know if you have any questions!
Happy holidays.