Solved: JSON Normalize fails for Nested Json array

rajeshps · ‎2023 May 18

Hello Team,

For the nested JSON below, the arrays are not normalized by the following code:

import pandas as pd
import json

def on_input(data):
    df = pd.read_json(data)
    df = pd.json_normalize(json.loads(data))
    api.send("output", df.to_json(orient="records"))

api.set_port_callback("input1", on_input)

I also tried max_level and then record_path but no luck.

Flow: Kafka producer (JSON string) -> Avro decoder -> Python3 -> HANA Client

Is there any way to normalize the JSON so that I can apply an if/then/else condition, update the suffix for the column field, and finally write the result to the DB without duplicates? Here header.poNumber and data.id are the primary keys (unique identifiers).

Example:

 [
   {
      "header.poNumber":"9023496",
      "data.id":"10013459",
      "message.source":[
         {
            "createSource":null,
            "timeStamp":"2023-05-12T19:30:00.0000000+02:00",
            "type":"full"
         },
         {
            "createSource":"testdev",
            "timeStamp":"2023-05-11T19:30:00.0000000+02:00",
            "type":"ordersEstimated"
         },
         {
            "createSource": "event",
            "timeStamp":"2023-05-12T12:30:00.0000000+01:00",
            "type":"ordersCreated"
         }
      ],
      "message.time":[
         {
            "timeSource":"UTC",
            "typeId":"full"
         },
         {
            "timeSource":"IST",
            "typeId":"actual"
         }
      ]
   }
]

Expected output:

Vitaliy-R · ‎2023 May 18

I do not think you can simply normalize this record because it contains two arrays of dictionaries: `"message.time"` and `"message.source"`.
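
A minimal sketch of what a plain call returns, assuming the payload is the sample from the question (trimmed here to one entry per array, and held in an illustrative `data` string). `pd.json_normalize` flattens nested dictionaries only, so each array stays in a single column as a Python list:

```python
import json
import pandas as pd

# Sample payload trimmed from the question (one entry per array).
data = """[{"header.poNumber": "9023496", "data.id": "10013459",
  "message.source": [{"createSource": null,
                      "timeStamp": "2023-05-12T19:30:00.0000000+02:00",
                      "type": "full"}],
  "message.time": [{"timeSource": "UTC", "typeId": "full"}]}]"""

# No record_path given: the two arrays are not expanded into rows.
flat = pd.json_normalize(json.loads(data))

print(flat.columns.tolist())                 # four columns, one per top-level key
print(type(flat.loc[0, "message.source"]))   # the array is still a list
```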

From the "expected output", I understand that you want to join values from `"message.time"` to values from `"message.source"` on the `type` attribute to create records.

So, I think you need to flatten the two arrays into separate Pandas DataFrames and then merge them on the keys. Something like:

data_as_json = json.loads(data)
df_source = pd.json_normalize(data_as_json, record_path='message.source', meta=['header.poNumber', 'data.id'])
df_time = pd.json_normalize(data_as_json, record_path='message.time', meta=['header.poNumber', 'data.id'])
df = df_source.merge(df_time, left_on=['header.poNumber', 'data.id', 'type'], right_on=['header.poNumber', 'data.id', 'typeId'])
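
The same approach as a self-contained sketch, run against the sample from the question. The names `records`, `keys`, `inner`, and `left` are illustrative only. The `how="left"` variant is an assumption: it may be what is wanted if source entries without a matching `typeId` should be kept, since the default inner join drops them.

```python
import json
import pandas as pd

# Sample payload from the question.
data = """[{
  "header.poNumber": "9023496",
  "data.id": "10013459",
  "message.source": [
    {"createSource": null, "timeStamp": "2023-05-12T19:30:00.0000000+02:00", "type": "full"},
    {"createSource": "testdev", "timeStamp": "2023-05-11T19:30:00.0000000+02:00", "type": "ordersEstimated"},
    {"createSource": "event", "timeStamp": "2023-05-12T12:30:00.0000000+01:00", "type": "ordersCreated"}
  ],
  "message.time": [
    {"timeSource": "UTC", "typeId": "full"},
    {"timeSource": "IST", "typeId": "actual"}
  ]
}]"""

records = json.loads(data)
keys = ["header.poNumber", "data.id"]

# One row per array element, with the key columns repeated on every row.
df_source = pd.json_normalize(records, record_path="message.source", meta=keys)
df_time = pd.json_normalize(records, record_path="message.time", meta=keys)

# Default (inner) join: only source rows whose type has a matching typeId.
inner = df_source.merge(df_time, left_on=keys + ["type"], right_on=keys + ["typeId"])

# Assumed alternative: keep every source row, leaving time columns empty if unmatched.
left = df_source.merge(df_time, how="left",
                       left_on=keys + ["type"], right_on=keys + ["typeId"])

print(inner[["header.poNumber", "data.id", "type", "timeSource"]])
print(left[["type", "timeSource"]])
```

With this sample, only `type` "full" has a matching `typeId`, so the inner join returns a single row while the left join keeps all three source rows.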

Here are my tests:

Regards,
-Vitaliy
