cancel
Showing results for 
Search instead for 
Did you mean: 
Read only

How to read an XML file in SAP Data Intelligence?

amish1980_95
Participant
0 Likes
1,784

Hello Team,

I need to read a heavy XML file in SAP Data intelligence v 3.1.

As there are no standard operators to acheive this, what is teh best approach?

Thanks,

Indu K.

Accepted Solutions (0)

Answers (3)

Answers (3)

shaktik
Explorer
0 Likes

Hi Indu ,

I assume you are able to read the file and tried to fetch relevant information form xml file using standard DI operators and it did not meet you needs , I will suggest you to go with custom python logic to retrieve information from xml file .

Thanks
Shakti Kumar

bengt1
Associate
Associate
0 Likes

Hi Indu,

it would be feasible to use a Python Operator in Data Intelligence Pipelines for parsing the XML payload.

With the help of the Python ElementTree XML API library, you can browse through the XML document structure and extract attributes and/or values for all nodes on all levels. Here is the documentation.

The general approach is

  1. Import the library (import xml.etree.ElementTree as ET)
  2. Create an ElementTree object (e.g. xml_tree = ET.fromstring(<payload of the file>)
  3. You can then get values for individual nodes, e.g. with xml_tree.find('.//<name_of_a_node>').text
  4. The entire XML document can be parsed for the relevant node contents using nested loops iterating over arrays of nodes of a specific name. These arrays can be populated in Python, e.g. with: node_name_array = xml_tree.findall('.//<top node name>/<first child note name>'), etc.
  5. Either flatten the entire paylod content (which may generate high volumes of redundant data for deep hierarchical XML structures), or turn it into a relational structure, e.g. with data from the parent nodes getting loaded into header tables and data from their children nodes loaded into item tables together with a unique reference (e.g. GUIDs generated leveraging the uuid library for Python) to their related parent node IDs/keys.

The library documentation linked above is a nice source for guidance covering various different use cases with many example code snippets.

Implementing code in Python Operators may not be the most convenient approach, but the treatment of nested hierarchical XML structures isn't really trivial and can't easily be provided by standard operators.

Kind Regards,

Bengt

michal_majer
Contributor
0 Likes

Hey Indu,

You could try to use Format Converter operator to convert file from XML to CSV and then process.

https://help.sap.com/viewer/97fce0b6d93e490fadec7e7021e9016e/Cloud/en-US/c5b81c724d0b4000afd1afad214...

Best,

Michal

amish1980_95
Participant
0 Likes

Hello Michal,

Thank you for your answer.I have tried this approach using format converter but the issue is with attributes and missing nodes in the deep XML structure. With a structure like below, the value of attribute at third level is captured in json through format converter but is not passed into CSV:

<root>

<child attr = "">

<subchild>.....</subchild>

</child>

<child attr = "">

<subchild>.....

<sub-subchild>

<sub-sub-sub child attr ="">

<element1>mm</element1>

<element2>nn</element2>

</sub-sub-sub child>

</sub-subchild>

</subchild>

</child> </root>

In hierarchical structure of the XML.

My requirement is to parse the xml and load the data in HANA table.

Thanks for your help!

michal_majer
Contributor
0 Likes

Hey,

If You want to load data to HANA, json format is enough.

SAP HANA Client operator allows You to insert data in csv or json format.

Good Luck 🙂
MM

amish1980_95
Participant
0 Likes

Hello MM 🙂

Thans for your answer.

Indeed, with Hana Client, we can insert json file, but the schema of the table in hana client has to be created manually unlike 'table producer' operator which is dynamic.

In my case files are deep, hierarichal and > 44 mb in size.

Please suggest.

Thanks,

IK.