Starting with 2022, SAF-T reporting to Romanian fiscal authorities is mandatory, more information
here. Reporting has to be generated at document level in XML files and that creates a problem of working with files with sizes of hundred of megabytes.
As you know XML files are very inflated because it store the structure and format for each value. Retrieving applications such as
Notepad ++ or
XML Notepad for XML format (nodes) are designed to upload the entire file in memory. In case of files over 100 MB you get a freeze or out of memory crash. XML format gives you all details, but without calculations as in tables. You have to move to flattening to solve the problem of size and get table format for calculations. However with flattening applications I experienced same problem - out of memory crash. I have tried to go to the root - the programming language and tested some Python libraries, same problem - out of memory crash. I think worse trying these solutions: one with Python from Jeff Heaton,
Processing Large XML Wikipedia Dumps that won't fit in RAM in Python, jupyter notebook code
here and another one streaming solution from Microsoft
here.
Since a lot of users work with Microsoft Office I explored the capabilities of
Excel Power Query and I can confirm that it handles transformation of files up to 100 MB, a beginner tutorial
here.
Flattening a file with a size above 100 MB becomes slower or is freezing in dialog steps
Connecting, Navigator and Table Expansion (flattening). One solution is to split XML files into chunks of 100 MB and then merge results. This means more work, however solves the problem.
I found an workaround to flatten XML files up to 400 MB that outputs about of 400 thousand rows using Advanced Editor script
Power Query M formula language.
This is what I want to share with you.
- Generate a small.XML file with small amount of data, but complete structure.
- Menu: Data/Get data/From File/From XML.
- Create transformation rules by expanding tables.
- Copy the transformation code from Advanced editor script and save it into plain text Notepad.
- You are ready to flatten the big.XML.
- Next time repeat [2].
- Open Advanced and paste the code from [4] and replace small.XML with big.XML.
- Press Close & Load.
- Running background query starts.
- Hundreds of thousands of rows are filled with flattened data.

Data / Get data / From File / From XML

Navigator / Transformation / Power Query Editor

Power Query Editor / Tables expanding

Advanced editor script / Copy small.XML to Notepad and save it for next use

Advanced editor script / Replace small.XML with big.XML

Running background query

Rows loaded - 324.365

File sizes
What is the biggest XML file you succeed to flatten with Excel Power Query?
What are the other tools and tips you used to flatten XML with MS Office?
Enjoy!