paly_o
In the first part of this blog I give an introduction to OpenXML in word processing. In the second part I provide ABAP code that reads Word files.

Starting with Microsoft Word 2007, when you create a new document in Word and save it, a new file is created with the extension "*.docx". This file is a zip archive of XML files that together describe the whole Word document: texts, tables, font sizes, colors, comments, margin settings, section settings and everything else the user placed and maintained in the document. The XML files are linked to each other through relationships in a specific structure and then zipped into a single file.

To explore this structure, create a test document with some content and save it. Rename the extension from "*.docx" to "*.zip" and unzip the file. After unzipping you see all XML files in the specified structure. If you need to look at these XML files often, I recommend a more convenient way: install OOXML Tool, which is an add-on for the Chrome browser. With simple drag and drop you can inspect the whole Word document.
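The same exploration can be done from ABAP. Below is a minimal sketch, assuming the standard class CL_ABAP_ZIP is available in your system; the report name and the file path are only examples. It uploads the document and lists the entries of the zip package.

```abap
REPORT zdocx_list_entries.

DATA: lv_length   TYPE i,
      lt_data_tab TYPE STANDARD TABLE OF x255,
      lv_docx     TYPE xstring,
      lo_zip      TYPE REF TO cl_abap_zip,
      ls_file     TYPE cl_abap_zip=>t_file.

* Upload the document as binary data (example path, adjust as needed)
cl_gui_frontend_services=>gui_upload(
  EXPORTING
    filename   = `C:\Test.docx`
    filetype   = 'BIN'
  IMPORTING
    filelength = lv_length
  CHANGING
    data_tab   = lt_data_tab
  EXCEPTIONS
    OTHERS     = 1 ).
IF sy-subrc <> 0.
  WRITE / 'Upload failed'.
  RETURN.
ENDIF.

* Convert the binary table to XSTRING
CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
  EXPORTING
    input_length = lv_length
  IMPORTING
    buffer       = lv_docx
  TABLES
    binary_tab   = lt_data_tab
  EXCEPTIONS
    failed       = 1
    OTHERS       = 2.
IF sy-subrc <> 0.
  WRITE / 'Conversion to XSTRING failed'.
  RETURN.
ENDIF.

* A docx file is a zip archive, so it can be loaded as one
CREATE OBJECT lo_zip.
lo_zip->load(
  EXPORTING
    zip             = lv_docx
  EXCEPTIONS
    zip_parse_error = 1
    OTHERS          = 2 ).
IF sy-subrc <> 0.
  WRITE / 'File is not a valid zip package'.
  RETURN.
ENDIF.

* List every entry of the package with its uncompressed size
LOOP AT lo_zip->files INTO ls_file.
  WRITE: / ls_file-name, ls_file-size.
ENDLOOP.
```

For a simple test document the list should contain entries such as word/document.xml; the exact set depends on the content of the document.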

For example, I created Test.docx with the text "Hello world.". Note that the Word file has a size of 0 until you enter some content. I drag the Word file into Chrome, using the add-on mentioned above, to see the XML structure of the Word document. I look for /word/document.xml to see the text tag which holds the value "Hello world.".



Each XML file describes properties of document parts or relations between parts. For example:

  • [Content_Types].xml describes the type of content used in each part of the whole document (package)

  • the _rels parts describe relations between two parts

  • the docProps parts describe general properties of the document in the app.xml and core.xml files (application, author, version...)

  • the custom XML part can hold customer-specific data; it will be described in more detail in another blog

  • the content of the document is in the /word/document.xml file

  • fontTable.xml contains information about the font types used

  • styles.xml describes the styles used
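To look at one of the parts listed above without the browser add-on, the following minimal sketch reads a single entry from the package by name. It assumes that CL_ABAP_ZIP and CL_ABAP_CODEPAGE are available and that the part is encoded in UTF-8; the report name, the class name lcl_docx_zip and its method are hypothetical names chosen for this illustration.

```abap
REPORT zdocx_read_entry.

CLASS lcl_docx_zip DEFINITION.
  PUBLIC SECTION.
    " Returns the content of one package entry as a string,
    " or an empty string if the entry cannot be read
    CLASS-METHODS get_entry_as_string
      IMPORTING iv_docx          TYPE xstring
                iv_name          TYPE string
      RETURNING VALUE(rv_string) TYPE string.
ENDCLASS.

CLASS lcl_docx_zip IMPLEMENTATION.
  METHOD get_entry_as_string.
    DATA: lo_zip     TYPE REF TO cl_abap_zip,
          lv_content TYPE xstring.

    " Load the docx file as a zip archive
    CREATE OBJECT lo_zip.
    lo_zip->load(
      EXPORTING
        zip    = iv_docx
      EXCEPTIONS
        OTHERS = 1 ).
    IF sy-subrc <> 0.
      RETURN.
    ENDIF.

    " Read the requested entry, for example word/document.xml
    lo_zip->get(
      EXPORTING
        name    = iv_name
      IMPORTING
        content = lv_content
      EXCEPTIONS
        OTHERS  = 1 ).
    IF sy-subrc <> 0.
      RETURN.
    ENDIF.

    " Assumption: the XML part is UTF-8 encoded
    rv_string = cl_abap_codepage=>convert_from( source   = lv_content
                                                codepage = `UTF-8` ).
  ENDMETHOD.
ENDCLASS.
```

A call such as lcl_docx_zip=>get_entry_as_string( iv_docx = lv_docx iv_name = `word/document.xml` ) would return the main document XML. Note that entry names inside the zip archive are written without a leading slash.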


SAP provides the class CL_DOCX_DOCUMENT, which helps us read and modify a Word document and navigate through its structure. Here is simple code that does the job:
*&---------------------------------------------------------------------*
*& Report ZDOCX_DOCUMENT
*&
*&---------------------------------------------------------------------*
*& Report demonstrates using CL_DOCX_DOCUMENT class to read and maintain
*& word document.
*& Pavol Olejar 23.4.2017
*&---------------------------------------------------------------------*
REPORT zdocx_document.

DATA: lv_length   TYPE i,
      lt_data_tab TYPE STANDARD TABLE OF x255,
      lv_docx     TYPE xstring,
      lv_string   TYPE string,
      lv_xml      TYPE xstring,
      lr_docx     TYPE REF TO cl_docx_document,
      lr_main     TYPE REF TO cl_docx_maindocumentpart.

* Upload file
CALL METHOD cl_gui_frontend_services=>gui_upload
  EXPORTING
    filename   = 'C:\Test.docx'
    filetype   = 'BIN'
  IMPORTING
    filelength = lv_length
  CHANGING
    data_tab   = lt_data_tab.

* Get XSTRING format from BIN table
CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
  EXPORTING
    input_length = lv_length
  IMPORTING
    buffer       = lv_docx
  TABLES
    binary_tab   = lt_data_tab.

* Instantiate Word document in ABAP class CL_DOCX_DOCUMENT
CALL METHOD cl_docx_document=>load_document
  EXPORTING
    iv_data = lv_docx
  RECEIVING
    rr_doc  = lr_docx.

* Get main part where content of Word document is stored
lr_main = lr_docx->get_maindocumentpart( ).

* Get data (XSTRING) of main part
lv_xml = lr_main->get_data( ).

* Convert to string for simple maintaining
CALL FUNCTION 'CRM_IC_XML_XSTRING2STRING'
  EXPORTING
    inxstring = lv_xml
  IMPORTING
    outstring = lv_string.

* Change text
REPLACE FIRST OCCURRENCE OF 'Hello world.' IN lv_string
        WITH 'Hello world. This is my Test_new.docx document.'.

* Convert back to XSTRING
CALL FUNCTION 'SCMS_STRING_TO_XSTRING'
  EXPORTING
    text   = lv_string
  IMPORTING
    buffer = lv_xml.

* Replace main part with new data and save it
lr_main->feed_data( iv_data = lv_xml ).
lv_docx = lr_docx->get_package_data( ).

* Save new Word document locally
lv_length = xstrlen( lv_docx ).

CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    buffer     = lv_docx
  TABLES
    binary_tab = lt_data_tab.

CALL METHOD cl_gui_frontend_services=>gui_download
  EXPORTING
    bin_filesize      = lv_length
    filename          = 'C:\Test_new.docx'
    filetype          = 'BIN'
    confirm_overwrite = 'X'
  CHANGING
    data_tab          = lt_data_tab.

The get*part methods of the class return the different parts of the document. Here we were interested in the main part.
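As an illustration of another get*part method, the sketch below reads the core properties part (author, title and so on). It rests on assumptions: that your release offers a method get_corepropertiespart( ) on CL_DOCX_DOCUMENT and that its result can be held in a reference to CL_OPENXML_PART. Please check the class in SE24 before relying on it. The report name, the class name lcl_docx_parts and its method are hypothetical.

```abap
REPORT zdocx_core_properties.

CLASS lcl_docx_parts DEFINITION.
  PUBLIC SECTION.
    " Returns the XML of the core properties part,
    " or an empty XSTRING if the part cannot be read
    CLASS-METHODS get_core_properties_xml
      IMPORTING ir_docx       TYPE REF TO cl_docx_document
      RETURNING VALUE(rv_xml) TYPE xstring.
ENDCLASS.

CLASS lcl_docx_parts IMPLEMENTATION.
  METHOD get_core_properties_xml.
    DATA: lr_part TYPE REF TO cl_openxml_part,
          lx_root TYPE REF TO cx_root.

    TRY.
        " Assumption: this method exists in your release
        lr_part = ir_docx->get_corepropertiespart( ).
        " get_data( ) works the same way as for the main part
        rv_xml = lr_part->get_data( ).
      CATCH cx_root INTO lx_root.
        " Part is missing or the package is damaged: return nothing
        CLEAR rv_xml.
    ENDTRY.
  ENDMETHOD.
ENDCLASS.
```

The method would be called with the document reference from the report above, for example lcl_docx_parts=>get_core_properties_xml( lr_docx ).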

Method get_data( ) returns the XML file of a part, and method feed_data( ) stores XML back into that part of the document. These methods belong to every class that represents a part of the document; in our case it is CL_DOCX_MAINDOCUMENTPART. You can see this in the debugger.
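Because get_data( ) and feed_data( ) work with XSTRING, the XML has to be converted to a string and back before and after changing it. The function module CRM_IC_XML_XSTRING2STRING used in the report may not exist on systems without the CRM components. A minimal alternative sketch, assuming the part is UTF-8 encoded and CL_ABAP_CODEPAGE is available, looks like this. The XML fragment is only a stand-in for the result of get_data( ), and the report name is an example.

```abap
REPORT zdocx_utf8_roundtrip.

DATA: lv_xml    TYPE xstring,
      lv_string TYPE string.

* Stand-in for lr_main->get_data( ): a small UTF-8 encoded fragment
lv_xml = cl_abap_codepage=>convert_to( source   = `<w:t>Hello world.</w:t>`
                                       codepage = `UTF-8` ).

* XSTRING -> STRING, so that string operations can be used
lv_string = cl_abap_codepage=>convert_from( source   = lv_xml
                                            codepage = `UTF-8` ).

* Change the text
REPLACE FIRST OCCURRENCE OF `Hello world.` IN lv_string
        WITH `Hello ABAP.`.

* STRING -> XSTRING, ready to be passed to feed_data( )
lv_xml = cl_abap_codepage=>convert_to( source   = lv_string
                                       codepage = `UTF-8` ).

WRITE / lv_string.
```

Note that REPLACE respects case by default, so the search text must match the text in the document exactly, including the final period.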

Method get_package_data( ) of class CL_DOCX_DOCUMENT collects all current parts, packs them into a zip file and returns it as XSTRING.

You can check this in the debugger by looking at the variables lv_xml and lv_docx with the XML Browser view. For the variable lv_xml you see the XML file of the main part.



For lv_docx a pop-up asks whether you want to save the zip file, which is the result of the get_package_data( ) method.



In my next blog I will describe the custom XML part of a Word document and how an ABAP developer can use it.