Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
Jerome
Product and Topic Expert
Hello RPA fellows !

 

In the previous blog post, I presented how you could integrate the SAP Document Information Extraction service (also called DOX) with SAP Intelligent RPA to extract data from PDF documents. Now that SAP Intelligent RPA 2.0 is officially released, I will show you how to do it again with the low-code approach of the new version of our solution.

 

To set up the service, please read the previous blog post. Once it is done, let's dive into the real topic !

API service key


As a reminder, the service key for the DOX service should have the following structure :
{
    "url": "",
    "uaa": {
        "uaadomain": "",
        "tenantmode": "",
        "sburl": "",
        "clientid": "",
        "verificationkey": "",
        "apiurl": "",
        "xsappname": "",
        "identityzone": "",
        "identityzoneid": "",
        "clientsecret": "",
        "tenantid": "",
        "url": ""
    }
}

Let’s save the url, uaa.clientid, uaa.clientsecret and uaa.url values, as we will need them later.

Overview


How to extract data from a document ?


First, let's remember the main steps of the process when we extract data from a document using DOX :

  • we must get the access token to be able to use the service

  • we upload the document to the service. The service sends back the document ID, which will be used later

  • last, we try to access the document. If the service has finished processing the document, we get the status DONE. Otherwise, the status is still PENDING and we need to wait a bit before trying again.


How to use a web service call activity ?


Because of the way the HTTP Call activity is designed, we need to provide an options object.

To make it easier, let's create this options object. To do so, we need to use a Custom Script activity. The list of input/output of this activity is displayed in the screenshot below :

No input needed. Just an output, where type is Any (equivalent to an object). And in the script editor, we just need to insert the following :



return {
    method: 'GET', // or 'POST', depending on the call
    url: '',
    resolveBodyOnly: true,
    headers: {
        Authorization: ''
    }
};

Note: With a POST query, the options might be more complex. Don't hesitate to read the documentation for more details.

Note: To build this options object you might need to use data from previous activities. In that case, feel free to add input parameters.

 

Important note: The attribute resolveBodyOnly lets you directly retrieve the result sent by the service as an object.

When it is set to false, the HTTP Call activity returns an object that wraps the result: all the data returned by the service is contained in the body attribute of the activity's output. Note that the content of this attribute is a stringified object. So to get it as an object, let's create another Custom Script activity, where the input would be the response of the Call activity:


And in the script editor, we insert the following :



let json = JSON.parse(response);
return json;

In that case json would be an object, containing all the data sent by the web service activity. But depending on the case, we can also return something else (such as json.somedata).


But again, this last step is only needed when resolveBodyOnly is set to false.
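
To illustrate both cases, here is a minimal sketch of a helper that returns the payload as an object whether the response is wrapped or not. The function name extractPayload and the sample responses are hypothetical, and the exact shape of the wrapped response (a body attribute holding a JSON string) is an assumption based on the description above:

```javascript
// Hypothetical helper: returns the service payload as an object,
// whether or not the response is still wrapped.
function extractPayload(response) {
  // Assumption: a wrapped response exposes the payload as a JSON string in `body`.
  if (response && typeof response.body === 'string') {
    return JSON.parse(response.body);
  }
  // Otherwise the response is assumed to already be the parsed payload.
  return response;
}

// Made-up sample responses, for illustration only.
const wrapped = { statusCode: 200, body: '{"id":"abc","status":"PENDING"}' };
const unwrapped = { id: 'abc', status: 'PENDING' };

console.log(extractPayload(wrapped).status);
console.log(extractPayload(unwrapped).status);
```

With a helper like this, the rest of the automation does not need to care which value resolveBodyOnly was given.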

 

At this point, each time we need to use the HTTP Call activity, we will implement the following structure:


 

Create the automation


OK. Now that we have a better understanding of the way we need to perform calls to web services, we can implement it in our context.







To make it easier, all URLs, credentials and paths are hard-coded. But in real life, you should definitely create a configurable automation with environment variables to keep all sensitive information secure.
The path of the file can be set as an input of the automation so you can reuse it.

Generate the authentication token


First, let's generate the authentication token which will be used to call DOX. As explained before, we have the following activities :


The script to generate the token options is detailed below :
return {
    method: 'GET',
    url: 'https://xxxxx/oauth/token?grant_type=client_credentials',
    headers: {
        Authorization: 'Basic xxxxx'
    }
};

where xxxxx in the URL is the uaa.url mentioned in the first part of this blog post, and xxxxx in the Authorization header is a base64-encoded string composed of uaa.clientid:uaa.clientsecret.
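
If you are not sure how to build the Authorization value, here is a minimal sketch. It assumes a Node.js environment, where Buffer is available; this may not be the case inside a Custom Script activity, so you might prefer to compute the value once outside the automation. The credentials below are placeholders:

```javascript
// Placeholder credentials: use uaa.clientid and uaa.clientsecret from your service key.
const clientId = 'my-client-id';
const clientSecret = 'my-client-secret';

// Basic authentication: base64 of "clientid:clientsecret", prefixed with "Basic ".
const credentials = Buffer.from(clientId + ':' + clientSecret, 'utf8').toString('base64');
const authorization = 'Basic ' + credentials;

console.log(authorization);
```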

Tip: as we will use the token several times, we can create a string variable and store the token in it. See below :


The value would be :


Upload the document


To upload the document, we will use the same pattern :

  • Generate the options

  • Make the Http call using a POST request

  • Get the document ID which is sent by DOX


To generate the options, we are using the following code :
return {
    method: 'POST',
    url: 'https://aiservices-dox.cfapps.eu10.hana.ondemand.com/document-information-extraction/v1/document/jobs',
    headers: {
        Authorization: token
    },
    metadata: [
        {
            name: 'file',
            file: 'C:/Temp/invoice.pdf',
            type: 'application/pdf'
        }, {
            name: 'options',
            value: '{"extraction":{"headerFields":["documentNumber","taxId","taxName","purchaseOrderNumber","shippingAmount","netAmount","grossAmount","currencyCode","receiverContact","documentDate","taxAmount","taxRate","receiverName","receiverAddress","deliveryDate","paymentTerms","deliveryNoteNumber","senderBankAccount","senderAddress","senderName"],"lineItemFields":["description","netAmount","quantity","unitPrice","materialNumber"]},"clientId":"c_00","documentType":"invoice","receivedDate":"2020-02-17","enrichment":{"sender":{"top":5,"type":"businessEntity","subtype":"supplier"},"employee":{"type":"employee"}}}',
            type: 'text/json'
        }
    ]
};

Note: In the metadata attribute, you need to provide the file path of the document and its type (in this case, as it is a PDF document, we are using the application/pdf type).
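
The options value above is a long hand-written JSON string, which is easy to break with a missing quote. As a possible alternative, here is a minimal sketch that builds the same kind of string with JSON.stringify. The field list is deliberately shortened for readability, and the variable names extractionOptions and optionsPart are only illustrative:

```javascript
// Shortened version of the extraction options, written as a plain object.
const extractionOptions = {
  extraction: {
    headerFields: ['documentNumber', 'netAmount', 'grossAmount', 'currencyCode', 'documentDate'],
    lineItemFields: ['description', 'netAmount', 'quantity', 'unitPrice', 'materialNumber']
  },
  clientId: 'c_00',
  documentType: 'invoice'
};

// The 'options' entry of the metadata array, with the value serialized for us.
const optionsPart = {
  name: 'options',
  value: JSON.stringify(extractionOptions),
  type: 'text/json'
};

console.log(optionsPart.value);
```

The resulting optionsPart object can then take the place of the second entry of the metadata array.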

Retrieve the data from the document


At this point, the token to use the service is generated, and the upload of the document is made. Now the fun part begins !

We know that the service might take a while to process the document, but we do not know exactly how long. The only solution we have is to periodically ask the service about the status of the processing: if it is PENDING, then we need to wait a few seconds and retry. Else (if it is DONE) that means the service has extracted the data, and we can retrieve them.

 

But... First things first, let's create a datatype with 2 attributes:

  • a string named Status to store the status of the processing of the document

  • a complex object named Data (type = Any) to store the result of the processing of the document


As we know it might take a while, let's set the Status to PENDING first.


Then, to implement the wait & retry feature, we need to insert a Forever activity where the condition would be :
if (dtDox.Status !== 'PENDING') {
    // break loop
} else {
    // wait and retry
}

So we have:


In the activity Generate get options, we have the following code:
return {
    method: 'GET',
    url: 'https://aiservices-dox.cfapps.eu10.hana.ondemand.com/document-information-extraction/v1/document/jobs/' + docId + '?clientId=c_00',
    responseType: 'json',
    resolveBodyOnly: true,
    headers: {
        Authorization: token,
        'Cache-Control': 'no-cache'
    }
};

where docId is the document ID returned by the upload step, and token is... well you get the idea !

To get the result, we can store result.status and result.extraction in the corresponding attributes of the instance of the datatype we created before (result being the name of this instance of the datatype).

Now, if the status is DONE, we know that result.extraction will contain the data from the document (see this documentation for more details).
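
To make the wait & retry logic more concrete, here is a minimal sketch that simulates it in plain JavaScript. In the real automation, the loop, the HTTP call and the wait are handled by activities, so this is only an illustration. The helpers updateState and shouldRetry, the canned responses and the maxAttempts safety cap are all assumptions added for the example:

```javascript
// Hypothetical helper: copies the job result into the dtDox-like state object.
function updateState(state, result) {
  state.Status = result.status;
  if (result.status === 'DONE') {
    state.Data = result.extraction;
  }
  return state;
}

// True while we should keep waiting and retrying.
function shouldRetry(state) {
  return state.Status === 'PENDING';
}

// Simulated sequence of service responses (made-up data for illustration).
const fakeResponses = [
  { status: 'PENDING' },
  { status: 'PENDING' },
  { status: 'DONE', extraction: { headerFields: [], lineItems: [] } }
];

const dtDox = { Status: 'PENDING', Data: null };
let attempts = 0;
const maxAttempts = 10; // safety cap so the loop cannot run forever

while (shouldRetry(dtDox) && attempts < maxAttempts) {
  // In the real automation, this is where the HTTP call and the wait happen.
  updateState(dtDox, fakeResponses[attempts]);
  attempts += 1;
}

console.log(dtDox.Status, 'after', attempts, 'attempts');
```

Capping the number of attempts is a design choice worth considering in a real automation too, so that a document that never finishes processing does not block the bot indefinitely.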

Note: according to the documentation, you will be able to access result.extraction.headerFields and result.extraction.lineItems. Both are arrays, so you can loop over each of them to display the name and the value of each extracted field.
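
As an illustration, here is a minimal sketch of such a loop. The sample extraction object is made up, and its shape is an assumption: header fields as objects with a name and a value, and each line item as an array of such objects. Please check the documentation for the exact structure returned by the service:

```javascript
// Made-up sample data; the real structure may contain more attributes per field.
const extraction = {
  headerFields: [
    { name: 'documentNumber', value: 'INV-001' },
    { name: 'currencyCode', value: 'EUR' }
  ],
  lineItems: [
    [
      { name: 'description', value: 'Widget' },
      { name: 'quantity', value: 2 }
    ]
  ]
};

// Turns a list of fields into readable "name = value" strings.
function describeFields(fields) {
  return fields.map(function (field) {
    return field.name + ' = ' + field.value;
  });
}

const headerLines = describeFields(extraction.headerFields);

// Assumption: each line item is itself an array of fields.
const itemLines = extraction.lineItems.map(function (item, index) {
  return 'Line item ' + (index + 1) + ': ' + describeFields(item).join(', ');
});

headerLines.concat(itemLines).forEach(function (line) {
  console.log(line);
});
```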

Final result


And voila ! Here is what you should have :


Of course, after the loop you can log the content of the result if you want to.

Conclusion


With some experience, building this automation should not take more than half an hour, which is far less than what was needed with the previous version of SAP Intelligent RPA. But what is important here is that you did not have to write lots of code to complete this automation (only the options for each HTTP call) !

 

Also...


Don't forget to check out the SAP Document Information Extraction documentation, as there are some new features since my last blog post (it now supports the JPEG and PNG formats !). You might be interested in them !

Last, you can find a sample on the Store :


 

Find more information on SAP Intelligent RPA:


Exchange knowledge: SAP Community | Q&A | Blog

Learn more: Webinars | Help Portal | openSAP

Explore: Product Information | Successful Use Cases

Try SAP Intelligent RPA for Free: Trial Version | Pre-built Bots

Follow us on: LinkedIn, Twitter and YouTube