Hello RPA fellows!
In the previous blog post, I showed how to integrate the SAP Document Information Extraction service (also called DOX) with SAP Intelligent RPA to extract data from PDF documents. Now that SAP Intelligent RPA 2.0 is officially released, I will show you how to do it again with the low-code approach of the new version of our solution.
To set up the service, please read the previous blog post. Once that is done, let's dive into the real topic!
API service key
As a reminder, the service key for the DOX service should have the following structure:
{
  "url": "",
  "uaa": {
    "uaadomain": "",
    "tenantmode": "",
    "sburl": "",
    "clientid": "",
    "verificationkey": "",
    "apiurl": "",
    "xsappname": "",
    "identityzone": "",
    "identityzoneid": "",
    "clientsecret": "",
    "tenantid": "",
    "url": ""
  }
}
Let’s save url, uaa.clientid, uaa.clientsecret and uaa.url, as we will need them later.
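As a small illustration, here is one way to pull these four values out of a parsed service key. The helper name pickDoxCredentials, the property names of the returned object and the sample values are hypothetical; only the service key property names come from the structure above.

```javascript
// Hypothetical helper: keeps only the four service key values needed later.
function pickDoxCredentials(serviceKey) {
  return {
    serviceUrl: serviceKey.url,               // base URL of the DOX service
    clientId: serviceKey.uaa.clientid,
    clientSecret: serviceKey.uaa.clientsecret,
    authUrl: serviceKey.uaa.url               // base URL of the token endpoint
  };
}

// Placeholder values, for illustration only.
const sampleKey = {
  url: 'https://dox.example.invalid',
  uaa: {
    clientid: 'my-client',
    clientsecret: 'my-secret',
    url: 'https://auth.example.invalid'
  }
};

const credentials = pickDoxCredentials(sampleKey);
```

In a real automation these values would rather come from environment variables than from a hard-coded object.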
Overview
How to extract data from a document?
First, let's recall the main steps of the process when we extract data from a document using DOX:
- We get an access token to be able to use the service.
- We upload the document to the service. The service sends back the document ID, which will be used later.
- Last, we try to access the document. If the service has finished processing the document, we get the status DONE. Otherwise, the status is still PENDING and we need to wait a bit before trying again.
How to use a web service call activity?
Because of the way the HTTP Call activity is designed, we need to provide an options object.
To make it easier, let's create this options object with a Custom Script activity. The list of inputs and outputs of this activity is displayed in the screenshot below:
No input is needed, just an output whose type is Any (equivalent to an object). In the script editor, we just need to insert the following:
return {
  method: 'GET|POST', // placeholder: use either 'GET' or 'POST'
  url: '',            // URL of the web service to call
  resolveBodyOnly: true,
  headers: {
    Authorization: ''
  }
};
Note: With a POST query, the options might be more complex. Don't hesitate to read the documentation for more details.
Note: To build this options object you might need to use data from previous activities. In that case, feel free to add input parameters.
Important note: The attribute resolveBodyOnly lets us directly retrieve the result sent by the service as an object.
When it is set to false, the HTTP Call activity returns an object that wraps the result: all the data returned by the service is contained in the attribute body of the output of the Call activity.
Note: the content of this attribute is a stringified object. So to get it, let's create another Custom Script activity, whose input is the response of the Call activity:
And in the script editor, we insert the following:
let json = JSON.parse(response);
return json;
In that case json would be an object, containing all the data sent by the web service activity. But depending on the case, we can also return something else (such as json.somedata).
But again, this last step is not needed unless resolveBodyOnly is set to false.
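If you want a single script that works in both modes, a defensive version could look like the sketch below. The helper name extractBody is hypothetical, and the sketch assumes that a wrapped response exposes its payload in a body attribute, as described above.

```javascript
// Hypothetical helper: returns the service data as an object in both modes.
function extractBody(response) {
  // With resolveBodyOnly set to false, the payload sits in response.body.
  const isWrapped = response !== null &&
    typeof response === 'object' &&
    'body' in response;
  const body = isWrapped ? response.body : response;

  // The payload may still be a stringified object: parse it only in that case.
  return typeof body === 'string' ? JSON.parse(body) : body;
}

// The same payload, received in three different shapes.
const fromWrapped = extractBody({ body: '{"id":"42"}' });
const fromString = extractBody('{"id":"42"}');
const fromObject = extractBody({ id: '42' });
```

Note that this check would misbehave if the service payload itself had a top-level body attribute, so treat it as a starting point rather than a general solution.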
At this point, each time we need to use the HTTP Call activity, we will implement the following structure:
Create the automation
OK. Now that we have a better understanding of the way we need to perform calls to web services, we can implement it in our context.
Note: To make it easier, all URLs, credentials and paths are hard-coded. In real life, you should definitely create a configurable automation with environment variables to protect all sensitive information. The path of the file can be set as an input of the automation so you can reuse it.
Generate the authentication token
First, let's generate the authentication token which will be used to call DOX. As explained before, we have the following activities:
The script to generate the token options is detailed below:
return {
  method: 'GET',
  url: 'https://xxxxx/oauth/token?grant_type=client_credentials',
  headers: {
    Authorization: 'Basic xxxxx'
  }
};
where xxxxx in the URL is the uaa.url mentioned in the first part of this blog post, and xxxxx in the Authorization header is a base64-encoded string composed of uaa.clientid:uaa.clientsecret.
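To make the encoding step concrete, here is a minimal sketch that builds the header value. It relies on Buffer, which is a Node.js API; whether it is available inside a Custom Script activity is an assumption, so you may prefer to encode the string beforehand with any base64 tool. The helper name buildBasicAuth and the sample credentials are hypothetical.

```javascript
// Hypothetical helper: builds the Authorization header of the token request.
function buildBasicAuth(clientId, clientSecret) {
  // Basic authentication expects base64("clientid:clientsecret").
  const raw = clientId + ':' + clientSecret;
  const encoded = Buffer.from(raw, 'utf8').toString('base64');
  return 'Basic ' + encoded;
}

// Placeholder credentials, for illustration only.
const basicHeader = buildBasicAuth('my-client', 'my-secret');
```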
Tip: as we will use the token several times, we can create a string variable and store the token in it. See below:
The value would be:
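As a sketch of what such a value could look like: since the later scripts pass token directly as the Authorization header, the variable most likely has to hold the complete header value. The example below assumes that the token endpoint returns a standard OAuth 2.0 response with an access_token field and that the service expects a Bearer token; the helper name buildBearerToken and the sample response are hypothetical.

```javascript
// Hypothetical helper: turns the token response into a ready-to-use header value.
function buildBearerToken(tokenResponse) {
  // Assumption: standard OAuth 2.0 response exposing an access_token field.
  return 'Bearer ' + tokenResponse.access_token;
}

// Placeholder response, for illustration only.
const token = buildBearerToken({ access_token: 'abc123', token_type: 'bearer' });
```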
Upload the document
To upload the document, we will use the same pattern:
- Generate the options
- Make the HTTP call using a POST request
- Get the document ID which is sent back by DOX
To generate the options, we use the following code:
return {
  method: 'POST',
  url: 'https://aiservices-dox.cfapps.eu10.hana.ondemand.com/document-information-extraction/v1/document/jobs',
  headers: {
    Authorization: token
  },
  metadata: [
    {
      name: 'file',
      file: 'C:/Temp/invoice.pdf',
      type: 'application/pdf'
    }, {
      name: 'options',
      value: '{"extraction":{"headerFields":["documentNumber","taxId","taxName","purchaseOrderNumber","shippingAmount","netAmount","grossAmount","currencyCode","receiverContact","documentDate","taxAmount","taxRate","receiverName","receiverAddress","deliveryDate","paymentTerms","deliveryNoteNumber","senderBankAccount","senderAddress","senderName"],"lineItemFields":["description","netAmount","quantity","unitPrice","materialNumber"]},"clientId":"c_00","documentType":"invoice","receivedDate":"2020-02-17","enrichment":{"sender":{"top":5,"type":"businessEntity","subtype":"supplier"},"employee":{"type":"employee"}}}',
      type: 'text/json'
    }
  ]
};
Note: In the metadata attribute, you need to provide the file path of the document and its type (in this case, as it is a PDF document, we are using the application/pdf type).
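The long options string is easy to break when edited by hand. One alternative, sketched below, is to describe the options as an object and let JSON.stringify produce the string. The helper name buildExtractionOptions is hypothetical; the field names and values are taken from the script above, and receivedDate and enrichment are left out only to keep the sketch short.

```javascript
// Hypothetical helper: builds the value of the "options" part of the upload.
function buildExtractionOptions(headerFields, lineItemFields) {
  return JSON.stringify({
    extraction: {
      headerFields: headerFields,
      lineItemFields: lineItemFields
    },
    clientId: 'c_00',
    documentType: 'invoice'
  });
}

// A shorter field selection than in the script above, for illustration only.
const optionsValue = buildExtractionOptions(
  ['documentNumber', 'grossAmount', 'currencyCode'],
  ['description', 'quantity', 'unitPrice']
);
```

The resulting string can then be used as the value of the options entry in the metadata array.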
Retrieve the data from the document
At this point, the token to use the service is generated and the document is uploaded. Now the fun part begins!
We know that the service might take a while to process the document, but we do not know exactly how long. The only solution we have is to periodically ask the service about the status of the processing: if it is PENDING, we need to wait a few seconds and retry. Otherwise (if it is DONE), the service has extracted the data and we can retrieve it.
But first things first: let's create a datatype with 2 attributes:
- a string named Status to store the status of the processing of the document
- a complex object named Data (type = Any) to store the result of the processing of the document
As we know it might take a while, let's set the Status to PENDING first.
Then, to implement the wait & retry feature, we need to insert a Forever activity where the condition would be:
if (dtDox.Status !== 'PENDING') {
  // break loop
} else {
  // wait and retry
}
So we have:
In the activity Generate get options, we have the following code:
return {
  method: 'GET',
  url: 'https://aiservices-dox.cfapps.eu10.hana.ondemand.com/document-information-extraction/v1/document/jobs/' + docId + '?clientId=c_00',
  responseType: 'json',
  resolveBodyOnly: true,
  headers: {
    Authorization: token,
    'Cache-Control': 'no-cache'
  }
};
where docId is the output of the previous paragraph, and token is... well, you get the idea!
To get the result, we can store result.status and result.extraction in the corresponding attributes of the instance of the datatype we created before (result being the name of this instance of the datatype).
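To see the wait & retry logic end to end, here is a small simulation in plain JavaScript. It is only a sketch: the HTTP calls are replaced by a list of fake responses, the names updateDox, fakeResponses and maxAttempts are hypothetical, and the attempt limit is my own addition so that the loop cannot run forever if the status never changes.

```javascript
// Hypothetical helper: copies one job response into the datatype instance.
function updateDox(dtDox, result) {
  dtDox.Status = result.status;
  if (result.status === 'DONE') {
    dtDox.Data = result.extraction;
  }
  return dtDox;
}

// Fake responses standing in for three successive HTTP calls.
const fakeResponses = [
  { status: 'PENDING' },
  { status: 'PENDING' },
  { status: 'DONE', extraction: { headerFields: [], lineItems: [] } }
];

const dtDox = { Status: 'PENDING', Data: null };
const maxAttempts = 10; // assumption: safety limit, not part of the original automation
let attempts = 0;

// Same condition as in the Forever activity: stop once the status is no longer PENDING.
while (dtDox.Status === 'PENDING' && attempts < maxAttempts) {
  // In the automation, the wait and the HTTP call happen here.
  updateDox(dtDox, fakeResponses[attempts]);
  attempts += 1;
}
```

In the automation itself, the Forever activity plays the role of the while loop, and the wait between two calls keeps the service from being queried too often.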
Now, if the status is DONE, we know that result.extraction will contain the data from the document (see this documentation for more details).
Note: according to the documentation, you will be able to access result.extraction.headerFields and result.extraction.lineItems. Both are arrays, so you can loop over each of them to display the name and the value of each extracted field.
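Such a loop could be sketched as follows. The helper name describeExtraction and the sample data are hypothetical. The sketch assumes that each extracted field exposes a name and a value, and it accepts line items that are either a single field or an array of fields, so please check the documentation for the exact shape.

```javascript
// Hypothetical helper: flattens an extraction result into "name: value" lines.
function describeExtraction(extraction) {
  const lines = [];

  for (const field of extraction.headerFields || []) {
    lines.push(field.name + ': ' + field.value);
  }

  (extraction.lineItems || []).forEach(function (item, index) {
    // Assumption: a line item is either one field or an array of fields.
    const fields = Array.isArray(item) ? item : [item];
    for (const field of fields) {
      lines.push('line ' + (index + 1) + ' / ' + field.name + ': ' + field.value);
    }
  });

  return lines;
}

// Made-up sample data, for illustration only.
const sampleExtraction = {
  headerFields: [{ name: 'documentNumber', value: 'INV-001' }],
  lineItems: [
    [
      { name: 'description', value: 'Widget' },
      { name: 'quantity', value: 2 }
    ]
  ]
};

const extractedLines = describeExtraction(sampleExtraction);
```

Each entry of extractedLines can then be sent to a log activity.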
Final result
And voilà! Here is what you should have:
Of course, after the loop you can log the content of the result if you want to.
Conclusion
With some experience, building this automation should not take more than half an hour, which is far less than what was needed with the previous version of SAP Intelligent RPA. But what is important here is that you did not have to write lots of code to complete this automation: only the options for each HTTP call!
Also...
Don't forget to check out the SAP Document Information Extraction documentation, as there are some new features since my last blog post (it now supports the JPEG and PNG formats!). You might be interested in them!
Last, you can find a sample on the Store: