
Introduction to Document Information Extraction
Document Information Extraction (a.k.a. DOX) comes to the rescue when we need to process large volumes of business documents with content in headers, line items and tables. The extracted information can then be used, for example, to automatically process payables, invoices or payment notes while making sure that invoices and payables match. After a document file is uploaded, the service returns the extraction results for its header fields and line items. With the DOX Premium edition, generative AI can also be used to automate the information extraction.
Plan offerings for DOX
Base plan: The Base edition service plan includes all core features but does not include document information extraction using generative AI.
Premium: The Premium edition service plan also includes document information extraction using generative AI.
Note: Free and trial offerings are available as well. For this blog, we will be using a trial plan.
For metering and licensing details, refer to this link.
Supported Document types and formats
Document Information Extraction supports the following document file formats as input:
For supported languages and regions, refer to this link.
What are the DOX benefits?
With Document Information Extraction we can:
My use case scenario
The requirement is straightforward: we want to extract information from SAP standard documents (invoices, purchase orders, sales orders, etc.) or even custom documents such as taxi expense receipts for further processing.
Using the Document Information Extraction UI is much easier than using the APIs, so the objective of this blog is to give an overview of the DOX APIs and show how to use them in custom-built applications to enable OCR capabilities.
DOX architecture (use case)
If you want to explore the Document Information Extraction UI, follow this tutorial.
Prerequisites
1) SAP Document Information Extraction (trial account): refer to this tutorial to set up a free DOX trial account.
2) SAP BTP trial account: refer to this tutorial to set up a free BTP trial account for UI5 application development.
To explore and consume the DOX APIs, we first need to create a service key, which is used to authenticate and authorize the incoming requests. Please refer to this tutorial to generate a service key from the service instance.
Let's walk through the DOX APIs first using an HTTP tool (Postman) and then jump into the BTP UI5 application development part. With the DOX APIs, we can extract document information in just 3 steps.
1. Get OAuth access token
Below are the mandatory parameters that need to be entered to generate an access token.
Operation | GET |
URL | https://{{uaa url from the DOX service key}}/oauth/token?grant_type=client_credentials |
Query Params | grant_type=client_credentials |
Authorization | Username: clientid from the DOX service key Password: clientsecret from the DOX service key |
Access token call
Note: The access token is valid for 12 hours.
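Outside Postman, the same token call can be prepared in code. The following is a minimal JavaScript sketch that builds the URL and the Basic authorization header described in the table above. The service key shape (`uaa.url`, `uaa.clientid`, `uaa.clientsecret`) and the helper name `buildTokenRequest` are assumptions, so check them against the service key you generated.

```javascript
// Sketch: build the OAuth client-credentials token request for DOX.
// ASSUMPTION: the service key JSON exposes uaa.url, uaa.clientid and
// uaa.clientsecret; verify the exact shape in your own service key.
function buildTokenRequest(serviceKey) {
  const { url, clientid, clientsecret } = serviceKey.uaa;
  // HTTP Basic auth: base64("clientid:clientsecret")
  const basic = Buffer.from(`${clientid}:${clientsecret}`).toString("base64");
  return {
    // Strip a trailing slash so the path is not doubled
    url: `${url.replace(/\/$/, "")}/oauth/token?grant_type=client_credentials`,
    init: {
      method: "GET", // same operation as in the Postman table above
      headers: { Authorization: `Basic ${basic}` },
    },
  };
}

// Usage (Node 18+, which provides a global fetch):
// const { url, init } = buildTokenRequest(serviceKey);
// const { access_token } = await (await fetch(url, init)).json();
```

The token is then sent as `Bearer <access_token>` in the following calls.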
2. Upload/Post document call
Let's take a simple invoice PDF with a few reference fields that we need to extract.
Sample Invoice Document
HeaderFields: purchaseOrderNumber,netAmount,senderAddress,currencyCode
LineItemFields: description,netAmount,quantity,unitPrice,materialNumber,unitOfMeasure
Operation | POST |
URL | https://{{Backend endpoint url from the DOX service key}}/document-information-extraction/v1/document/jobs |
Form-Data | file= {{Upload document file with supported extension}} |
Form-Data | options= { "extraction": { "headerFields": [ "purchaseOrderNumber", "netAmount", "senderAddress", "currencyCode" ], "lineItemFields": [ "description", "netAmount", "quantity", "unitPrice", "materialNumber", "unitOfMeasure" ] }, "clientId": "default" } |
Authorization | Bearer {{Access Token generated from the first call}} |
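The same multipart upload request can be assembled in code. This is a sketch for Node.js 18+, which provides global `FormData`, `Blob` and `fetch`. The helper name `buildUploadRequest` and its parameters are hypothetical; `backendUrl` stands for the backend endpoint URL from the service key.

```javascript
// Sketch: assemble the POST /document/jobs request (Node 18+ globals FormData/Blob).
// ASSUMPTION: backendUrl is the backend endpoint URL taken from the service key.
function buildUploadRequest(backendUrl, accessToken, fileBlob, fileName, options) {
  const form = new FormData();
  form.append("file", fileBlob, fileName); // the document to extract
  form.append("options", JSON.stringify(options)); // extraction fields + clientId
  return {
    url: `${backendUrl.replace(/\/$/, "")}/document-information-extraction/v1/document/jobs`,
    init: {
      method: "POST",
      // No Content-Type here: fetch sets multipart/form-data with its boundary
      headers: { Authorization: `Bearer ${accessToken}` },
      body: form,
    },
  };
}

// Usage:
// const { url, init } = buildUploadRequest(backendUrl, token, pdfBlob, "invoice.pdf", options);
// const job = await (await fetch(url, init)).json(); // job.id, job.status
```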
Upload document call
In the options parameter, we pass a JSON object containing the header and line item fields to extract from the document, any enrichment parameters, and the clientId. The clientId (the ID of the client) is mandatory: for a trial account it is "default", but we can also create our own clients using the Client API.
Note: The optional parameters are explained in detail in the API reference guide.
If the upload call succeeds, the job is returned with status PENDING, which means the document was uploaded successfully and is currently being processed. We store the job ID (the id field of the response), which is needed to get the DOX extraction results.
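Because the clientId is mandatory and the field lists are plain arrays of names, it can help to build the options object behind a small guard. The hypothetical helper `buildExtractionOptions` below only reproduces the options JSON from the upload table and performs basic input checks; it is not part of any SAP library.

```javascript
// Sketch: build the "options" JSON for the upload call with basic validation.
// clientId is mandatory; "default" is the value used with the trial account.
function buildExtractionOptions(headerFields, lineItemFields, clientId = "default") {
  const isNameList = (a) =>
    Array.isArray(a) && a.every((s) => typeof s === "string" && s.length > 0);
  if (!isNameList(headerFields) || !isNameList(lineItemFields)) {
    throw new TypeError("headerFields and lineItemFields must be arrays of field names");
  }
  if (!clientId) {
    throw new TypeError("clientId is mandatory");
  }
  return { extraction: { headerFields, lineItemFields }, clientId };
}

const options = buildExtractionOptions(
  ["purchaseOrderNumber", "netAmount", "senderAddress", "currencyCode"],
  ["description", "netAmount", "quantity", "unitPrice", "materialNumber", "unitOfMeasure"]
);
// JSON.stringify(options) is the value sent in the "options" form-data field.
```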
3. Get Extraction results
Extraction results are retrieved by calling the jobs API with the job ID returned by the upload call. Once the document has been processed successfully, the job status is "DONE".
Operation | GET |
URL | https://{{Backend endpoint url from the DOX service key}}/document-information-extraction/v1/document/jobs/{{job id from the POST call response}} |
Authorization | Bearer {{Access Token generated from the first call}} |
Extract document results call
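Since the upload call returns while the job is still PENDING, a client has to poll this endpoint until the status becomes DONE. The sketch below, `waitForJob` (a hypothetical name), takes the fetch function as a parameter so it can be exercised without network access. Treating `FAILED` as the terminal error status, as well as the interval and attempt limits, are assumptions; check the status values in the API reference.

```javascript
// Sketch: poll GET /document/jobs/{id} until the extraction job finishes.
// ASSUMPTIONS: "FAILED" is the terminal error status; any other status
// (e.g. PENDING) means "still processing"; limits are arbitrary defaults.
async function waitForJob(fetchFn, backendUrl, accessToken, jobId,
                          { intervalMs = 2000, maxAttempts = 30 } = {}) {
  const url = `${backendUrl.replace(/\/$/, "")}/document-information-extraction/v1/document/jobs/${jobId}`;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetchFn(url, { headers: { Authorization: `Bearer ${accessToken}` } });
    if (!response.ok) {
      throw new Error(`Job request failed with HTTP ${response.status}`);
    }
    const job = await response.json();
    if (job.status === "DONE") return job; // extraction results are available
    if (job.status === "FAILED") throw new Error(`Extraction job ${jobId} failed`);
    // Still processing: wait before asking again
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} not finished after ${maxAttempts} attempts`);
}

// Usage (Node 18+): const job = await waitForJob(fetch, backendUrl, token, jobId);
```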
Header section extracted fields
In the response, header fields and line item fields are returned separately, and each field comes with an extraction confidence level. The confidence can be used to judge the accuracy of the extraction and, in automation scenarios, to trigger a manual check step when needed.
Now that we have explored the DOX APIs via Postman, it's time to infuse them into a BTP UI5 application.
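As an illustration of the manual check step, extracted fields can be split by confidence. The sketch assumes each field is an object with `name`, `value` and a numeric `confidence` between 0 and 1, and that line items arrive as arrays of such fields under `extraction.lineItems`; the 0.8 threshold and the helper name `splitByConfidence` are arbitrary choices.

```javascript
// Sketch: route low-confidence fields to a manual review list.
// ASSUMPTIONS: fields carry name/value/confidence (0..1); line items are
// arrays of fields under extraction.lineItems; 0.8 is an arbitrary threshold.
function splitByConfidence(extraction, threshold = 0.8) {
  const accepted = [];
  const needsReview = [];
  const classify = (field, lineItem) => {
    const entry = { name: field.name, value: field.value, confidence: field.confidence, lineItem };
    const ok = typeof field.confidence === "number" && field.confidence >= threshold;
    (ok ? accepted : needsReview).push(entry);
  };
  // Header fields have no line item index (null)
  (extraction.headerFields || []).forEach((f) => classify(f, null));
  // Line item fields remember which line item they belong to
  (extraction.lineItems || []).forEach((item, i) => item.forEach((f) => classify(f, i)));
  return { accepted, needsReview };
}
```

A missing confidence is treated as "needs review", which errs on the side of a manual check.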
Infuse DOX APIs in SAP BTP UI5 Application
For demonstration purposes, we have built a lightweight UI5 application in Business Application Studio and deployed it on SAP BTP. The user uploads a document (an invoice in our case) as an attachment, and once it is uploaded, the application calls the DOX APIs via BTP destinations to perform the OCR operation.
BTP destinations store the technical information required to connect to a remote service (the DOX APIs in our case) from an application deployed on BTP. We use two destinations: the first to get the access token and the second to call the DOX APIs.
Destination for token
Destination for DOX APIs
Once the destinations are maintained, we can start building the UI5 application in Business Application Studio. The code snippet below takes the uploaded file as input, sends it to the DOX upload API and finally extracts the results from the completed document job.
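onUpload: function () {
  // Assumes JSONModel is loaded in sap.ui.define ("sap/ui/model/json/JSONModel")
  var that = this;
  var oFileUpload = this.getView().byId("fileUploaderScanner");
  // oFileUpload.oFileUpload is the FileUploader's internal <input type="file"> element
  var oUploadedFile = oFileUpload.oFileUpload.files[0];
  if (!oUploadedFile) {
    return; // no file selected
  }
  var oOptions = {
    extraction: {
      headerFields: ["purchaseOrderNumber", "netAmount", "senderAddress", "currencyCode"],
      lineItemFields: [
        "description",
        "netAmount",
        "quantity",
        "unitPrice",
        "materialNumber",
        "unitOfMeasure"
      ]
    },
    clientId: "default"
  };
  var oFormData = new FormData();
  oFormData.append("options", JSON.stringify(oOptions));
  // A File is already a Blob, so it can be appended directly with its name
  oFormData.append("file", oUploadedFile, oUploadedFile.name);

  // Returns a header field property, or "" if DOX did not extract that field
  var fnHeaderField = function (oResult, sName, sProperty) {
    var oField = oResult.extraction.headerFields.find(function (x) {
      return x.name === sName;
    });
    return oField ? oField[sProperty] : "";
  };

  var oScanModel = new JSONModel();
  // Attach the handler before the asynchronous token request is sent
  oScanModel.attachRequestCompleted(function (oEvent) {
    if (!oEvent.getParameter("success")) {
      console.error("Access token request failed");
      return;
    }
    var sAccessToken = oScanModel.getProperty("/access_token");
    var sJobsUrl = "document-information-extraction/v1/document/jobs";

    // Poll the job until DOX has finished processing (status DONE)
    var fnPollJob = function (sJobId, iAttempt) {
      return fetch(sJobsUrl + "/" + sJobId, {
        headers: { Authorization: "Bearer " + sAccessToken }
      })
        .then((response) => response.json())
        .then((oJob) => {
          if (oJob.status === "DONE") {
            return oJob;
          }
          if (oJob.status === "FAILED") {
            throw new Error("Extraction job " + sJobId + " failed");
          }
          if (iAttempt >= 30) {
            throw new Error("Extraction job " + sJobId + " still " + oJob.status + " after 30 attempts");
          }
          return new Promise((resolve) => setTimeout(resolve, 2000)).then(() =>
            fnPollJob(sJobId, iAttempt + 1)
          );
        });
    };

    fetch(sJobsUrl, {
      method: "POST",
      headers: {
        Accept: "application/json",
        "X-Requested-With": "XMLHttpRequest",
        Authorization: "Bearer " + sAccessToken
      },
      // No Content-Type header: the browser sets the multipart boundary itself
      body: oFormData
    })
      .then((response) => response.json())
      // The upload response still has status PENDING, so wait for the job to finish
      .then((oResult) => fnPollJob(oResult.id, 1))
      .then((result) => {
        var oNewClaimModel = new JSONModel({
          PONumber: fnHeaderField(result, "purchaseOrderNumber", "value"),
          Vendor: fnHeaderField(result, "senderAddress", "value"),
          Amount: fnHeaderField(result, "netAmount", "rawValue"),
          Receipt: result.fileName
        });
        that.getView().setModel(oNewClaimModel);
      })
      .catch((error) => console.error(error));
  });
  // loadData(sURL, oParameters, bAsync, sType): fetch the token via the token destination
  oScanModel.loadData("oauth/token", "grant_type=client_credentials", true, "GET");
},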
Final outcome: Once the UI5 application is deployed, the user input screen below accepts a document, and once it is submitted, the application calls the DOX APIs in the following sequence:
1) The access token is fetched.
2) The submitted document is uploaded via the /document/jobs DOX API.
3) Once DOX has processed the document (often within a few seconds), the extraction results are fetched via the job ID (as explained in the Postman testing section).
4) The relevant fields from the extraction results are mapped to the application UI.
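Step 4 can be isolated into a small, testable mapping function. The sketch below, `mapJobToClaim` (a hypothetical name), produces the same `PONumber`, `Vendor`, `Amount` and `Receipt` properties as the controller code above and falls back to empty strings when a field was not extracted, instead of throwing.

```javascript
// Sketch: map a finished DOX job onto the flat model bound to the UI.
// Property names mirror the controller code; missing fields become "".
function mapJobToClaim(job) {
  const headerFields = (job.extraction && job.extraction.headerFields) || [];
  // Look up a header field by name and read one of its properties
  const field = (name, prop = "value") => {
    const f = headerFields.find((x) => x.name === name);
    return f && f[prop] !== undefined ? f[prop] : "";
  };
  return {
    PONumber: field("purchaseOrderNumber"),
    Vendor: field("senderAddress"),
    Amount: field("netAmount", "rawValue"), // raw text as printed on the invoice
    Receipt: job.fileName || "",
  };
}

// Usage in the controller: that.getView().setModel(new JSONModel(mapJobToClaim(result)));
```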
Invoice Submission UI
Upload document UI
Extraction results UI
With this blog, we have explored the DOX capability (powered by generative AI) and how to infuse the DOX APIs into any custom scenario that requires OCR.
Happy Extracting!! 🤖
Important links to consider:
1) API reference with common error codes and status: Link
2) Limitations and technical constraints: Link
3) SAP help for DOX: Link
Disclaimer: The above post is based purely on personal learning and shows the basics of how we can leverage the DOX APIs for OCR requirements. Feel free to explore more and adapt your developments to your particular use cases. Happy to hear comments and feedback.