This blog post is part of the “Document Extraction with SAP Intelligent RPA” series.
With the smart document extraction capabilities in SAP Intelligent Robotic Process Automation, powered by the Document Information Extraction service (an SAP AI Business Hub service), information can be extracted from business documents such as invoices, purchase orders and payment advices using Pre-Trained AI models, as seen in the previous blog.
Although Pre-Trained models extract most fields from documents automatically, not all documents are alike: some contain custom fields or use a custom layout that the models cannot recognize. In this blog, I would like to show you how to further customize extraction by using Document Templates tailored to your specific business needs.
Please note: A Document Template is just an additional layer on top of the Pre-Trained AI models. Annotated fields do not change or retrain the existing pre-trained models. A Document Template is confined to a specific project and does not affect extractions in other projects or environments.
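To make the layering idea concrete, here is a minimal Python sketch that models it with plain dictionaries. The function `apply_template_layer` and the field names are hypothetical and not part of the SAP Intelligent RPA SDK; the sketch only illustrates the two properties described above: values from template annotations take precedence for the annotated fields, and the pre-trained result itself is left unchanged.

```python
from typing import Dict, Optional


def apply_template_layer(
    pretrained: Dict[str, Optional[str]],
    template: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Return a new result in which template-annotated fields override pre-trained values.

    The pre-trained result is copied, never mutated, mirroring the note that
    annotations do not change the pre-trained models or their output.
    """
    merged = dict(pretrained)  # shallow copy: the original dict stays untouched
    merged.update(template)    # annotated fields win over pre-trained values
    return merged


if __name__ == "__main__":
    # Hypothetical extraction results; real field names and output shapes may differ.
    pretrained_result = {"documentNumber": "INV-001", "receiverName": None}
    template_result = {"receiverName": "ACME Corp."}
    print(apply_template_layer(pretrained_result, template_result))
    # → {'documentNumber': 'INV-001', 'receiverName': 'ACME Corp.'}
    print(pretrained_result)
    # → {'documentNumber': 'INV-001', 'receiverName': None}
```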
For prerequisites and set-up, follow the blogs mentioned above.
Let's look at a scenario in which a company receives numerous invoices from different suppliers. The company maintains a database to store the invoice data and therefore wants to automate invoice extraction. The bot developer uses the "Extract Data (Pre-Trained Model)" activity, as shown in this blog, and extracts most of the relevant fields.
One such invoice is shown below.
All fields were extracted successfully except for the "receiver name", as can be seen in the output of the "Extract Data (Pre-Trained Model)" activity below.
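A bot developer could also spot such gaps programmatically before deciding to add a template. The sketch below assumes the activity output has already been flattened into a dict of field name to value; the real output of "Extract Data (Pre-Trained Model)" has its own structure, and `find_missing_fields` and the field names are hypothetical.

```python
from typing import Iterable, List, Mapping, Optional


def find_missing_fields(
    result: Mapping[str, Optional[str]],
    expected: Iterable[str],
) -> List[str]:
    """Return expected fields that are absent, None, or blank in the extraction result."""
    missing = []
    for name in expected:
        value = result.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical flattened output of the pre-trained extraction.
    output = {"documentNumber": "INV-001", "grossAmount": "1200.00", "receiverName": ""}
    print(find_missing_fields(output, ["documentNumber", "grossAmount", "receiverName"]))
    # → ['receiverName']
```

Fields reported this way are the natural candidates for annotation in a Document Template.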
To improve the extraction, we will implement this use case with the new “Extract Data (Template)” activity together with a document template.
Steps to simplify this use-case
Create a new artifact: Document Template. Document templates can be created from the Intelligent RPA Cloud Studio, as shown below. Provide the template name, a description and a sample document. This sample document will be used later to annotate the custom fields.
Select the Document Type.
For this example, we choose Invoice as the document type.
Choose an existing schema. A schema is a collection of the fields that need to be extracted from a document. Each document type has a default schema that contains all the fields that can be extracted for that document type.
In this image, the default schema is selected and a preview of its extraction fields is shown.
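Conceptually, a schema can be pictured as a named list of typed fields bound to one document type. The sketch below models that idea only; the `ExtractionField` and `Schema` classes, the data types and the field names are illustrative assumptions, not the actual schema format used by Document Information Extraction.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ExtractionField:
    """One field the extraction should return, e.g. an invoice number."""
    name: str
    data_type: str  # illustrative types such as "string", "date", "number"


@dataclass
class Schema:
    """A named collection of extraction fields for one document type."""
    name: str
    document_type: str
    fields: List[ExtractionField] = field(default_factory=list)

    def field_names(self) -> List[str]:
        return [f.name for f in self.fields]


# Hypothetical, heavily shortened stand-in for a default invoice schema.
DEFAULT_INVOICE_SCHEMA = Schema(
    name="default_invoice",
    document_type="invoice",
    fields=[
        ExtractionField("documentNumber", "string"),
        ExtractionField("documentDate", "date"),
        ExtractionField("receiverName", "string"),
        ExtractionField("grossAmount", "number"),
    ],
)

if __name__ == "__main__":
    print(DEFAULT_INVOICE_SCHEMA.field_names())
    # → ['documentNumber', 'documentDate', 'receiverName', 'grossAmount']
```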
Click on Add. A new Document Template will be created.
Open the newly created Document Template and click the "Annotate in a new Tab" icon, as shown below:
Annotate the receiver name and activate the template as shown below:
Create an automation in Cloud Studio that uses the new template, as shown below:
Drag and drop the "Extract Data (Template)" activity and specify the template and document path.
Test the automation using the Test button in Cloud Studio. It now outputs the correct receiver name, as annotated on the sample document.
In the above example, the default schema was used to extract the data from the documents. You can also choose only the extraction fields you need by using the "Create New" schema option. As shown below, you can choose from the list of fields that should be extracted from the documents.
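Picking fields for a new schema amounts to selecting a subset of what the default schema offers. The following minimal sketch shows that selection logic under stated assumptions: `create_custom_schema` is a hypothetical helper, field names are plain strings, and rejecting unknown names is a design choice made here for illustration rather than documented Cloud Studio behavior.

```python
from typing import List


def create_custom_schema(default_fields: List[str], selected: List[str]) -> List[str]:
    """Return the selected subset of a default schema's fields.

    Raises ValueError for names the default schema does not offer, and keeps
    the default schema's field order so the result is predictable.
    """
    unknown = [name for name in selected if name not in default_fields]
    if unknown:
        raise ValueError(f"Fields not offered by the default schema: {unknown}")
    chosen = set(selected)
    return [name for name in default_fields if name in chosen]


if __name__ == "__main__":
    # Hypothetical default invoice fields.
    default_invoice_fields = ["documentNumber", "documentDate", "receiverName", "grossAmount"]
    print(create_custom_schema(default_invoice_fields, ["receiverName", "documentNumber"]))
    # → ['documentNumber', 'receiverName']
```

Preserving the default order means two users who pick the same fields in a different sequence end up with identical schemas.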
Existing templates that other users have created can also be used to extract data from documents. To use an existing template, import it into the project using the "Choose an existing template" option on the first page of the Document Template creation wizard.
The wizard shows a list of existing templates, with a preview of their sample documents and annotated fields, as shown below:
Custom Documents Extraction
Information from documents other than Invoice, Purchase Order or Payment Advice can be extracted by selecting the "Custom" document type in the Document Template creation wizard. Since there are no Pre-Trained models for custom documents, you have to annotate each extraction field.
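Because a custom document type has no pre-trained model to fall back on, every schema field must be covered by an annotation. The sketch below expresses that rule as a simple check; the `Annotation` class, its relative coordinates, the document type identifiers in `PRETRAINED_TYPES` and `fields_needing_annotation` are all hypothetical, and treating unannotated fields of pre-trained types as handled by the model is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical identifiers for the document types that have pre-trained models.
PRETRAINED_TYPES = {"invoice", "purchaseOrder", "paymentAdvice"}


@dataclass(frozen=True)
class Annotation:
    """A field marked on the sample document (coordinates are illustrative)."""
    field_name: str
    page: int
    x: float
    y: float
    width: float
    height: float


def fields_needing_annotation(
    document_type: str,
    schema_fields: Iterable[str],
    annotations: Iterable[Annotation],
) -> List[str]:
    """List schema fields that no source can extract yet.

    Simplified assumption: for pre-trained document types, unannotated fields are
    left to the pre-trained model; for custom documents every field must be annotated.
    """
    if document_type in PRETRAINED_TYPES:
        return []
    annotated = {a.field_name for a in annotations}
    return [name for name in schema_fields if name not in annotated]


if __name__ == "__main__":
    notes = [Annotation("certificateId", page=1, x=0.1, y=0.2, width=0.3, height=0.05)]
    print(fields_needing_annotation("custom", ["certificateId", "issueDate"], notes))
    # → ['issueDate']
```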
In this blog post, you have learned how to enhance document extraction using Document Templates. You are now also familiar with schemas, annotations and document types.
If the Pre-Trained models do not extract the information from an invoice satisfactorily, you can use the template approach to extract the missing or incorrect fields.
Thanks for reading and feel free to leave a comment with questions or feedback 🙂