Today we will try to understand how to read data from scanned or digital documents using the following activities, which SAP provides as part of the irpa_pdf SDK library 1.15.83.
Extract Data Without Template: The inputs for this activity are the document type and the path of the PDF document. The output is the data extracted using the standard schema of that document type.
Extract Data With Template: The inputs for this activity are the document template and the document path. The output is the data extracted using either the standard schema or a custom schema.
Prerequisites to understand before using the above activities:
Currently SAP supports only 3 document types: Invoice, Purchase Order, and Payment Advice.
For each document type, SAP provides a standard schema that cannot be edited. For example, for the document type Invoice the schema is SAP_invoice_schema. (A schema is the list of fields (header, item) used to identify the required information in the corresponding document, such as invoice number, total, subtotal, and tax.)
By copying the standard schema, we can add or delete fields as required and then activate the copy.
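The idea of copying a read-only standard schema and then adjusting its fields can be pictured with a small Python sketch. This is only an illustration of the concept, not how the schema is stored or edited in the SAP tooling: the dictionary layout, the field names, the `copy_schema` helper, and the name `custom_invoice_schema` are all assumptions made for this example.

```python
from copy import deepcopy

# Hypothetical, simplified picture of a standard schema. The field names are
# illustrative and are not the real contents of SAP_invoice_schema.
STANDARD_INVOICE_SCHEMA = {
    "name": "SAP_invoice_schema",
    "document_type": "Invoice",
    "editable": False,
    "header_fields": ["invoiceNumber", "total", "subtotal", "tax"],
    "item_fields": ["description", "quantity", "unitPrice"],
}


def copy_schema(standard, new_name, add_header=(), remove_header=()):
    """Return an editable copy of a schema with header fields added or removed."""
    custom = deepcopy(standard)  # the standard schema itself is never modified
    custom["name"] = new_name
    custom["editable"] = True
    kept = [f for f in custom["header_fields"] if f not in remove_header]
    added = [f for f in add_header if f not in kept]
    custom["header_fields"] = kept + added
    return custom


custom_schema = copy_schema(
    STANDARD_INVOICE_SCHEMA,
    new_name="custom_invoice_schema",  # hypothetical name for the copy
    add_header=["purchaseOrderNumber"],
    remove_header=["tax"],
)
print(custom_schema["header_fields"])
```

The standard schema stays untouched, and only the copy carries the added and removed fields, which mirrors the copy, edit, and activate flow described above.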
Steps to design an automation with Extract Data With Template:
Use this approach when the document layout is complex and the AI and ML models are not able to determine the fields of the schema. By annotating fields while creating the template, we train the extraction on our own invoices (a maximum of 5 sample invoices can be uploaded for annotation). The next time an invoice from the same vendor arrives, the data is extracted using this template, which raises the extraction accuracy for that layout toward 100 percent.
How to create a template?
After creating the automation project, select the artifact option to create a template.
Provide a meaningful name and a description for the template, choose the document type as per your requirement, and select the schema, either standard or custom (here I am using the standard schema). Then provide the document path and click Create.
After this, open the document in the Document Information Extraction editor and annotate the fields, such as invoice number, PO number, total, and subtotal. Then save and activate the template so that it can be consumed in the automation.
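The inputs collected in the steps above can be summarized as a small checklist in Python. This is a sketch for illustration only and does not call any SAP API: the `TemplateDefinition` class, its field names, and the sample path are assumptions, while the three supported document types and the limit of 5 sample documents come from this post.

```python
from dataclasses import dataclass, field

SUPPORTED_DOCUMENT_TYPES = {"Invoice", "Purchase Order", "Payment Advice"}
MAX_SAMPLE_DOCUMENTS = 5


@dataclass
class TemplateDefinition:
    """Hypothetical container for the values entered when creating a template."""
    name: str
    description: str
    document_type: str
    schema_name: str
    sample_paths: list = field(default_factory=list)

    def validate(self):
        """Return a list of problems; an empty list means the inputs look complete."""
        problems = []
        if not self.name.strip():
            problems.append("template name is empty")
        if self.document_type not in SUPPORTED_DOCUMENT_TYPES:
            problems.append(f"unsupported document type: {self.document_type}")
        if not self.sample_paths:
            problems.append("no sample document path given")
        if len(self.sample_paths) > MAX_SAMPLE_DOCUMENTS:
            problems.append(f"more than {MAX_SAMPLE_DOCUMENTS} sample documents")
        return problems


template = TemplateDefinition(
    name="vendor1",
    description="Invoice layout of vendor 1",
    document_type="Invoice",
    schema_name="SAP_invoice_schema",
    sample_paths=["invoices/vendor1_sample.pdf"],  # hypothetical path
)
print(template.validate())
```

An empty list means all inputs needed for the create dialog are present; the annotation, save, and activate steps still happen in the editor.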
In the automation, pass the template name (vendor1 in this example) and the path of an invoice from the same vendor that contains different data.
The bot now recognizes this template and retrieves the required data, which is printed in the console.
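To give an idea of what working with the extracted data might look like, here is a minimal Python sketch that reads header fields and line items from a result and prints them. The structure of `extraction_result`, the field names, and all values are made-up sample data for this illustration; the actual output of the activity may be shaped differently.

```python
# Hypothetical extraction result with made-up values; the real output of the
# activity may be structured differently.
extraction_result = {
    "header_fields": [
        {"name": "invoiceNumber", "value": "INV-001"},
        {"name": "purchaseOrderNumber", "value": "PO-1001"},
        {"name": "subtotal", "value": "100.00"},
        {"name": "total", "value": "118.00"},
    ],
    "line_items": [
        [
            {"name": "description", "value": "Sample item"},
            {"name": "quantity", "value": "2"},
        ],
    ],
}


def header_value(result, field_name):
    """Return the value of a header field, or None if it was not extracted."""
    for entry in result["header_fields"]:
        if entry["name"] == field_name:
            return entry["value"]
    return None


def print_result(result):
    """Print every header field, then every line item, one per line."""
    for entry in result["header_fields"]:
        print(f"{entry['name']}: {entry['value']}")
    for index, item in enumerate(result["line_items"], start=1):
        cells = ", ".join(f"{cell['name']}={cell['value']}" for cell in item)
        print(f"item {index}: {cells}")


print_result(extraction_result)
```

Looking up a field by name, as `header_value` does, makes it easy to notice when an expected field such as the invoice number was not extracted.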
Conclusion: For invoices where the activity Extract Data Without Template does not return the required field information, use the activity Extract Data With Template by following the steps above.
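This decision can be written down as a simple fallback rule. The Python sketch below only illustrates the logic: the two extraction functions are passed in as parameters and are replaced here by stand-in functions returning made-up data, so nothing in it is the real irpa_pdf SDK API, and all names are assumptions.

```python
def missing_fields(extracted, required):
    """Return the required field names that are absent or empty."""
    return [name for name in required if not extracted.get(name)]


def extract_with_fallback(document_path, document_type, template_name, required,
                          extract_without_template, extract_with_template):
    """Try extraction without a template first, then fall back to the template."""
    extracted = extract_without_template(document_type, document_path)
    if not missing_fields(extracted, required):
        return extracted, "without template"
    return extract_with_template(template_name, document_path), "with template"


# Stand-in functions with made-up data; they only imitate the two activities.
def fake_without_template(document_type, document_path):
    return {"invoiceNumber": "INV-001", "total": ""}  # total was not found


def fake_with_template(template_name, document_path):
    return {"invoiceNumber": "INV-001", "total": "118.00"}


data, approach = extract_with_fallback(
    document_path="invoices/vendor1_new.pdf",  # hypothetical path
    document_type="Invoice",
    template_name="vendor1",
    required=["invoiceNumber", "total"],
    extract_without_template=fake_without_template,
    extract_with_template=fake_with_template,
)
print(approach, data)
```

Here the first attempt misses the total, so the template-based result is used instead.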
Thanks for reading and please provide your comments and questions.