Technology Blogs by SAP
juliana_morais
Product and Topic Expert

What is a Document Information Extraction Schema? 

A schema contains a list of header fields and line items representing the target information that you want to extract from a particular type of document. You must select a schema when you add documents to the Document Information Extraction UI. When uploading documents to the Document Information Extraction service via the Document API, it’s not mandatory to add a schema to the Options Payload, but we strongly recommend this approach since it allows you to extract custom fields from documents along with the standard fields, and also benefit from generative AI. You can find lists of the standard fields in Extracted Header Fields and Extracted Line Items. 

 

Custom Schemas and SAP Schemas 

You can either create your own schema from scratch (for either standard or custom document types) or use a preconfigured SAP schema. The Document Information Extraction service provides SAP schemas for the following standard document types: 

  • Invoice ("schemaName": "SAP_invoice_schema"; "schemaId": "cf8cc8a9-1eee-42d9-9a3e-507a61baac23") 
  • Payment advice ("schemaName": "SAP_paymentAdvice_schema"; "schemaId": "b7fdcfac-7853-42bb-89d2-ede2ba1ce803") 
  • Purchase order ("schemaName": "SAP_purchaseOrder_schema"; "schemaId": "fbab052e-6f9b-4a5f-b42f-29a8162eb1bf") 

In addition, there’s an SAP schema for custom documents (SAP_OCROnly_schema). You can use these SAP schemas unchanged to upload documents. If you don’t want to configure your own schema, simply select the appropriate SAP schema when you upload a document to the Document Information Extraction service. No configuration is needed when you use SAP schemas in this way. In fact, it’s not possible to change or delete original SAP schemas. Alternatively, you can use the Document Information Extraction UI to copy a suitable SAP schema and edit the default fields in line with your needs.  
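As a rough illustration of selecting an SAP schema at upload time, the sketch below builds an options payload from the schema names and IDs listed above. The helper name `build_options` and the `extra` argument are hypothetical; only the `schemaName` key is taken from this post, so check the Document API reference for any other options your upload needs.

```python
import json

# Schema names and IDs as listed above for the standard document types.
SAP_SCHEMAS = {
    "invoice": {
        "schemaName": "SAP_invoice_schema",
        "schemaId": "cf8cc8a9-1eee-42d9-9a3e-507a61baac23",
    },
    "paymentAdvice": {
        "schemaName": "SAP_paymentAdvice_schema",
        "schemaId": "b7fdcfac-7853-42bb-89d2-ede2ba1ce803",
    },
    "purchaseOrder": {
        "schemaName": "SAP_purchaseOrder_schema",
        "schemaId": "fbab052e-6f9b-4a5f-b42f-29a8162eb1bf",
    },
}


def build_options(document_type, extra=None):
    """Return a JSON options payload that selects the matching SAP schema.

    `extra` is a hypothetical hook for any further options the Document API
    expects; its keys are not defined in this post.
    """
    if document_type not in SAP_SCHEMAS:
        raise ValueError(f"No SAP schema listed for document type {document_type!r}")
    options = {"schemaName": SAP_SCHEMAS[document_type]["schemaName"]}
    options.update(extra or {})
    return json.dumps(options)


invoice_options = build_options("invoice")
```

The resulting string would be sent as the options part of the upload request, alongside the document file.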

If you prefer to consume the Document Information Extraction service via API only, you can do the following using the respective endpoints: 

  • Use the relevant schemaId value to see details of the associated SAP schema, including the list of standard fields it contains (Schema API endpoint GET /schemas/{schemaId}). 
  • Create a new custom schema (Schema API endpoint POST /schemas).
  • Use the SAP schema details to add standard fields to your new custom schema in line with your needs. Alternatively, add custom fields (Schema API endpoint POST /schemas/{schemaId}/versions/{version}/fields). 
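The three API steps above can be sketched as plain request descriptions. This is a minimal sketch, not the service client: `BASE_URL` is a placeholder for your own service URL and API base path, the function names are hypothetical, and authentication, query parameters, and request bodies are left out because this post does not specify them.

```python
from urllib.parse import quote

# Placeholder: replace with the URL and API base path of your service instance.
BASE_URL = "https://dox.example.invalid/api"


def schema_details_request(schema_id):
    """Step 1: read an SAP schema, including its standard fields."""
    return ("GET", f"{BASE_URL}/schemas/{quote(schema_id, safe='')}")


def create_schema_request():
    """Step 2: create a new custom schema."""
    return ("POST", f"{BASE_URL}/schemas")


def add_fields_request(schema_id, version):
    """Step 3: add standard or custom fields to a schema version."""
    path = f"/schemas/{quote(schema_id, safe='')}/versions/{int(version)}/fields"
    return ("POST", BASE_URL + path)


# Example: look up the SAP invoice schema by the schemaId listed earlier.
method, url = schema_details_request("cf8cc8a9-1eee-42d9-9a3e-507a61baac23")
```

Each tuple can be passed to the HTTP client of your choice once you have added the authorization header and the payload that the Schema API reference describes.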

 

Schema Versions 

When you configure, save, and activate a new schema, Document Information Extraction saves it automatically as version 1. You can use this initial version as the basis for creating additional versions of the same schema. All schema versions share the same name. 

Using versions is helpful if you process documents that have many of the fields provided in the original schema but also include others. You can also use different labels, descriptions, data types, and setup types for fields in different versions. 
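To make the idea of versions concrete, here is a small in-memory model. It is purely illustrative and does not call the service: the classes, the `next_version` helper, the schema name `my_invoice_schema`, and the field `projectCode` are all hypothetical. It shows a second version that keeps the schema name, changes a label, and adds one extra field.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Field:
    name: str
    label: str
    data_type: str


@dataclass(frozen=True)
class SchemaVersion:
    schema_name: str
    version: int
    fields: Tuple[Field, ...]


def next_version(
    base: SchemaVersion,
    extra_fields: Iterable[Field] = (),
    relabel: Optional[Dict[str, str]] = None,
) -> SchemaVersion:
    """Derive the next version: same name, optional new labels, extra fields."""
    relabel = relabel or {}
    kept = tuple(replace(f, label=relabel.get(f.name, f.label)) for f in base.fields)
    return SchemaVersion(base.schema_name, base.version + 1, kept + tuple(extra_fields))


v1 = SchemaVersion(
    "my_invoice_schema",
    1,
    (Field("documentNumber", "Document number", "string"),),
)
v2 = next_version(
    v1,
    extra_fields=[Field("projectCode", "Project code", "string")],
    relabel={"documentNumber": "Invoice number"},
)
```

Here `v1` stays unchanged, so documents that only need the original fields can keep using it while `v2` covers documents that carry the additional field.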

 
