
Can I train my own Document Information Extraction model?

keean_ferreira
Explorer

I am currently trying to figure out a way to train my own Document Information Extraction model. I work with pharmaceutical batch records and need to extract the information. Creating a template for them in the standard Document Information Extraction app wouldn't work because they are all quite diverse. Can someone recommend to me the best approach to take?

Within SAP, where can I develop this application that will allow it to access foundational models? Can I train, test, and verify the model within said development environment? Is there a way to leverage generative AI to aid me?

In an openSAP course I did, the lecturers mentioned that we'd be able to further train existing foundational models to suit our use cases. Is that available yet, and if so, where is it situated?

How do I access the Generative AI hub? Does it form part of the AI Launchpad?

I am new to the SAP space and find it challenging to navigate all the different products and services.

tobias_weller
Product and Topic Expert

Hi Keean,

Thanks for reaching out. Last week we released the new premium edition of Document Information Extraction, which targets exactly such use cases.

You can read more here: Blog post

Inside this blog post, you can also find tutorials that showcase how you can process your documents with our service and generative AI.

Best regards,

Tobias

keean_ferreira
Explorer

Thanks for the speedy response Tobias. I did read that article and the description of the app seems to match my use case perfectly, but in all the demonstrations and guides the functionality looks the same as the base version. Also, there is no mention of being able to train the model with your own documents.

I feel like I might be missing the generative AI aspect of the app.

keean_ferreira
Explorer

To add onto my previous comment:

What service would I need to use to develop applications that leverage the advertised foundational models? Would that be the AI Launchpad?

tobias_weller
Product and Topic Expert

The core change is that you don't need to create any templates or train a model. You just need to create a schema which describes the fields you want to extract and then you can start processing your documents. This functionality is realized through generative AI.

You can try this out in the following tutorial: https://developers.sap.com/tutorials/cp-aibus-dox-ui-gen-ai.html

I recommend trying it with your document type and creating your own dedicated schema for it.

keean_ferreira
Explorer

Awesome, thanks. Creating the schema would take hours because the documents can be up to 50 pages long. My understanding is that the best approach for me is to create an app within SAP that can leverage and further train existing foundational models for this information extraction. Would I be correct in saying that?

Furthermore, would you be able to guide me as to where within SAP I could do that?

keean_ferreira
Explorer

One last question from my end:

Looking through the AI offerings at SAP, it seems the Generative AI Management service would be the most suitable, as it allows you to build on and extend existing foundational models. This service is listed as "Upcoming", but do you have any sort of timeframe for when it might be available?

tobias_weller
Product and Topic Expert

Indeed, with Generative AI Management you can leverage various generative AI models to build your own use cases.

However, for document processing you would typically want additional functionality, such as extracting the text from the document or a UI for reviewing the extracted information. This is what Document Information Extraction covers.
For the schema, you just need to describe what you want to extract; it doesn't depend on the number of pages. For example, for an invoice you would define the schema as follows:

  • Header fields: InvoiceNo, PurchaseOrderNo, TotalAmount, Currency
  • Line item fields: Description, Quantity, Unit, Amount

With this schema, the service would extract the fields listed above from all the line items, regardless of whether you process a one-page invoice or a 100-page invoice.
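
To make the point about schema size concrete, here is a minimal Python sketch. It is only an illustration: the `Schema` class and the `empty_result` helper are hypothetical names, and this is not the payload format or API of Document Information Extraction. It only shows that the schema lists field names once, while the number of line items in a result can vary freely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Schema:
    """Hypothetical in-memory description of the fields to extract."""
    header_fields: Tuple[str, ...]
    line_item_fields: Tuple[str, ...]


# Field names taken from the invoice example above.
INVOICE_SCHEMA = Schema(
    header_fields=("InvoiceNo", "PurchaseOrderNo", "TotalAmount", "Currency"),
    line_item_fields=("Description", "Quantity", "Unit", "Amount"),
)


def empty_result(schema: Schema, line_item_count: int) -> Dict[str, object]:
    """Build a blank result skeleton: one header record, N line-item records."""
    header: Dict[str, Optional[str]] = {name: None for name in schema.header_fields}
    line_items: List[Dict[str, Optional[str]]] = [
        {name: None for name in schema.line_item_fields}
        for _ in range(line_item_count)
    ]
    return {"header": header, "line_items": line_items}


# The same schema serves a short document and a very long one.
short_invoice = empty_result(INVOICE_SCHEMA, line_item_count=3)
long_invoice = empty_result(INVOICE_SCHEMA, line_item_count=250)
```

The schema holds eight field names in both cases; only the number of line-item records in the result grows with the document.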