How do I make DOX learn?

TMNielsen
Contributor

Hi 

I am trying to load some purchase orders with different layouts. I need to extract at least one field that is not included in the SAP standard schema (EAN no.), so I made a copy of the SAP_purchaseOrder_schema and added the ean header field to the copy.

The new field can have setup type Manual or Auto (with a default extractor). As I understand the documentation, a custom field must have setup type Manual. I also understand that only setup type Auto uses the AI functionality, and that if I choose setup type Manual, I must make a template. Does this mean that fields with Auto are handled by the AI and fields with Manual depend on the template?

So I have made the new field with setup type Manual and I have made a template.

Then I uploaded a file. Some data was extracted wrongly, so I entered Edit mode, corrected the errors and saved the document again. Then I added the document to the template.

I expected DOX to have learned from the corrections I made, but if I load exactly the same document again, it still makes the same errors.

So, can anyone explain how DOX learns, which settings influence the learning, and what I can do to teach DOX?

I would also like to understand: if I have 3 purchase orders with 3 very different layouts, can I handle this by adding the 3 documents to one template, must I create 3 templates, or is it even better to create 3 different schemas?

I hope it is all right that I asked 3 questions in this thread, as I think they are all related to the subject: how does DOX learn and fix extraction errors?

Kind Regards 
Thomas Madsen Nielsen

VarunTak
Advisor

Hi, let me try to answer some of your questions:

Assumption: Based on your questions, it seems you are trying only the Template model on a Premium instance, since you can see the Auto schemas.

1) Manual vs Auto:

Yes, a Manual schema is only for the Template model, while an Auto schema can be a multi-model scenario depending on how you use it. On a premium instance, if you choose an Auto schema, we include the Template model automatically. So you don't really need to create a Manual schema unless you just want to use the Template model.

Now, you can try the following combinations:

Auto schema + Default Extractors:

This prioritizes a pre-trained model to deliver the best results. However, if you provide feedback (correcting and confirming a document), it will switch to LLM-based extraction. If you still find the information incorrect and provide feedback again, a template will be automatically generated for you. Even after this, if you still see that the extraction is wrong, then it is likely that we have reached a limitation of the product, and it cannot be further improved. In this case, we study the issue on a case-by-case basis to determine if we can improve the extraction through backend training.

Auto Schema without Default Extractors:

In this case, we do not use any pre-trained model, but use the LLM and the Template model to deliver the results, depending on whether you provide feedback or not, as described above.
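The escalation described above can be sketched roughly as follows. This is only an illustrative model of the behaviour as described in this reply, not a DOX API: the function name, the `Approach` enum and the `feedback_rounds` counter are hypothetical, and the assumption that an Auto schema without default extractors starts with the LLM and moves to a template after one round of feedback is one reading of the text.

```python
from enum import Enum


class Approach(Enum):
    PRETRAINED = "pre-trained model"
    LLM = "LLM-based extraction"
    TEMPLATE = "template"


def extraction_approach(setup_type: str, default_extractors: bool,
                        feedback_rounds: int) -> Approach:
    """Hypothetical model of which approach is used; not part of any DOX API.

    feedback_rounds counts how often a document of this kind was
    corrected and confirmed.
    """
    if setup_type == "manual":
        # Manual schemas only ever use the Template model.
        return Approach.TEMPLATE
    if setup_type != "auto":
        raise ValueError(f"unknown setup type: {setup_type!r}")

    # Auto schema: each round of feedback moves one step down the ladder.
    ladder = [Approach.LLM, Approach.TEMPLATE]
    if default_extractors:
        ladder.insert(0, Approach.PRETRAINED)

    # Once the last step is reached, further feedback changes nothing.
    return ladder[min(feedback_rounds, len(ladder) - 1)]
```

For example, `extraction_approach("auto", True, 0)` gives the pre-trained model, while the same call with two or more feedback rounds gives the template.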

Lastly, regarding different layouts of documents: no, you cannot add more layouts to one Template; one Template is supposed to have only one layout of documents. You can add more documents to cover more fields, but the layout should remain the same.
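As a small illustration of the one-layout-per-Template rule, the sketch below groups documents by layout so that each layout ends up in its own template. The document names, layout labels and the `template_` prefix are made up for the example, and nothing here calls DOX itself.

```python
from collections import defaultdict


def group_by_layout(documents):
    """Group (document_name, layout_label) pairs into one template per layout.

    Purely illustrative; the naming scheme is a hypothetical convention.
    """
    templates = defaultdict(list)
    for name, layout in documents:
        templates[f"template_{layout}"].append(name)
    return dict(templates)


# Three purchase orders with three different layouts need three templates.
purchase_orders = [
    ("po_1.pdf", "layout_a"),
    ("po_2.pdf", "layout_b"),
    ("po_3.pdf", "layout_c"),
]
templates = group_by_layout(purchase_orders)
```

A further document with an existing layout, for example another `layout_a` purchase order, would be added to the existing template rather than creating a new one.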

I hope it helps. Thanks.

TMNielsen
Contributor

Hi VarunTak

Thanks for your reply. 

I feel that I still understand only a small fragment of the big picture, but with your answer I understand the documentation I found here a little better:

14.2.5 Schema Configuration

Regards Thomas 

 

PS. I have the most problems with item lines, especially when one item spans more than one line in the document.

TMNielsen
Contributor

I just closed this question when I discovered this in the documentation:

[Image: TMNielsen_0-1756364646613.png, screenshot from the documentation]

I guess that is the main source of my problems.

Kind regards 
Thomas Madsen Nielsen