on 2025 Aug 25 1:36 PM
Hi
I am trying to load some purchase orders with different layouts. I need to extract at least one field that is not included in the SAP standard schema (the EAN number), so I made a copy of the SAP_purchaseOrder_schema and added an EAN header field to the copy.
The new field can have setup type Manual or Auto (with a default extractor). As I understand the documentation, a custom field must have setup type Manual. I also understand that only setup type Auto uses the AI functionality, and that if I choose setup type Manual, I must create a template. Does this mean that fields with Auto are handled by the AI and fields with Manual depend on the template?
So I created the new field with setup type Manual and created a template.
Then I uploaded a file. Some data was extracted incorrectly, so I entered Edit mode, corrected the errors and saved the document again. Then I added the document to the template.
I expected DOX to have learned from my corrections, but if I load exactly the same document again, it still makes the same errors.
So, can anyone explain how DOX learns, which settings influence the learning, and what I can do to teach DOX?
I would also like to understand: if I have 3 purchase orders with 3 very different layouts, can I handle them by adding the 3 documents to one template, must I create 3 templates, or would it be even better to create 3 different schemas?
I hope it is all right that I asked 3 questions in this thread, as I think they all relate to the same subject: how does DOX learn and fix extraction errors?
Kind Regards
Thomas Madsen Nielsen
Hi, let me try to answer some of your questions:
Assumption: based on your questions, it seems you are only trying the Template model, on a Premium version, since you can see the Auto schemas.
1) Manual vs Auto:
Yes, a manual schema is only for the Template model, while Auto can be a multi-model scenario depending on how you use it. On a Premium instance, if you choose an Auto schema, we include the Template model automatically. So you don't really need to create a manual schema unless you only want to use the Template model.
You can try the following combinations:
Auto schema + Default Extractors:
This prioritizes a pre-trained model to deliver the best results. However, if you provide feedback (correcting and confirming a document), it will switch to LLM-based extraction. If you still find the information incorrect and provide feedback again, a template will be automatically generated for you. Even after this, if you still see that the extraction is wrong, then it is likely that we have reached a limitation of the product, and it cannot be further improved. In this case, we study the issue on a case-by-case basis to determine if we can improve the extraction through backend training.
Auto Schema without Default Extractors:
In this case, we do not use any pre-trained model; we use the LLM and the Template model to deliver the results, depending on whether you provide feedback, as described above.
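As a purely conceptual sketch, the order of extraction approaches described above for the two Auto-schema combinations can be modelled as a small state machine. The `Stage` names and both functions are hypothetical labels chosen for illustration; they are not SAP DOX identifiers or API calls, and this is not how the service is implemented. It only encodes the escalation order stated in this answer.

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    # Hypothetical labels for the approaches described in the answer,
    # not values used by SAP DOX.
    PRETRAINED = "pre-trained model"
    LLM = "LLM-based extraction"
    TEMPLATE = "auto-generated template"
    LIMIT = "product limitation, reviewed case by case"


def escalation_path(default_extractors: bool) -> list[Stage]:
    """Order in which approaches are tried for an Auto schema (conceptual)."""
    # With default extractors the pre-trained model comes first;
    # without them, extraction starts directly with the LLM.
    path = [Stage.PRETRAINED] if default_extractors else []
    return path + [Stage.LLM, Stage.TEMPLATE, Stage.LIMIT]


def stage_after_feedback(default_extractors: bool, feedback_rounds: int) -> Stage:
    """Approach in use after the user has corrected and confirmed N times."""
    path = escalation_path(default_extractors)
    # Each round of feedback moves one step along the path;
    # once the last step is reached, further feedback changes nothing.
    return path[min(feedback_rounds, len(path) - 1)]


print(stage_after_feedback(True, 1).value)   # LLM-based extraction
print(stage_after_feedback(False, 1).value)  # auto-generated template
```

The point the model makes is that, in this description, feedback on an Auto schema escalates the approach step by step, whereas a Manual schema only ever uses the Template model.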
Lastly, regarding different document layouts: no, you cannot add multiple layouts to one Template; a Template is supposed to cover only one document layout. You can add more documents to a Template to cover more fields, but the layout must remain the same. So for 3 very different layouts, you would need 3 templates.
I hope this helps. Thanks
Hi VarunTak
Thanks for your reply.
I feel that I still only understand a small fragment of the big picture, but thanks to your answer I understand the documentation I found here a little better:
Regards Thomas
PS: I have the most problems with line items, especially when one item spans more than one line in the document.