on 2024 Jun 14 3:25 PM
Hello.
We are currently using SAP Document AI to read scanned PDF files with the SAP_OCROnly_schema document schema.
The OCR works great, and we get the content using the Get All Pages Text API or the .get_document_text method in Python, but the result is JSON with bounding boxes and the text inside each one.
Is there a way to get the whole text of the document (or page by page) instead of one entry per bounding box?
Thanks,
Manuel T.
Hi Manuel,
We only support JSON output for now. The easiest solution would be to write a small script that converts the JSON output to plain text.
You can also use the influence program to raise feature requests towards the product: https://influence.sap.com/sap/ino/#campaign/3667
Best regards,
Tobias
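The conversion script suggested above could look roughly like the sketch below. It assumes a hypothetical JSON layout: a "results" object that maps page numbers to lists of entries, each with a "text" field and a "boundingBox" of x, y, w, h. That layout is not taken from the SAP documentation, so check the key names and coordinate units against what the Get All Pages Text API or .get_document_text actually returns. The script rebuilds reading order by grouping boxes at a similar vertical position into lines, then returns the text either per page or for the whole document.

```python
from typing import Any

# HYPOTHETICAL input layout: the real Document AI response may use different
# key names or coordinate units, so adapt "results", "text", "boundingBox",
# "x", "y" and "h" to what your API call actually returns.
Box = dict[str, Any]

SAMPLE: dict[str, Any] = {
    "results": {
        "1": [
            {"text": "World", "boundingBox": {"x": 0.5, "y": 0.1, "w": 0.2, "h": 0.02}},
            {"text": "Hello", "boundingBox": {"x": 0.1, "y": 0.101, "w": 0.2, "h": 0.02}},
            {"text": "Total: 42", "boundingBox": {"x": 0.1, "y": 0.3, "w": 0.3, "h": 0.02}},
        ],
        "2": [
            {"text": "Page two", "boundingBox": {"x": 0.1, "y": 0.1, "w": 0.3, "h": 0.02}},
        ],
    }
}


def _centre_y(box: Box) -> float:
    """Vertical centre of a bounding box."""
    bb = box["boundingBox"]
    return bb["y"] + bb["h"] / 2


def page_to_text(boxes: list[Box], tolerance: float = 0.5) -> str:
    """Join one page's boxes into plain text: lines top to bottom, words left to right.

    A box joins the current line when its vertical centre is within
    `tolerance` times the height of that line's first box.
    """
    lines: list[list[Box]] = []
    for box in sorted(boxes, key=_centre_y):
        if lines:
            anchor = lines[-1][0]
            if abs(_centre_y(box) - _centre_y(anchor)) < tolerance * anchor["boundingBox"]["h"]:
                lines[-1].append(box)
                continue
        lines.append([box])
    # Within each line, order the boxes by their left edge.
    return "\n".join(
        " ".join(b["text"] for b in sorted(line, key=lambda b: b["boundingBox"]["x"]))
        for line in lines
    )


def document_to_pages(doc: dict[str, Any]) -> dict[int, str]:
    """Plain text per page, keyed by page number in numeric order."""
    pages = doc["results"]
    return {int(p): page_to_text(pages[p]) for p in sorted(pages, key=int)}


def document_to_text(doc: dict[str, Any]) -> str:
    """Whole document as plain text, with a blank line between pages."""
    return "\n\n".join(document_to_pages(doc).values())


if __name__ == "__main__":
    print(document_to_text(SAMPLE))
```

Grouping by vertical centre, with a tolerance relative to box height, keeps the words of a slightly skewed scan on the same line. A plain top-to-bottom sort like this one will interleave multi-column layouts, so for those you may need to split the boxes by x position first.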
Thanks Tobias.
I have raised the request. I think it would be an improvement, especially for generative AI processing.