Applying Schema and Extracting Each Page Individually

Document Information Extraction is a great tool and will really benefit the automation community.

Having worked in AP Automation and PO Automation for many years, we know that customers do all kinds of unexpected things that can break our automation, so we must design for this as much as possible.

My question / feature request: is there a way to send, for example, a 3-page PDF document and get back all 3 pages with the schema applied to each page individually? I need this because I have to normalize what was sent to me. In this case, a client sent 1 document in which each page was a separate order.

I need to get back all three pages individually so that I can compare PO numbers and check whether all of the pages belong to the same order.

I have already tested this scenario in the system and I got back 1 result with line items from all 3 orders and header information from the second order.

Any thoughts or suggestions would be greatly appreciated.

Accepted Solutions (1)

The easiest solution would be to write some custom code that splits the PDF into single-page files and then uploads them one by one. Several libraries can do this, for example PyPDF2 (Python):

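from PyPDF2 import PdfWriter, PdfReader

inputpdf = PdfReader(open("document.pdf", "rb"))

# Write each page of the input PDF to its own single-page file.
for i in range(len(inputpdf.pages)):
    output = PdfWriter()
    output.add_page(inputpdf.pages[i])
    with open("document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)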
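
Once each page has been uploaded and extracted separately, the PO number comparison described in the question could look roughly like the sketch below. It is only an illustration: the helper `group_pages_by_po`, the shape of `page_results` (one dict of header fields per page), and the field name `purchaseOrderNumber` are assumptions, so adapt them to whatever your schema and extraction results actually contain.

```python
from collections import defaultdict


def group_pages_by_po(page_results, po_field="purchaseOrderNumber"):
    """Group page indices by the PO number extracted from each page.

    page_results is assumed to be a list with one dict of extracted
    header fields per page (hypothetical shape, not the service's format).
    Pages without a usable PO number are collected under the key None.
    """
    groups = defaultdict(list)
    for page_index, fields in enumerate(page_results):
        po = fields.get(po_field)
        # Normalize whitespace so "4500001 " and "4500001" match.
        key = po.strip() if isinstance(po, str) and po.strip() else None
        groups[key].append(page_index)
    return dict(groups)


# Example with made-up extraction results for a 4-page document.
results = [
    {"purchaseOrderNumber": "4500001"},
    {"purchaseOrderNumber": "4500002"},
    {"purchaseOrderNumber": "4500001 "},
    {},  # no PO number found on this page
]
print(group_pages_by_po(results))
# {'4500001': [0, 2], '4500002': [1], None: [3]}
```

If the result contains more than one key, the original PDF held several orders, and the pages under each key can be merged back into one document per order. Pages grouped under `None` probably need manual review.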

We also plan to offer splitting capabilities as part of the product in the future. If you would like to raise this as a feature request, you can use the influence program:

Best regards, Tobias
