JoseMunoz
Dear community,

 

In this blog I want to show how to extract information from a document using AI and OCR through the BTP service Document Information Extraction, calling its API from an ABAP program.

 

The objective of this blog is not to explain how the API works, as there are already good blogs covering that ( Getting Started with Document Information Extraction Trial Service  or Developer Mission), but to show how you can automate the API calls with nothing but an ABAP program. I say only ABAP because there are already other integration scenarios based on CPI ( Document Information Extraction Integration with Email Server ) or iRPA; here we will look at a simpler solution.

Here is the architecture:


The API calls you need to perform to send a file and receive the results are:

  1. Authenticate

  2. Send File

  3. Get job status; if the job is still processing the document, wait until it is done

  4. Get the JSON with the extracted fields
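
Step 3 is a polling loop. As a minimal sketch of that step, the report below polls a status source until it reports a final state or a maximum number of attempts is reached. Everything in it is hypothetical and for illustration only: the interface lif_status_source stands in for the real "get job status" call, lcl_fake_source simulates a job that needs three calls, and the intermediate value PENDING and the result TIMEOUT are assumptions of this sketch (the program in this blog only checks for DONE and FAILED).

```abap
REPORT zdie_poll_sketch.

* Hypothetical abstraction of the "get job status" API call
INTERFACE lif_status_source.
  METHODS get_status RETURNING VALUE(rv_status) TYPE string.
ENDINTERFACE.

* Simulated source: reports PENDING twice, then DONE
CLASS lcl_fake_source DEFINITION FINAL.
  PUBLIC SECTION.
    INTERFACES lif_status_source.
  PRIVATE SECTION.
    DATA mv_calls TYPE i.
ENDCLASS.

CLASS lcl_fake_source IMPLEMENTATION.
  METHOD lif_status_source~get_status.
    mv_calls = mv_calls + 1.
    rv_status = COND #( WHEN mv_calls < 3 THEN `PENDING` ELSE `DONE` ).
  ENDMETHOD.
ENDCLASS.

CLASS lcl_poller DEFINITION FINAL.
  PUBLIC SECTION.
    CLASS-METHODS wait_for_job
      IMPORTING io_source        TYPE REF TO lif_status_source
                iv_max_tries     TYPE i DEFAULT 20
                iv_wait_secs     TYPE i DEFAULT 3
      RETURNING VALUE(rv_status) TYPE string.
ENDCLASS.

CLASS lcl_poller IMPLEMENTATION.
  METHOD wait_for_job.
    DO iv_max_tries TIMES.
      rv_status = io_source->get_status( ).
*     Stop as soon as the job has reached a final state
      IF rv_status = `DONE` OR rv_status = `FAILED`.
        RETURN.
      ENDIF.
      WAIT UP TO iv_wait_secs SECONDS.
    ENDDO.
*   No final state within the allowed number of attempts
    rv_status = `TIMEOUT`.
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  DATA(lv_status) = lcl_poller=>wait_for_job( io_source    = NEW lcl_fake_source( )
                                              iv_wait_secs = 0 ).
  WRITE: / lv_status.
```

Capping the number of attempts keeps the report from looping forever if the service never returns a final status; the limit of 20 tries is an arbitrary choice for this sketch.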


 

If you want to test this solution, you first have to create a Document Information Extraction service instance; please follow this blog from joni.liu.

 

You need to create a destination in SM59 for the authentication:


Host: d7d51f5atrial.authentication.us10.hana.ondemand.com

Port: 443

User: <clientid from instance service key>

Pass: <clientsecret from instance service key>
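
The token endpoint called through this destination (/oauth/token) answers with a JSON body. The program in this blog cuts the token out of that body with SPLIT; as an alternative, here is a small sketch that reads it with /ui2/cl_json. The structure ty_token_response and the sample JSON literal are assumptions made for this example: the field names follow the usual OAuth 2.0 token response, and the values are made up.

```abap
REPORT zdie_token_parse_sketch.

* Fields of the token response used here; component names match the JSON keys
TYPES: BEGIN OF ty_token_response,
         access_token TYPE string,
         token_type   TYPE string,
         expires_in   TYPE i,
       END OF ty_token_response.

START-OF-SELECTION.
  DATA ls_token TYPE ty_token_response.

* Made-up sample body; a real response comes from lr_client->response->get_cdata( )
  DATA(lv_json) = `{"access_token":"abc123","token_type":"bearer","expires_in":43199}`.

  /ui2/cl_json=>deserialize( EXPORTING json = lv_json
                             CHANGING  data = ls_token ).

  IF ls_token-access_token IS INITIAL.
    WRITE: / 'No access token in response'.
  ELSE.
    WRITE: / 'Token received, valid for', ls_token-expires_in, 'seconds'.
  ENDIF.
```

Checking that access_token is not initial is a bit safer than relying on the HTTP status alone, since an unexpected body would otherwise lead to an empty Bearer token in the later calls.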

 

 

And here is the program. It requires a PDF file, sends it to the service requesting the fields documentNumber, purchaseOrderNumber and grossAmount, and waits for the response. Once the JSON has arrived, it writes the values read by the service.

 
*&---------------------------------------------------------------------*
*& Report ZTEST_DOCUMENT_INFORMATION_EXT
*&---------------------------------------------------------------------*
*& PoC - Sends a file to Document Information Extraction BTP Service
*& Reads the file from Desktop and sends it through the API
*&---------------------------------------------------------------------*
REPORT ztest_document_information_ext.



CLASS zcl_die DEFINITION DEFERRED.

TYPES: BEGIN OF ty_filetab,
value TYPE x,
END OF ty_filetab.

DATA lr_die TYPE REF TO zcl_die.
DATA: lv_file_name TYPE string,
lv_rc TYPE i,
lt_file TYPE STANDARD TABLE OF ty_filetab,
lv_file_content TYPE xstring,
lt_filetable TYPE filetable.



PARAMETERS: p_fname TYPE rlgrap-filename.



AT SELECTION-SCREEN ON VALUE-REQUEST FOR p_fname.

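CALL METHOD cl_gui_frontend_services=>file_open_dialog
EXPORTING
window_title = 'Choose a file'
file_filter = 'PDF files (*.pdf)|*.pdf|'
CHANGING
file_table = lt_filetable
rc = lv_rc
EXCEPTIONS
OTHERS = 1.

* Take the file name only if a file was selected (the table is empty after a cancel)
IF sy-subrc = 0 AND lines( lt_filetable ) > 0.
p_fname = lt_filetable[ 1 ]-filename.
ENDIF.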


**********************************************************************
* Document Information Extraction class definition
CLASS zcl_die DEFINITION FINAL.

PUBLIC SECTION.
CONSTANTS: c_api_url TYPE string VALUE 'https://aiservices-trial-dox.cfapps.us10.hana.ondemand.com',
c_api_path TYPE string VALUE '/document-information-extraction/v1'.

DATA:
m_oauth TYPE string,
m_content_clients TYPE string.

METHODS authenticate RETURNING VALUE(rv_authenticated) TYPE abap_bool.
METHODS post_document IMPORTING iv_file_content TYPE xstring
RETURNING VALUE(rv_job) TYPE string.
METHODS send_file IMPORTING iv_file_content TYPE xstring.
METHODS get_status_job IMPORTING iv_job TYPE string
RETURNING VALUE(rv_status_job) TYPE string.

ENDCLASS.


**********************************************************************
* Document Information Extraction class implementation
CLASS zcl_die IMPLEMENTATION.

METHOD authenticate.

DATA lr_client TYPE REF TO if_http_client.

CALL METHOD cl_http_client=>create_by_destination
EXPORTING
destination = 'ZBTP_DOC_INF_EXT_OAUTH2'
IMPORTING
client = lr_client
EXCEPTIONS
argument_not_found = 1
destination_not_found = 2
destination_no_authority = 3
plugin_not_active = 4
internal_error = 5
OTHERS = 6.
IF sy-subrc = 0.

* If you have the class cl_oauth2_client in your system check note 3041322 or use following method
lr_client->request->set_header_field( name = if_http_header_fields_sap=>request_method value = 'POST' ).
lr_client->request->set_header_field( name = 'grant_type' value = 'client_credentials' ).
lr_client->request->set_header_field( name = if_http_header_fields_sap=>request_uri value = '/oauth/token?grant_type=client_credentials' ).
lr_client->send( ).
lr_client->receive( ).

lr_client->response->get_status(
IMPORTING
code = DATA(lv_code) ).

IF lv_code = '200'.

DATA: rest TYPE string.

DATA(l_content) = lr_client->response->get_cdata( ).
SPLIT l_content AT '"access_token":"' INTO rest l_content.
SPLIT l_content AT '"' INTO m_oauth rest.

rv_authenticated = abap_true.

ELSE.
rv_authenticated = abap_false.
ENDIF.

lr_client->close( ).

ENDIF.

ENDMETHOD.


METHOD post_document.


DATA lr_client TYPE REF TO if_http_client.
DATA lo_request_part TYPE REF TO if_http_entity.
DATA lo_request_part2 TYPE REF TO if_http_entity.
DATA lv_content_disposition TYPE string.
DATA len TYPE i.
DATA lv_options TYPE string.

DATA: BEGIN OF ls_create_job_response,
id TYPE string,
status TYPE string,
processedtime TYPE string,
END OF ls_create_job_response.

CLEAR rv_job.

CALL METHOD cl_http_client=>create_by_url
EXPORTING
url = c_api_url
IMPORTING
client = lr_client
EXCEPTIONS
argument_not_found = 1
plugin_not_active = 2
internal_error = 3
OTHERS = 4.

IF sy-subrc = 0.


lr_client->request->set_header_field( name = if_http_header_fields_sap=>request_method value = if_http_request=>co_request_method_post ).
lr_client->request->set_header_field( name = if_http_header_fields_sap=>request_uri value = |{ c_api_path }/document/jobs| ).
lr_client->request->set_header_field( name = 'Authorization' value = |Bearer { m_oauth }| ).
lr_client->request->set_content_type( if_rest_media_type=>gc_multipart_form_data ).
lr_client->request->if_http_entity~set_formfield_encoding( formfield_encoding = cl_http_request=>if_http_entity~co_encoding_raw ).

lr_client->request->set_header_field( name = 'Accept' value = if_rest_media_type=>gc_appl_json ).


lo_request_part2 = lr_client->request->add_multipart( ).


lv_options = '{ "extraction": { "headerFields": [ "documentNumber", "purchaseOrderNumber", "grossAmount" ], "lineItemFields": [ "netAmount" ] },' &&
'"clientId": "default", "documentType": "invoice", "receivedDate": "2020-02-17", "enrichment": { "sender": { "top": 5, "type": ' &&
'"businessEntity", "subtype": "supplier" }, "employee": { "type": "employee" } }}'.
lo_request_part2->set_header_field( name = `Content-Disposition` "#EC NOTEXT
value = |form-data; name="options"; type=application/json| ).
lo_request_part2->set_cdata(
EXPORTING
data = lv_options ).




lo_request_part = lr_client->request->add_multipart( ).
lv_content_disposition = |form-data; name="file"; filename=sample-invoice.pdf |.
lo_request_part->set_header_field( name = `Content-Disposition` "#EC NOTEXT
value = lv_content_disposition ).
lo_request_part->set_content_type( if_rest_media_type=>gc_appl_pdf ).

len = xstrlen( iv_file_content ).

* Send the content passed to the method, not the global variable of the report
lo_request_part->set_data( data = iv_file_content offset = 0 length = len ).

lr_client->send( ).
lr_client->receive( ).

DATA(l_content_clients) = lr_client->response->get_cdata( ).
/ui2/cl_json=>deserialize( EXPORTING json = l_content_clients pretty_name = /ui2/cl_json=>pretty_mode-camel_case CHANGING data = ls_create_job_response ).


lr_client->response->get_status(
IMPORTING
code = DATA(lv_code) ).

IF lv_code = '201'.
rv_job = ls_create_job_response-id.
ENDIF.

lr_client->close( ).

ENDIF.

ENDMETHOD.



METHOD get_status_job.


DATA lr_client TYPE REF TO if_http_client.
DATA lv_status_job TYPE string.
DATA l_json_response TYPE string.
DATA: lr_data TYPE REF TO data.

CLEAR rv_status_job.

CALL METHOD cl_http_client=>create_by_url
EXPORTING
url = c_api_url
IMPORTING
client = lr_client
EXCEPTIONS
argument_not_found = 1
plugin_not_active = 2
internal_error = 3
OTHERS = 4.

IF sy-subrc = 0.


lr_client->request->set_header_field( name = if_http_header_fields_sap=>request_method value = if_http_request=>co_request_method_get ).
lr_client->request->set_header_field( name = if_http_header_fields_sap=>request_uri value = |{ c_api_path }/document/jobs/{ iv_job }| ).
lr_client->request->set_header_field( name = 'Authorization' value = |Bearer { m_oauth }| ).

lr_client->send( ).
lr_client->receive( ).

l_json_response = lr_client->response->get_cdata( ).
/ui2/cl_json=>deserialize( EXPORTING json = l_json_response pretty_name = /ui2/cl_json=>pretty_mode-camel_case CHANGING data = lr_data ).

lr_client->response->get_status(
IMPORTING
code = DATA(lv_code) ).

IF lv_code = '200'.

/ui2/cl_data_access=>create( ir_data = lr_data iv_component = `STATUS`)->value( IMPORTING ev_data = lv_status_job ).

IF lv_status_job = 'DONE'.

DATA: l_field_name TYPE string,
l_value TYPE string,
i TYPE i.
i = 1.
WHILE i < 4.

/ui2/cl_data_access=>create( ir_data = lr_data iv_component = |EXTRACTION-HEADER_FIELDS[{ i }]-NAME| )->value( IMPORTING ev_data = l_field_name ).

/ui2/cl_data_access=>create( ir_data = lr_data iv_component = |EXTRACTION-HEADER_FIELDS[{ i }]-VALUE| )->value( IMPORTING ev_data = l_value ).

WRITE:/ l_field_name, l_value.

i = i + 1.

ENDWHILE.

ENDIF.

* Return the status in every case so that the caller also stops polling on FAILED
rv_status_job = lv_status_job.
ELSE.
rv_status_job = 'FAILED'.
ENDIF.

lr_client->close( ).

ENDIF.

ENDMETHOD.

METHOD send_file.

DATA: l_job TYPE string,
l_status_job TYPE string.

* Call the methods of this instance directly instead of going through the global reference
l_job = post_document( iv_file_content ).
* l_job = '1ad442aa-46dc-4e84-8344-d024ec516a18'.
IF l_job IS NOT INITIAL.

l_status_job = get_status_job( iv_job = l_job ).

* Poll every 3 seconds until the job reaches a final state
WHILE l_status_job <> 'DONE' AND l_status_job <> 'FAILED'.
WAIT UP TO 3 SECONDS.
l_status_job = get_status_job( iv_job = l_job ).
ENDWHILE.

ENDIF.

ENDMETHOD.

ENDCLASS.


START-OF-SELECTION.


IF p_fname IS NOT INITIAL.

* Upload the file from the frontend in binary format
CALL METHOD cl_gui_frontend_services=>gui_upload
EXPORTING
filename = CONV #( p_fname )
filetype = 'BIN'
IMPORTING
filelength = DATA(lv_input_len)
CHANGING
data_tab = lt_file.


* convert file to XSTRING
CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
EXPORTING
input_length = lv_input_len
IMPORTING
buffer = lv_file_content
TABLES
binary_tab = lt_file.



lr_die = NEW zcl_die( ).

IF lr_die->authenticate( ) = abap_true.

lr_die->send_file( lv_file_content ).

ENDIF.


ENDIF.
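
The program above hard-codes the options JSON in post_document. If you want to request a different set of header fields, the string could be built from a table instead. The following is only a sketch: the class lcl_options is hypothetical, and the payload is a reduced version of the one above (without receivedDate and enrichment), so whether the service accepts it in this reduced form is an assumption that should be tested against your instance.

```abap
REPORT zdie_options_sketch.

CLASS lcl_options DEFINITION FINAL.
  PUBLIC SECTION.
    CLASS-METHODS build
      IMPORTING it_header_fields TYPE string_table
      RETURNING VALUE(rv_json)   TYPE string.
ENDCLASS.

CLASS lcl_options IMPLEMENTATION.
  METHOD build.
*   Put every field name in double quotes and join the names with commas
    DATA(lt_quoted) = VALUE string_table( FOR lv_field IN it_header_fields
                                          ( |"{ lv_field }"| ) ).
    DATA(lv_list) = concat_lines_of( table = lt_quoted sep = `, ` ).

*   Curly brackets must be escaped inside string templates
    rv_json = |\{ "extraction": \{ "headerFields": [ { lv_list } ], | &&
              |"lineItemFields": [ "netAmount" ] \}, | &&
              |"clientId": "default", "documentType": "invoice" \}|.
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  DATA(lv_options) = lcl_options=>build( VALUE #( ( `documentNumber` )
                                                  ( `purchaseOrderNumber` )
                                                  ( `grossAmount` ) ) ).
  WRITE: / lv_options.
```

Note that the loop in get_status_job reads exactly three header fields, so it would have to be adapted as well if the number of requested fields changes.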

 

For testing we can use the following invoice  from missions. If we run the program with that PDF, after a few seconds we get the following output:

 


 

We can verify in the Document Information Extraction UI that the extracted data is correct.


 

With that you can automate the processing of scanned documents such as invoices, for example checking whether an invoice contains a purchase order number so it can be matched with the purchase order, and many other scenarios, with just an ABAP program.
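
As an illustration of the purchase order check mentioned above, here is a rough sketch of how an extracted purchaseOrderNumber could be looked up. It assumes a system where the purchasing header table EKKO exists and that the extracted value is a plain number of at most 10 characters; the class lcl_po_check and the parameter p_po are hypothetical names used only for this example.

```abap
REPORT zdie_po_match_sketch.

CLASS lcl_po_check DEFINITION FINAL.
  PUBLIC SECTION.
    CLASS-METHODS normalize
      IMPORTING iv_extracted    TYPE string
      RETURNING VALUE(rv_ebeln) TYPE ekko-ebeln.
    CLASS-METHODS po_exists
      IMPORTING iv_extracted     TYPE string
      RETURNING VALUE(rv_exists) TYPE abap_bool.
ENDCLASS.

CLASS lcl_po_check IMPLEMENTATION.
  METHOD normalize.
*   Remove surrounding blanks and add leading zeros (internal format)
    DATA(lv_value) = condense( iv_extracted ).
    CALL FUNCTION 'CONVERSION_EXIT_ALPHA_INPUT'
      EXPORTING
        input  = lv_value
      IMPORTING
        output = rv_ebeln.
  ENDMETHOD.

  METHOD po_exists.
    DATA(lv_ebeln) = normalize( iv_extracted ).
*   The purchase order exists if a header record is found
    SELECT SINGLE ebeln FROM ekko WHERE ebeln = @lv_ebeln INTO @DATA(lv_found).
    rv_exists = xsdbool( sy-subrc = 0 ).
  ENDMETHOD.
ENDCLASS.

PARAMETERS p_po TYPE c LENGTH 20 LOWER CASE.

START-OF-SELECTION.
  IF lcl_po_check=>po_exists( CONV string( p_po ) ) = abap_true.
    WRITE: / 'Purchase order found:', lcl_po_check=>normalize( CONV string( p_po ) ).
  ELSE.
    WRITE: / 'No purchase order found for', p_po.
  ENDIF.
```

In the program of this blog, the value to check would be the l_value read for the purchaseOrderNumber field in get_status_job.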

 

Best Regards

Jose Muñoz