
To best follow this post and try things out yourself, you should:

    1. Have some basic knowledge of the Python programming language.

    2. Have a data lake instance provisioned and configured. For provisioning instructions, see “Configure the SAP HANA data lake File Container” under Managing Data Lake Files on the SAP Help Portal.

    3. Have the REST API endpoint and instance ID of your data lake instance handy.

    4. Have access to a Jupyter notebook.



Overview:

In this blog, we will learn how to use the SAP HANA data lake REST API to create/write, access/read, list, and delete files through a Python script. The REST API reference documentation (SAP HANA Cloud, Data Lake Files REST API) describes the endpoints that can be used to access the file containers of the SAP HANA data lake; the Python demonstrations that follow cover only the most commonly used ones. Along the way, we will use Python's http.client module to open a connection, send HTTP requests such as GET, POST, PUT, and DELETE, check the response status, and read the response body. Let’s get started.
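To make the request/response cycle concrete before touching the data lake, here is a minimal, self-contained sketch of http.client usage. It talks to a throwaway local stub server, not to the Files REST API; StubHandler, fetch, and the path /anything are hypothetical names chosen only for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: answers every GET with a short text body."""

    def do_GET(self):
        payload = b"hello from the stub"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the notebook output quiet


def fetch(host, port, path):
    """Send one GET request and return (status, reason, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.reason, response.read()
    finally:
        conn.close()


# Start the stub on any free local port, call it once, then shut it down.
server = HTTPServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, reason, body = fetch("127.0.0.1", server.server_port, "/anything")
server.shutdown()
server.server_close()

print(status, reason, body.decode())
```

The same three steps (request, getresponse, read) are used for every data lake call in the rest of this post; only the connection type, URL, headers, and HTTP method change.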


 

Step 1: Making an HTTP connection to the data lake

 

Copy your client.key and client.crt files into the directory where your Python script is placed (the home directory of the Jupyter notebook). The connection authenticates with both the client certificate and the client key, so keeping them next to the script lets you refer to them with a simple relative path:

Use "./<certname.crt>" as the file path of the certificate when it is in the same directory as your script.

Use "./<keyname.key>" as the file path of the key when it is in the same directory as your script.

The first step is to import the http.client package for making HTTP requests and to set some variables that are reused across the API calls. Add the following at the top of your Python script and populate the variables with the proper information for your SAP HANA data lake file container. We then use the HTTP client to get a response and a status from the Files REST API.
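A minimal sketch of this setup is shown here. The placeholder values must be replaced with your own endpoint and instance ID, and the helper names base_headers and open_connection are assumptions made for this example. The client certificate is loaded through an ssl context because the older key_file and cert_file arguments of HTTPSConnection were removed in Python 3.12.

```python
import http.client
import ssl

FILES_REST_API = "<REST API ENDPOINT>"  # host name only, as expected by HTTPSConnection
CONTAINER = "<INSTANCE ID>"
CRT_PATH = "./client.crt"
KEY_PATH = "./client.key"


def base_headers(container, content_type="application/json"):
    """Headers sent with each call; x-sap-filecontainer names the file container."""
    return {"x-sap-filecontainer": container, "Content-Type": content_type}


def open_connection(host=FILES_REST_API, crt_path=CRT_PATH, key_path=KEY_PATH):
    """Return an HTTPS connection that presents the client certificate and key."""
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=crt_path, keyfile=key_path)
    # No network traffic happens here; the socket is opened on the first request.
    return http.client.HTTPSConnection(host, port=443, context=context)
```

Calling open_connection() fails early with an error from load_cert_chain if either file is missing or the key does not match the certificate, which is a quick way to check the file paths before sending any request.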

 


 

Step 2: Write/Create a file to the data lake File Store

To write/create a file in the data lake File Store, we will use the PUT request method supported by HTTP.

PUT HTTP method

    • PUT requests are used to change data on the server. A PUT replaces the entire content at a specific location with the data from the body payload. If no resource matches the request, one is created.

    • The PUT method requests that the enclosed entity be stored under the supplied URI. If the URI refers to an existing resource, that resource is modified; if it does not, the server can create the resource with that URI.



The following code sets up the API call to the CREATE endpoint and will upload a file to the folder specified in your SAP HANA data lake File Store.
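A minimal sketch of the CREATE call, assuming an already opened http.client.HTTPSConnection is passed in as conn. The function names build_create_request and create_file are hypothetical; the URL pattern, query string, and headers follow the complete listing at the end of this post.

```python
from urllib.parse import quote


def build_create_request(container, directory, file_name, text):
    """Return (method, url, headers, body) for a CREATE call."""
    # Join directory and file name with exactly one slash between the parts.
    path = "/" + "/".join(part for part in (directory.strip("/"), file_name) if part)
    url = f"/webhdfs/v1{quote(path)}?op=CREATE&data=true"
    headers = {
        "x-sap-filecontainer": container,
        "Content-Type": "application/octet-stream",
    }
    return "PUT", url, headers, text.encode("utf-8")


def create_file(conn, container, directory, file_name, text):
    """Send the CREATE request on an open connection and return the HTTP status."""
    method, url, headers, body = build_create_request(container, directory, file_name, text)
    conn.request(method, url, body=body, headers=headers)
    response = conn.getresponse()
    response.read()  # drain the body so the connection can be reused
    return response.status
```

For the example in this post, the call would be create_file(conn, CONTAINER, "/test/", "MYFIRSTAPIFILE", file), where file holds the welcome message.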

 


 

The above code creates a file named “MYFIRSTAPIFILE” under the “test” directory of the data lake File Store. The file content is the string assigned to the variable file: “Welcome to the SAP blog about Accessing SAP HANA Cloud, data lake Files from Python”.

 

DBX Screenshot


 

 

Step 3: Access/Read the file that was created above in the data lake File Store

 

To access/read a file from the data lake File Store, we will use the GET request method supported by HTTP.

The GET method sends a GET request to the specified URL. GET is the most common HTTP method and is used to retrieve data from the server. Here, the GET request returns the contents of the file given in the file path (f_path).

 

The following code sets up the API call to the OPEN endpoint and will print your file contents.
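A minimal sketch of the OPEN call, again assuming conn is an open http.client.HTTPSConnection to your endpoint. The names open_url and read_file are hypothetical, and treating any non-2xx status as a failure is a choice made for this example.

```python
from urllib.parse import quote


def open_url(f_path):
    """Build the request URL for reading the file at f_path."""
    return f"/webhdfs/v1/{quote(f_path.lstrip('/'))}?op=OPEN"


def read_file(conn, container, f_path):
    """Return the file contents as text, or raise if the call did not succeed."""
    headers = {
        "x-sap-filecontainer": container,
        "Content-Type": "application/json",
    }
    conn.request("GET", open_url(f_path), body=None, headers=headers)
    response = conn.getresponse()
    body = response.read()
    if not 200 <= response.status < 300:
        raise RuntimeError(f"OPEN failed: {response.status} {response.reason}")
    return body.decode("utf-8")
```

With the file from Step 2, print(read_file(conn, CONTAINER, "/test/MYFIRSTAPIFILE")) would print the stored message.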


 

Output will be:


 

You can also download the file from DBX and open it in Notepad.


 

Step 4: Print the list of directories/files in your data lake File Store

 

The GET method is also used to list the directories and the files present within them.

For the file path (f_path) we specify “/”, which tells the GET request to start from the root of the data lake File Store and fetch the directories and files found there.

The following code sets up the API call to the LISTSTATUS endpoint and prints the list of directories and files in your SAP HANA data lake File Store. It also displays the type of each entry, i.e., whether it is a file or a directory.
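A minimal sketch of building the LISTSTATUS URL and reading the entry types from the reply. It assumes the response follows the usual WebHDFS layout (a FileStatuses object holding a FileStatus array whose items carry pathSuffix and type); check this against the actual response of your instance. The names liststatus_url and parse_liststatus and the sample body are made up for illustration.

```python
import json


def liststatus_url(f_path):
    """Build the request URL for listing the directory at f_path."""
    return f"/webhdfs/v1/{f_path.lstrip('/')}?op=LISTSTATUS"


def parse_liststatus(body):
    """Return (name, type) pairs from a WebHDFS-style LISTSTATUS body."""
    document = json.loads(body)
    entries = document.get("FileStatuses", {}).get("FileStatus", [])
    return [(entry.get("pathSuffix", ""), entry.get("type", "")) for entry in entries]


# Hypothetical response body, used here instead of a live call.
sample = (
    b'{"FileStatuses": {"FileStatus": ['
    b'{"pathSuffix": "test", "type": "DIRECTORY"}, '
    b'{"pathSuffix": "Orders", "type": "FILE"}]}}'
)

for name, kind in parse_liststatus(sample):
    print(f"{kind:<10} {name}")
```

In a live call, the bytes returned by response.read() after a GET on liststatus_url("/") would take the place of sample.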

 


 

The output will be:


 

The following code sets up the API call to the LISTSTATUS_RECURSIVE endpoint and will print the list of directories and files in your SAP HANA Cloud, data lake File storage.

 


 

Output:


 

 

Step 5: Delete a directory and its file contents from your data lake File Store

The DELETE method is used to delete an entire directory and the files it contains from the data lake File Store.

The following code sets up the API call to the DELETE endpoint and deletes the directory or file given in the file path from your SAP HANA data lake File Store.

For f_path, we need to specify the exact path of the file or directory we wish to delete.

 

Please see the below code sample.
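A minimal sketch of the DELETE call, assuming conn is an open http.client.HTTPSConnection to your endpoint. It also assumes the service answers with the WebHDFS-style body {"boolean": true} on success, which should be verified against your instance. The names delete_url, reports_success, and delete_path are hypothetical.

```python
import json
from urllib.parse import quote


def delete_url(f_path):
    """Build the request URL for deleting the file or directory at f_path."""
    return f"/webhdfs/v1/{quote(f_path.lstrip('/'))}?op=DELETE"


def reports_success(body):
    """Interpret a WebHDFS-style {"boolean": true} reply; anything else counts as failure."""
    try:
        return json.loads(body).get("boolean") is True
    except (ValueError, AttributeError):
        return False


def delete_path(conn, container, f_path):
    """Delete f_path and return True when the service reports success."""
    headers = {
        "x-sap-filecontainer": container,
        "Content-Type": "application/json",
    }
    conn.request("DELETE", delete_url(f_path), body=None, headers=headers)
    response = conn.getresponse()
    body = response.read()
    return response.status == 200 and reports_success(body)
```

For the file created in Step 2, the call would be delete_path(conn, CONTAINER, "/test/MYFIRSTAPIFILE").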

 


 

The above code also removes the “test” directory, since “MYFIRSTAPIFILE” was the only file inside the “test” folder.

 


 

Now let us create another file within the “test” folder and run the DELETE operation on it.

 


 

The above code block creates a second file, “MYSECONDAPIFILE”, in the same “test” directory.


 

The above code reads “MYSECONDAPIFILE” and displays its contents.

 

DBX screenshot before the DELETE operation:


 

The following code will delete the “MYSECONDAPIFILE” that was created in the file container.

 


 

DBX Screenshot:

 


 

 

The entire code for Accessing HANA Cloud, data lake Files from Python

 

import http.client
import ssl

FILES_REST_API = '<REST API ENDPOINT>'
CONTAINER = '<INSTANCE ID>'
CRT_PATH = './client.crt'
KEY_PATH = './client.key'

# TLS context that presents the client certificate and key.
# (The key_file/cert_file arguments of HTTPSConnection were removed in Python 3.12.)
context = ssl.create_default_context()
context.load_cert_chain(certfile=CRT_PATH, keyfile=KEY_PATH)


# -- Write/Create a directory and a file within that directory, to the data lake File Store

place = '/test/'
file_name = 'MYFIRSTAPIFILE'
file = 'Welcome to the SAP blog about Accessing SAP HANA Cloud, data lake Files from Python'
request_url = '/webhdfs/v1' + place + file_name + '?op=CREATE&data=true'
request_headers = {
    'x-sap-filecontainer': CONTAINER,
    'Content-Type': 'application/octet-stream'
}
conn = http.client.HTTPSConnection(FILES_REST_API, port=443, context=context)
conn.request(method="PUT", url=request_url, body=file, headers=request_headers)
response = conn.getresponse()
print(response.status, response.reason)
response.close()


# -- Will print the list of directories/files in your data lake File Store

f_path = '/'
request_url = f'/webhdfs/v1{f_path}?op=LISTSTATUS_RECURSIVE'
request_headers = {
    'x-sap-filecontainer': CONTAINER,
    'Content-Type': 'application/json'
}
conn = http.client.HTTPSConnection(FILES_REST_API, port=443, context=context)
conn.request(method="GET", url=request_url, body=None, headers=request_headers)
response = conn.getresponse()
print(response.read())
response.close()


# -- Will delete the file given in f_path
# -- (point f_path at a directory to delete the directory and its file contents)

f_path = '/test/MYSECONDAPIFILE'
request_url = f'/webhdfs/v1{f_path}?op=DELETE'
request_headers = {
    'x-sap-filecontainer': CONTAINER,
    'Content-Type': 'application/json'
}
conn = http.client.HTTPSConnection(FILES_REST_API, port=443, context=context)
conn.request(method="DELETE", url=request_url, body=None, headers=request_headers)
response = conn.getresponse()
print(response.read())
response.close()


# -- Will upload a local file to the data lake files storage
# -- The local file is read and sent as the request body of a CREATE call, as in the
# -- first block. The target path '/test/Orders' is an assumed example; adjust as needed.

local_path = 'C:/Users/I567343/OneDrive - SAP SE/Documents/REST API Blog/Orders'
f_path = '/test/Orders'
with open(local_path, 'rb') as local_file:
    data = local_file.read()
request_url = f'/webhdfs/v1{f_path}?op=CREATE&data=true'
request_headers = {
    'x-sap-filecontainer': CONTAINER,
    'Content-Type': 'application/octet-stream'
}
conn = http.client.HTTPSConnection(FILES_REST_API, port=443, context=context)
conn.request(method="PUT", url=request_url, body=data, headers=request_headers)
response = conn.getresponse()
print(response.read())
response.close()


 

 

 

Conclusion: That’s how one can easily perform the following tasks using the SAP HANA Cloud, data lake files REST API:

    • Create a new file in HDLFS.

 

    • Upload a file from the relative path or from your local machine to HDLFS.

 

    • Read and download an existing file from HDLFS.

 

    • List the set of files in a specific path in HDLFS.

 

    • Delete a file from HDLFS.



 

I would love to read any suggestions or feedback on this blog post. If you found the information useful, please give it a like, and feel free to follow me for similar content.

For further assistance, readers can also go through the following links:

SAP HANA Cloud, data lake: post and answer questions here,

and read other posts on the topic you wish to discover here