Technology Blogs by SAP
lucia_wu
Advisor

What is Data Lake Files?

 

Data lake Files is a component of SAP HANA Cloud that provides secure, efficient storage for large amounts of structured, semi-structured, and unstructured data. Data lake Files is automatically enabled when you provision a data lake instance.

 

Provisioning creates the data lake Files container as a storage location for files. This file store lets you use data lake as a repository for big data. For more information on provisioning data lake Files with your data lake instance, see Creating SAP HANA Cloud Instances.


 

Configuring the File Container


I will walk through the steps using the REST API.

    1. Create an SAP HANA database on BTP with a data lake.

    2. Make sure the storage service type is set to SAP Native.

    3. In SAP HANA Cloud on BTP, click the data lake instance's Actions -> Open SAP HANA Cloud Central.

    4. Configure the file container by following the guide Setting Up Initial Access to HANA Cloud data lake Files.



Okay, the data lake Files configuration is now complete.

 

Using the File Container


We can now fetch or upload files through the REST API.

    • Copy the instance ID and run the following commands locally, from the folder that contains the client certificate and key (client.crt and client.key).



Get list status:

curl --insecure -H "x-sap-filecontainer: {{instance-id}}" --cert ./client.crt --key ./client.key "https://{{instance-id}}.files.hdl.canary-eu10.hanacloud.ondemand.com/webhdfs/v1/user/home/?op=LISTSTATUS" -X GET


You will see:
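The response is a JSON listing of the directory. The /webhdfs/v1 path in the URL suggests it follows the WebHDFS REST format, where LISTSTATUS returns a FileStatuses object containing a FileStatus array. Assuming that shape, here is a minimal Python sketch that pulls the entry names out of such a response. The sample body is made up for illustration and is not output from a real instance.

```python
import json


def list_file_names(liststatus_json):
    """Return the entry names from a WebHDFS-style LISTSTATUS response body."""
    payload = json.loads(liststatus_json)
    # WebHDFS nests the directory entries under FileStatuses -> FileStatus.
    entries = payload["FileStatuses"]["FileStatus"]
    return [entry["pathSuffix"] for entry in entries]


# Hypothetical response body, shaped like a standard WebHDFS LISTSTATUS reply.
sample = (
    '{"FileStatuses": {"FileStatus": ['
    '{"pathSuffix": "Studies.csv", "type": "FILE", "length": 1024}'
    "]}}"
)

print(list_file_names(sample))
```

An empty directory would come back with an empty FileStatus array, so the function would return an empty list.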




    • To upload a file, run the following command:
      curl --location-trusted --insecure -H "Content-Type:application/octet-stream" -H "x-sap-filecontainer: {{instance-id}}" --cert ./client.crt --key ./client.key --data-binary "@Studies.csv" "https://{{instance-id}}.files.hdl.canary-eu10.hanacloud.ondemand.com/webhdfs/v1/user/home/Studies.csv?op=CREATE&data=true&overwrite=true" -X PUT

 

    • Now get the list status again; the file you just uploaded appears in the listing.
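Both requests above share the same URL shape: the instance's files endpoint, the /webhdfs/v1 prefix, a path, and an op query parameter plus optional flags. As a rough sketch, that can be captured in a small Python helper. The function name hdlfs_url and the placeholder host are made up for illustration; the helper only builds the URL, so the x-sap-filecontainer header and the client certificate still have to be supplied by whatever HTTP client sends the request.

```python
from urllib.parse import urlencode


def hdlfs_url(endpoint, path, op, **params):
    """Build a WebHDFS-style URL for a data lake Files endpoint.

    endpoint: host name of the files endpoint (hypothetical value below)
    path:     absolute path inside the file container, e.g. /user/home/
    op:       operation name, e.g. LISTSTATUS or CREATE
    params:   extra query parameters, e.g. overwrite="true"
    """
    query = urlencode({"op": op, **params})
    return f"https://{endpoint}/webhdfs/v1{path}?{query}"


# Placeholder host; substitute the real instance ID and region.
host = "my-instance-id.files.hdl.canary-eu10.hanacloud.ondemand.com"

list_url = hdlfs_url(host, "/user/home/", "LISTSTATUS")
upload_url = hdlfs_url(host, "/user/home/Studies.csv", "CREATE",
                       data="true", overwrite="true")
print(list_url)
print(upload_url)
```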





 

Read the contents of the file into the DB table


Go to the SAP HANA database explorer and open the account.

Note that in this step, the columns of the database table must match the fields in the CSV file.
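One way to catch a mismatch before loading is to compare the CSV header row with the column list you plan to pass to LOAD TABLE. The Python sketch below does that. The column names are taken from the LOAD TABLE statement in this blog, but it is only an assumption that the CSV header uses the same names. Since the load skips the header row and reads fields in the order of the column list, treat this as a sanity check on field order and count rather than a guarantee.

```python
import csv
import io

# Column list from the LOAD TABLE statement in this blog.
EXPECTED_COLUMNS = [
    "status_code", "study_num", "description", "study_ID", "protocol_ID",
    "lastSubjectLastVisit", "isLeanStudy", "studyPhase", "ID",
]


def header_mismatches(csv_text, expected):
    """Return (position, expected_name, found_name) for each differing field.

    A missing field on either side is reported as None.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    mismatches = []
    for i in range(max(len(header), len(expected))):
        want = expected[i] if i < len(expected) else None
        got = header[i] if i < len(header) else None
        if want != got:
            mismatches.append((i, want, got))
    return mismatches


# Hypothetical file content: a header row followed by one data row.
sample_csv = ",".join(EXPECTED_COLUMNS) + "\n1,S-001,demo,10,20,2024-01-01,0,II,1\n"
print(header_mismatches(sample_csv, EXPECTED_COLUMNS))
```

An empty result means the header lines up with the column list; any tuples returned point at the first places to look.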

Here I load the data into an IQ table, using a LOAD TABLE statement like this:

 

CALL SYSHDL_BUSINESS_CONTAINER.REMOTE_EXECUTE('
LOAD TABLE MANAGEMENT_STUDIES
(status_code,study_num,description,study_ID,protocol_ID,lastSubjectLastVisit,isLeanStudy,studyPhase,ID)
FROM ''hdlfs:///user/home/archiving/Studies.csv''
FORMAT csv
SKIP 1
DELIMITED BY '',''
ESCAPES OFF' );
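The doubled single quotes are there because the whole LOAD TABLE statement is passed to REMOTE_EXECUTE as one SQL string literal, so every quote inside it has to be escaped by doubling. If you generate such calls from a script, a sketch like the following can do the doubling for you. The helper names are made up for illustration, and nothing here validates table or column names, so use it only with trusted input. The file path is a parameter because it has to point at wherever the file actually lives: the upload example earlier writes to /user/home/Studies.csv, while the statement above reads from /user/home/archiving/Studies.csv.

```python
def load_table_sql(table, columns, file_path):
    """Build the inner LOAD TABLE statement (single quotes not yet doubled)."""
    column_list = ",".join(columns)
    return (
        f"LOAD TABLE {table} ({column_list}) "
        f"FROM 'hdlfs://{file_path}' "
        "FORMAT csv SKIP 1 DELIMITED BY ',' ESCAPES OFF"
    )


def remote_execute_call(inner_sql):
    """Wrap a statement in REMOTE_EXECUTE, doubling its single quotes."""
    escaped = inner_sql.replace("'", "''")
    return f"CALL SYSHDL_BUSINESS_CONTAINER.REMOTE_EXECUTE('{escaped}');"


inner = load_table_sql(
    "MANAGEMENT_STUDIES",
    ["status_code", "study_num", "ID"],  # shortened column list for the demo
    "/user/home/archiving/Studies.csv",
)
call = remote_execute_call(inner)
print(call)
```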


 

You can use data lake Files to store unstructured data or archived files, which seems to be a good new alternative to options such as an object store or AWS.

3 Comments
Alagan
Explorer
0 Kudos
Hi Lucia Wu,

Thanks for your explanation of data lake Files. I recreated the steps mentioned above and got an unauthorized error in the final step (i.e. loading data into the IQ table). Can you please help me with it?
RobertWaywell
Product and Topic Expert
0 Kudos
Did you have a table created in the HDL Relational Engine database called "MANAGEMENT_STUDIES"?

 

This blog didn't show the step of creating the database table itself, rather it just gave an example of the LOAD TABLE syntax that could be used to load the data from HDLFS assuming that the destination table already existed in the database.
0 Kudos
Hi

 

Thanks for the overview. How can we analyze unstructured data in SAP HANA Cloud? For example, video files, AVI files, and PDF files.

 

Regards

Vinieth