werner_daehn
Active Contributor

Design goal


When dealing with large amounts of data, there are different storage concepts. SAP favors multi-temperature storage inside the database, but the majority of projects use an object store - in the case of Microsoft Azure that would be the Azure Data Lake. It is cheap, can grow to virtually any size, is secure and can handle all types of data.

To make this data accessible via Hana as well, a new SDI adapter is needed. I wrote one that provides access to the data in Azure Data Lake and Azure Service Bus. With this adapter the data can be used by Hana on-premise, Hana Cloud, SAP Data Warehouse Cloud, S/4Hana or SAP BW.

Regarding payload formats, Avro and Json are the popular choices in Azure. In the example below the Data Lake contains millions of files, each representing one patient record in the FHIR standard. Whenever new files are uploaded to Azure, Azure Event Grid posts this information into a Service Bus queue the adapter is listening on. The adapter itself works with any Json or Avro structure, no limitations.

My sample file in Azure looks like this:
{
"resourceType": "Patient",
"id": "example",
"identifier": [
{ "use": "usual", "system": "urn:oid:1.2.36.146.595.217.0.1" }
],
"active": true,
..
..

and all files should be queried via Hana like any other table.
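To illustrate that goal, here is a sketch of a query against the virtual table created later in this post (the name v_patient and the column names come from the view definition below; the WHERE clause is just an example):

SELECT ID, SYSTEM_IDENTIFIER, GENDER, BIRTHDATE
FROM v_patient
WHERE GENDER = 'female';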



 

Requirements


 

This creates a couple of requirements for this Data Lake adapter:

  • Parse Json and Avro files.

  • As these file formats allow for nested data, a mechanism to blend the nested Json/Avro data into the relational Hana data model must exist.

  • When files are changed or deleted, or new files arrive, the changes must be visible in Hana right away.

  • One Data Lake storage contains different kinds of files, each with its own structure, and they must be treated independently.


The solution is the concept of "views" in the Azure Data Lake adapter, which define the transformation from the nested data into Hana tables. For the above example the user creates a file named patient.view with the following content:
{
"namefilter": "patient.*\\.json",
"from": "$",
"columns": [
{
"accessor": "$.id",
"alias": "ID",
"datatype": "VARCHAR(256)"
},
{
"accessor": "$.identifier[0].system",
"alias": "SYSTEM_IDENTIFIER",
"datatype": "VARCHAR(40)"
},
{
"accessor": "$.gender",
"alias": "GENDER",
"datatype": "VARCHAR(10)"
},
{
"accessor": "$.birthDate",
"alias": "BIRTHDATE",
"datatype": "DATE",
"format": "yyyy-M-d"
}
],
"queuename": "filechanges"
}

Apart from the view projection definitions (from and columns), which specify:

  • What nesting level is driving the row generation? from: $ means the root level, so one row per file.

  • Which field to output?

  • What should be its Hana column name?

  • What should be its Hana data type?

  • What is the format of the value in case a data type conversion is needed, e.g. from String to Date?


a name and directory filter can be specified as well. Because this format definition makes sense only for json files containing patient data, and all these files follow the pattern patient*.json in the Data Lake storage, a namefilter is used. Hence this view considers only those files.

The last setting in the above example is the Service Bus queue name where Azure puts all file change notifications. This is another feature of the Azure Data Lake: it can trigger events whenever a file is modified and, using Azure Event Grid, these events can be put into a Service Bus queue.

 

 

Consuming Azure Data Lake data


This view is imported as a virtual table just like with any other SDI adapter, either via SQL or via the various UIs. As the definition file is called patient.view, its remote name is patient and the chosen virtual table name is v_patient. In the screenshot above this table was selected and it returned the data contained in the files.
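Via SQL the import could look like the following sketch (the remote source name DATALAKE and the two <NULL> placeholders for database and schema are assumptions; the exact path depends on the adapter's browsing hierarchy):

CREATE VIRTUAL TABLE v_patient AT "DATALAKE"."<NULL>"."<NULL>"."patient";

SELECT * FROM v_patient;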


 

Creating the Hana remote source object


The remote source definition asks for the URL of the Data Lake endpoint and optionally - if realtime subscriptions will be used - for the Service Bus connection string. Both values can simply be copied from Azure.

 

As a Storage Account in Azure can have multiple containers, the file system is a mandatory parameter and it can be augmented with additional sub directories. For example, the setting /main/FHIR configures this remote source to read data only from the file system called "main", and within that only from the "FHIR" directory and its sub directories.

The security keys for the Storage Account and the Service Bus are placed into the secure credential storage.

The only additional parameter needed is an adapter-local directory where the user places all view definition files.
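Put together, the remote source creation via SQL might look like the following sketch - endpoint, directories and keys are placeholders, and a complete working example appears in the comments below:

CREATE REMOTE SOURCE "DATALAKE" ADAPTER "AzureDataLakeAdapter"
AT LOCATION AGENT "dpagent" CONFIGURATION
'<?xml version="1.0" encoding="UTF-8"?>
<ConnectionProperties name="azuredatalake">
<PropertyEntry name="DataLakeEndpoint">https://xxxx.dfs.core.windows.net/</PropertyEntry>
<PropertyEntry name="RootDir">/main/FHIR</PropertyEntry>
<PropertyEntry name="DefinitionDirectory">/opt/dpagent/views</PropertyEntry>
<PropertyEntry name="ServiceBusConnectionString">Endpoint=sb://yyyy.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey</PropertyEntry>
</ConnectionProperties>'
WITH CREDENTIAL TYPE 'PASSWORD' USING
'<CredentialEntry name="StorageKey">
<password>storage-account-key</password>
</CredentialEntry>
<CredentialEntry name="ServiceBusKey">
<password>service-bus-key</password>
</CredentialEntry>';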


 

Realtime Subscriptions


Using Hana SDI realtime subscriptions, changes in the Data Lake are captured and the target table in Hana gets updated with the new data immediately.
-- Create the target table with the same structure as the virtual table
CREATE COLUMN TABLE t_patient LIKE v_patient;

-- Define the subscription from the virtual table into the target table
CREATE REMOTE SUBSCRIPTION s_patient
ON v_patient TARGET TABLE t_patient;

-- Start capturing changes into the queue
ALTER REMOTE SUBSCRIPTION s_patient QUEUE;

INSERT INTO t_patient SELECT * FROM v_patient; -- Initial load

-- Apply the queued and all future changes to the target table
ALTER REMOTE SUBSCRIPTION s_patient DISTRIBUTE;

SELECT * FROM t_patient; -- Returns old and changed file contents
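To check whether the subscription is queued and distributing changes, the standard monitoring view can be queried - a sketch, assuming the usual column names:

SELECT SUBSCRIPTION_NAME, STATE
FROM M_REMOTE_SUBSCRIPTIONS
WHERE SUBSCRIPTION_NAME = 'S_PATIENT'; -- unquoted names are stored in upper case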

 


 

The corresponding setup in Azure is easy to achieve. All that needs to be done is to define a new event subscription on the file system and use a previously created Service Bus queue as the endpoint.


 

Summary


With that, not only can all Azure Data Lake data be read in federated mode, the data can also be cached in Hana tables for best query performance - and thanks to the realtime subscriptions this data is always current. The changes are even applied in a transactionally consistent manner.

 

 
45 Comments
MarioDeFelipe
Contributor
Thanks for sharing. I believe we will see increasing interest in real-time integration, so the more we know about this, the better.
former_member532376
Participant
0 Kudos
Hi Werner,

great blog!

If one wants to establish this scenario with SAP HANA Cloud and Azure Data Lake, I guess the DP Agent is still needed? In that case it would not make much sense to go from cloud to on-premise to cloud again. Therefore: is it possible to install the DP Agent on a VM in Azure?

Regards,
Michael
werner_daehn
Active Contributor
0 Kudos
mikaufma Yes, of course. Any small Linux VM will do.

It would be nice if each adapter could be packaged as a docker image with the dpagent included already, wouldn't it? I will make the suggestion to SAP.
SimonVee
Explorer
Great find & great use-case!

 

I can't find any mention of this adapter in the PAM documentation or the installation guide for SDI. Is there any documentation on this adapter?
werner_daehn
Active Contributor
0 Kudos

svandelanotte If you send me an email I will grant you access to the github repo.

Note: This is not an SAP product. I wrote the adapter for a customer and others can participate in it.

S0019642851
Explorer
0 Kudos
Dear Werner

We have a use case as well for the Azure Data Lake, is it possible to have the AzureDataLake adapter in HANA SDI?

regards

Harikishore
werner_daehn
Active Contributor
0 Kudos

aarini If you send me an email I will grant you access to the github repo.

ugandhar_nandam1
Explorer
0 Kudos
Werner

We have a use case as well for the Azure Data Lake, is it possible to have the AzureDataLake adapter in HANA SDI? My email id: ugandhar.nandam@gmail.com
0 Kudos
Hi Werner,

We have a similar use case for the Azure Data Lake adapter in HANA SDI. Could you please provide the installation reference? My mail id: needara2021@gmail.com
mivanows
Explorer
0 Kudos

Dear wdaehn

We need help with the code to have an adapter for AZURE Data Lake in SAP Commissions on HANA.

my email marki@asardigital.com

 

aristobulo_cruz
Explorer
0 Kudos
Hi, please share the script for creating the remote source, acruzto2002@gmail.com

Regards.
werner_daehn
Active Contributor
0 Kudos
aristobulo.cruz I will bring the documentation up to standard in the next few weeks.

 
CREATE REMOTE SOURCE "xxxxx" ADAPTER "AzureSQLDB"
AT LOCATION AGENT "yyyyy" CONFIGURATION
'<?xml version="1.0" encoding="UTF-8"?>
<ConnectionProperties name="azuredb">
<PropertyEntry name="jdbcurl">jdbc:sqlserver://zzzzz.database.windows.net:1433;database=rtditest1;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;</PropertyEntry>
<PropertyEntry name="jdbcurlrt"></PropertyEntry>
<PropertyEntry name="pollsleep"></PropertyEntry>
</ConnectionProperties>'
WITH CREDENTIAL TYPE 'PASSWORD' USING
'<CredentialEntry name="credential">
<user>aaaaaa</user>
<password>bbbbbb</password>
</CredentialEntry>';
aristobulo_cruz
Explorer
0 Kudos

Thanks Werner. Did you do any other step before creating the adapter?

 

CREATE REMOTE SOURCE "AzureDataLakeAdapter" ADAPTER "AzureSQLDB"
AT LOCATION AGENT "DPA_HNPRD" CONFIGURATION
'<?xml version="1.0" encoding="UTF-8"?>
<ConnectionProperties name="azuredb">

 

Error: (dberror) [474]: invalid adapter name: AzureSQLDB:

 

 

werner_daehn
Active Contributor
0 Kudos
Yes, there are the usual SDI steps to create an agent, adapter and adapter location.

 

see the first two commands here:

https://blogs.sap.com/2015/08/20/hana-adapter-sdk-interaction-via-sql/

 

The adapter name AzureSQLDB is fixed; you want to enable this exact adapter.
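For reference, a rough sketch of those two commands (host and port are placeholders for your dpagent installation):

CREATE AGENT "DPAGENT" PROTOCOL 'TCP' HOST 'dpagenthost' PORT 5050;
CREATE ADAPTER "AzureSQLDB" AT LOCATION AGENT "DPAGENT";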
aristobulo_cruz
Explorer
0 Kudos
Hi Werner, yes I can do that, but if the adapter name isn't registered in the agent, the CREATE ADAPTER script doesn't work.

CREATE ADAPTER "AzureSQLDB" PROPERTIES 'display_name=Azure Adapter;description=Azure Adapter' AT LOCATION DPSERVER;

 

Could not execute 'CREATE ADAPTER "AzureSQLDB" PROPERTIES 'display_name=Azure Adapter;description=Azure Adapter' AT ...'
Error: (dberror) [403]: internal error: Cannot get adapter capabilities: exception 151002: Adapter type AzureSQLDB not registered.
werner_daehn
Active Contributor
0 Kudos

You have created an agent called DPAGENT that points to the dpagent installation where the adapter is deployed, yes?

 

In that case, the SQL to execute would be

CREATE ADAPTER "AzureSQLDB" AT LOCATION AGENT "DPAGENT";

This contacts the DPAGENT and queries the metadata of the AzureSQLDB adapter, which you deployed there (via the agentconfig tool). I am a bit confused because in the create-remote-source statement you used "DPA_HNPRD" as the agent location.

werner_daehn
Active Contributor
0 Kudos
And I made a mistake as well. Are we talking about AzureDataLake or AzureSQLDB? The first is about files in the Azure Blob Storage, the second about a SQL Server database.

Above you wrote AzureSQLDB in one SQL statement and that is what I picked up on first.

The AzureDataLakeAdapter has quite comprehensive documentation, I thought.

e.g. https://github.com/rtdi/HanaSDIAzureDataLakeAdapter/blob/master/docs/Installation.md

 

 
CREATE REMOTE SOURCE "DATALAKE_RTDI" ADAPTER "AzureDataLakeAdapter"
AT LOCATION AGENT "desktop" CONFIGURATION
'<?xml version="1.0" encoding="UTF-8"?>
<ConnectionProperties name="azuredatalake">
<PropertyEntry name="DataLakeEndpoint">https://xxxx.dfs.core.windows.net/</PropertyEntry>
<PropertyEntry name="RootDir">/main</PropertyEntry>
<PropertyEntry name="DefinitionDirectory">c:\temp\ADL\views</PropertyEntry>
<PropertyEntry name="ServiceBusConnectionString">Endpoint=sb://yyyyy.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey</PropertyEntry>
</ConnectionProperties>'
WITH CREDENTIAL TYPE 'PASSWORD' USING
'<CredentialEntry name="StorageKey">
<password>bbbb</password>
</CredentialEntry>
<CredentialEntry name="ServiceBusKey">
<password>aaaa</password>
</CredentialEntry>';
aristobulo_cruz
Explorer
0 Kudos
Hi Werner, I have a data provisioning agent DPA_HNPRD and I need the AzureDataLakeAdapter for a demo with a json file.

Please confirm the documentation link, I get an error with this url:

https://github.com/rtdi/HanaSDIAzureDataLakeAdapter/blob/master/docs/Installation.md

Thanks.

 
werner_daehn
Active Contributor
0 Kudos
You understood that this adapter is something I wrote, that you must install it in SDI yourself, and that I must grant you access to the github repository so you can download it?
0 Kudos
Hi Werner,

 

Thank you for this nice blog. I also have a use case to connect to an Azure Datalake via SDI, could you please provide the installation reference?

 

Kr,

Gilles (gilles.vandenbleeken@ontexglobal.com)
ugandhar_nandam1
Explorer
0 Kudos
Werner,

I'm trying to create a remote source using the adapter you provided but am facing the issue below.

Can you please help?


Also, can you tell me what the purpose of the Data Lake API in HANA SDI mentioned in the document is? If required, can you let me know how to install it in HANA?


Regards,

Ugandhar
werner_daehn
Active Contributor
0 Kudos
The Azure Data Lake API is just the Microsoft API this adapter uses to access the Azure Blob Storage. Microsoft provides multiple APIs for that and which one to use is our choice. So there is nothing to install in Hana or the adapter.

 

The error message says that you can log in to the Data Lake endpoint, but there is no "filetest" folder within that endpoint. Can you post a screenshot of the objects in this file system?
ugandhar_nandam1
Explorer
0 Kudos

Thank you Werner for the update. I've created everything from scratch again and the remote source was created successfully. But I'm not able to read data from the json file.

I followed all the steps but could not get data, please let me know what the issue is.

Created Event Subscriptions.

Created the .view file in the Definition Directory as well.

Screenshots below for reference.

werner_daehn
Active Contributor
0 Kudos

I can't see the problem based on the information you provided either. What I just did was:

  1. Create a remote source with the root directory "main".
  2. Placed a sampleData.json file there with contents similar to yours: an array of records with guid, balance and name.
  3. Created the view file with the same settings (first three fields only)
  4. Imported the view file as virtual table
  5. Executed a select * from that virtual table
  6. Got all three records of the json file

The view looks fine. Filter is okay, accessor is okay, balance is a number but defined as a VARCHAR, so it will get converted. No problem either. The queue does not play a role here yet, as you read from the virtual table only.

Yet you get zero records and no error. So either the sampleData.json file is empty or the remote source does not point to that directory.

Btw, can we discuss this in a github issue? Otherwise we clog up this post too much.

If nothing helps, let's have an online webmeeting.

ugandhar_nandam1
Explorer
0 Kudos
Happy to connect.

Please let me know when we can connect.
0 Kudos

Hi Werner,

I have one customer asking for the same use case. Could you please grant me access to your github for the azuredatalakeadapter - prathyusha.garimella@sap.com. I can do a quick POC. Does it have any limitations with regard to volume or data types?

Thanks,

Prathy

sar-1
Discoverer
0 Kudos

Hi wdaehn,

Thank you for the nice blog. Could you please provide us access to the github repo? We have a use case to connect to Azure Data Lake using SDI.

werner_daehn
Active Contributor
0 Kudos
Please send me an email with your github username and I will add you to the repo.
sar-1
Discoverer
0 Kudos

Hi wdaehn,

Thank you for the prompt reply! Please find my details below

Email: XXXX

Github user id : XXX

sar-1
Discoverer
0 Kudos
Thank you so much! Deployed successfully!
former_member601205
Discoverer
0 Kudos
Hi wdaehn

 

We have a use case, can you please share access to the azure data lake adapter repo? My mail: mukkamala2004@yahoo.com
mb_bauer
Explorer
0 Kudos
Hi Werner,

 

any update regarding your suggestion?

Thanks!
werner_daehn
Active Contributor
0 Kudos
I suggested and discussed it with the developers last year - no response.
mb_bauer
Explorer
0 Kudos
There is a Docker image for DP agent: https://github.com/entmike/hana-dpagent,
but it would be nice to have an official one.
Maybe I'll file an idea at influence.sap.com.

I'm thinking about migrating our DP agents to a PaaS service like RedHat OpenShift.
Do you have any experience/feedback regarding such a setup?
werner_daehn
Active Contributor
The docker image is nice but it is actually just another way of installing the full-blown dpagent. My proposal goes much further than that:

  • Remove all the extra software being installed, like the Data Services engine. It is not needed to run e.g. an Oracle adapter.

  • Provide a Web UI for all configuration instead of command line tools.

  • Monitoring and support via the Web UI.

  • One docker = one adapter = one agent to make everything very small and simple.

  • Remove the old communication patterns; the JDBC method is a superset of them all.

amrithsap22
Explorer
0 Kudos
Hi wdaehn,

 

Thank you for this nice and helpful blog. We have a requirement to set up integration between BW/4Hana and a data lake, with read access and write-back to the database. I feel this will help with the integration. Kindly share access and provide any document reference. Also, I would like to understand the approximate effort (in terms of number of hours) needed to complete this setup.

Thanks and Regards,

Amrith Kumar V M
amrithsap22
Explorer
0 Kudos
wdaehn - Could you please grant me access to your github for azuredatalakeadapter - my email id: amrithsap22@gmail.com
robert_camangon
Explorer
0 Kudos
Great blog Werner. I could not find the adapter in SAP SDI or in any SAP KBAs, though. Would you be able to send some docs on how to set up this adapter for the Data Lake? Thanks a lot.

 

Robert Camangon

robert.camangon@ghd.com
robert_camangon
Explorer
0 Kudos
Thanks Werner. Just a question: do you know if it is possible to deploy a custom adapter from the SQL console, and if so, what the command is? I'm asking because we have HA in place for staging and prod and are only able to register the agent and adapters via the SQL console - we get the error "Received fatal alert: unrecognized name" when doing it via agentcli.bat, hence SAP suggested registering via the SQL console, and that worked.

 

Thanks a lot.

Robert
werner_daehn
Active Contributor
0 Kudos

Yes, the config tools themselves also interact with the Hana database via SQL only.

https://blogs.sap.com/2015/08/20/hana-adapter-sdk-interaction-via-sql/

yasser_aoujil
Explorer
0 Kudos
Hello Werner Dähn,

We also need the azure datalake adapter, can you please give us access to the repository?

This is the email: eusebio.hernandez@cemosa.es and our GitHub user is CEMOSA-DES

Many thanks and regards.

 

Yasser.
0 Kudos

Hello Werner,

Thanks for this post.

I would need the azure datalake adapter, can you please give me access to the repository?

Email: jean-guillaume.kramarz@pasapas.com

Regards,

Jean-Guillaume
