Sangeetha_K
Product and Topic Expert

For many companies, data strategy often means business data stored in independent silos across different repositories. Some of that data may even span multiple cloud providers (for cost and other reasons), which brings new challenges: data fragmentation, data duplication, and loss of data context. SAP Datasphere helps bridge siloed, cross-cloud SAP and non-SAP data sources, enabling businesses to get richer business insights while keeping the data at its original location and eliminating the need for data duplication and time-consuming ETL.

Databricks Lakehouse is a popular cloud data platform used for housing business, operational, and historical data in Delta Lake tables and lakehouses.

In this blog, let’s see how to do unified analytics on SAP Analytics Cloud by creating unified business models that combine federated non-SAP data from Databricks with SAP business data to derive real-time business insights.  

SAP and Databricks Integration Reference Architecture 

Pic: SAP and Databricks integration reference architecture

The integration of Databricks and SAP BTP (Business Technology Platform) can be summarized in five simple steps:

Step 1: Identify the source Delta Lake data in Databricks.

Step 2: Prepare to connect Databricks to SAP Datasphere.

Step 3: Connect Databricks as a source in SAP Datasphere connections.

Step 4: Create an analytical dataset in SAP Datasphere to join live SAP and non-SAP (Databricks) data into one unified semantic model.

Step 5: Connect to this unified analytical data model live from SAP Analytics Cloud and create visualizations that illustrate quick business insights.
 

The Details 


STEP 1: Identify the source Delta Lake data in Databricks.

    1. For this blog, we will federate IoT data from a Databricks Delta Lake table and combine it with customer master data from SAP sources (a minimal sketch of such a table follows the screenshots below).

Pic: IoT Data in Databricks

Pic: Customer Master Data
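To make the starting point concrete, here is a minimal sketch of how such an IoT dataset could be registered as a Delta table in a Databricks notebook. The table and column names (default.iot_truck_data, truck_id, and so on) are illustrative only, and spark and display are the objects a Databricks notebook provides.

from pyspark.sql import Row

# Illustrative sample of IoT truck readings; real data would come from your streaming or batch pipelines.
sample = [
    Row(truck_id="TRK-001", latitude=49.29, longitude=8.64, temperature_c=4.2, shipment_id="SHP-100"),
    Row(truck_id="TRK-002", latitude=52.52, longitude=13.40, temperature_c=6.8, shipment_id="SHP-101"),
]
df = spark.createDataFrame(sample)

# Persist as a managed Delta table so the SQL Warehouse used later can query it.
df.write.format("delta").mode("overwrite").saveAsTable("default.iot_truck_data")

display(spark.sql("SELECT * FROM default.iot_truck_data LIMIT 10"))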


 STEP 2: Prepare to connect Databricks to SAP Datasphere. 

    1. Go to your Databricks SQL Warehouse, open the Connection details tab as shown below, and copy the JDBC URL.


Pic: JDBC Connectivity info from Databricks

 
2. Go to User Settings --> Generate New Token, then copy and note down the token.

Note: The basic connection below uses a user-based personal access token as the authentication mechanism. For production scenarios, mechanisms such as service-principal-based tokens or OAuth 2.0 are recommended.

 

Pic: Generating a personal access token in Databricks user settings
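If you prefer to create the token programmatically rather than through the UI, a minimal sketch using the Databricks Token REST API is shown below; the workspace URL and the token used to authenticate the request are placeholders.

import requests

workspace_url = "https://adb-<id>.19.azuredatabricks.net"  # placeholder workspace URL

# Create a personal access token via the Token API, authenticating with an existing token.
resp = requests.post(
    f"{workspace_url}/api/2.0/token/create",
    headers={"Authorization": "Bearer <existing-token>"},
    json={"comment": "SAP Datasphere federation", "lifetime_seconds": 7776000},  # ~90 days
)
resp.raise_for_status()
print(resp.json()["token_value"])  # displayed only once - store it securely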

3. Rewrite the JDBC string we copied in step 1 above by removing the UID and PWD parameters and adding two new parameters, IgnoreTransactions and UseNativeQuery, as shown below:
jdbc:databricks://adb-<id>.19.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/<id>;IgnoreTransactions=1;UseNativeQuery=0;UserAgentEntry=sap+datasphere

 

 

NOTE: While the above works for classic Hive metastore-based schemas and tables, if you have a Unity Catalog-enabled metastore, you will have to slightly adapt the string by adding the ConnCatalog parameter at the end:
jdbc:databricks://adb-<id>.19.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/<id>;IgnoreTransactions=1;UseNativeQuery=0;UserAgentEntry=sap+datasphere;ConnCatalog=<Catalog Name>
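
Before moving on to SAP Datasphere, it can be helpful to confirm that the SQL Warehouse endpoint and the token actually work. The minimal sketch below uses the databricks-sql-connector Python package (pip install databricks-sql-connector); the host name, HTTP path, and token are placeholders taken from the JDBC URL above.

from databricks import sql

# Open a session against the SQL Warehouse using the same host, HTTP path and token as in the JDBC URL.
with sql.connect(
    server_hostname="adb-<id>.19.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/<id>",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        # List the tables you expect to federate, e.g. the IoT table from Step 1.
        cursor.execute("SHOW TABLES IN default")
        for row in cursor.fetchall():
            print(row)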

 

 

 

 

STEP 3: Connect Databricks as a source in SAP Datasphere.

Pre-Requisites: The Data Provisioning Agent is installed and connected to SAP Datasphere. Make sure the DP Agent system can reach the Databricks cluster; a quick check like the one sketched below can help rule out network issues.
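A minimal sketch of such a reachability check, run on the DP Agent host (the host name is a placeholder from the JDBC URL), could look like this:

import socket

host = "adb-<id>.19.azuredatabricks.net"  # same host as in the JDBC URL

# The connection uses HTTPS transport, so port 443 must be reachable from the DP Agent host.
with socket.create_connection((host, 443), timeout=10):
    print(host + ":443 is reachable from this host")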

  1. Download the latest Databricks JDBC driver and copy it to the <DPAgent_root>/camel/lib directory.
  2. Edit the <DPAgent_root>/camel/configfile-jdbc.properties file and add the line: delimident=BACKTICK
  3. Restart the DP Agent.
  4. Make sure the CamelJDBCAdapter is registered and turned on in SAP Datasphere by following this help.
  5. In SAP Datasphere Connections, create a Generic JDBC connection and enter the details as shown below, filling in the JDBC URL we formed earlier.

Username: token (type the word "token" as is)
Password: <use the token value we copied earlier from the Databricks user settings>


Pic: SAP Datasphere Generic JDBC Connection Dialog


6. Create a remote table in the SAP Datasphere Data Builder for a Databricks table and preview it to check that data loads.


 Pic: Remote Table in SAP Datasphere showing the Databricks schema table.


STEP 4: Create a fact model in SAP Datasphere to join live SAP and non-SAP (Databricks) data into one unified semantic model.

Pic: Fact model in SAP Datasphere Data Builder joining SAP and Databricks data

You can see the live query pushdowns happening on the Databricks compute cluster in the Log4j logs when data is previewed in SAP Datasphere models.

Pic: Query pushdown visible in the Databricks Log4j logs
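If you do not have access to the cluster logs, the pushed-down statements can also be inspected through the Databricks SQL query history. The sketch below uses the Query History REST API with a placeholder workspace URL and token, and reads the response fields defensively since their exact shape is an assumption here.

import requests

workspace_url = "https://adb-<id>.19.azuredatabricks.net"  # placeholder

resp = requests.get(
    f"{workspace_url}/api/2.0/sql/history/queries",
    headers={"Authorization": "Bearer <personal-access-token>"},
    params={"max_results": 10},
)
resp.raise_for_status()

# Print the text of recent queries; statements pushed down from SAP Datasphere should show up here.
for query in resp.json().get("res", []):
    print(query.get("query_text"))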

STEP 5: Connect to this unified analytical data model live from SAP Analytics Cloud and create visualizations that illustrate quick business insights.

For example, the dashboard below shows real-time truck and shipment status for customer shipments. The live IoT data from the Databricks Delta Lake, which holds the real-time truck data, is federated and combined with customer and shipment master data from SAP systems into a unified model used for efficient, real-time analytics.

 


 

Pic: SAP Analytics Cloud Story Dashboard - Visualizing live data from Databricks


We hope this quick tutorial helps you in your data journey and in exploring the exciting new features available in SAP Datasphere. We'd love to hear your thoughts and opinions, so please leave us a comment below. And don't forget to give us a like if you found this blog especially useful! Thanks for reading!

Please read our next blog here to learn how the FedML-Databricks library can be used to federate live data from SAP Datasphere's unified semantic data models for machine learning on the Databricks platform.

Credits

Many thanks to the Databricks team for their support and collaboration in validating this architecture: Itai Weiss, Awez Syed, Qi Su, Felix Mutzl and Catherine Fan. Thanks to the SAP team members for their contributions towards this architecture: Akash Amarendra, Karishma Kapur, Ran Bian, Sandesh Shinde, and to Sivakumar N and Anirban Majumdar for support and guidance.

For more information about this topic or to ask a question, please contact us at paa@sap.com  
