For many companies, data strategy often involves storing business data in independent silos across different repositories. Some of that data may even span multiple clouds (for cost and other reasons), which brings new challenges: data fragmentation, data duplication, and loss of data context. SAP Datasphere helps bridge siloed, cross-cloud SAP and non-SAP data sources, enabling businesses to get richer insights while keeping the data at its original location, eliminating the need for data duplication and time-consuming ETL.
Databricks Lakehouse is a popular cloud data platform used to house business, operational, and historical data in its delta lakes and lakehouses.
In this blog, let's see how to do unified analytics in SAP Analytics Cloud by creating unified business models that combine federated non-SAP data from Databricks with SAP business data to derive real-time business insights.
The integration of Databricks and SAP BTP (Business Technology Platform) can be summarized in five simple steps:
Step 1: Identify the source delta lake data in Databricks.
Step 2: Prepare to connect Databricks to SAP Datasphere.
Step 3: Connect Databricks as a source in SAP Datasphere connections.
Step 4: Create an analytical dataset in SAP Datasphere to join live SAP and non-SAP (Databricks) data into one unified semantic model.
Step 5: Connect to this unified analytical data model live from SAP Analytics Cloud and create visualizations that illustrate quick business insights.
STEP 1: Identify the source delta lake data in Databricks.
Pic: IoT Data in Databricks
Pic: Customer Master Data
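To get a quick feel for the source data, you can preview the tables in a Databricks notebook. Below is a minimal PySpark sketch; the table and column names (iot_truck_data, customer_master, event_ts) are illustrative placeholders for the datasets pictured above, not names from this scenario.

# Minimal sketch (Databricks notebook): preview the source Delta tables.
from pyspark.sql import functions as F

iot_df = spark.table("default.iot_truck_data")          # live IoT telemetry (placeholder name)
customers_df = spark.table("default.customer_master")   # customer master data (placeholder name)

iot_df.printSchema()
print(f"IoT rows: {iot_df.count()}")

# Peek at the most recent readings (event_ts is an assumed column)
iot_df.orderBy(F.col("event_ts").desc()).show(5)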
STEP 2: Prepare to connect Databricks to SAP Datasphere.
1. In Databricks, copy the JDBC connection string from your SQL warehouse's Connection Details tab.
Pic: JDBC Connectivity info from Databricks
2. Go to User Settings --> Generate New Token. Copy and note down the token.
Note: The basic connection below uses a user-based personal access token as the auth mechanism. For production scenarios, auth mechanisms such as service-principal-based tokens or OAuth 2.0 are recommended.
3. Rewrite the JDBC string we copied in step 1 above, removing the UID and PWD parameters and adding the two new parameters shown below (IgnoreTransactions and UseNativeQuery):
jdbc:databricks://adb-<id>.19.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/<id>;IgnoreTransactions=1;UseNativeQuery=0;UserAgentEntry=sap+datasphere
NOTE: The string above works for classic Hive metastore based schemas and tables. If you have a Unity Catalog enabled metastore, adapt the string slightly by adding the ConnCatalog parameter at the end:
jdbc:databricks://adb-<id>.19.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/<id>;IgnoreTransactions=1;UseNativeQuery=0;UserAgentEntry=sap+datasphere;ConnCatalog=<Catalog Name>
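Before wiring the connection into SAP Datasphere, it can be worth sanity-checking the token and HTTP path directly. The sketch below uses the Databricks SQL Connector for Python (pip install databricks-sql-connector); this verification step is our own addition, not part of the official setup.

# Optional check: confirm the personal access token and warehouse httpPath work.
from databricks import sql

with sql.connect(
    server_hostname="adb-<id>.19.azuredatabricks.net",  # host portion of the JDBC string
    http_path="/sql/1.0/warehouses/<id>",               # same httpPath as in the JDBC string
    access_token="<personal-access-token>",             # token from User Settings
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT current_catalog(), current_schema()")
        print(cur.fetchone())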
STEP 3: Connect Databricks as a source in SAP Datasphere:
Pre-requisites: The Data Provisioning Agent (DP Agent) is installed and connected to SAP Datasphere. Make sure the DP Agent host can reach the Databricks cluster.
In SAP Datasphere, create a Generic JDBC connection using the JDBC string from Step 2 and the following credentials:
Username: token (type the word "token" as is)
Password: <the token value copied earlier from the Databricks User Settings>
Pic: SAP Datasphere Generic JDBC Connection Dialog
5. Create a remote table in the SAP Datasphere Data Builder for a Databricks table and preview it to check that the data loads.
Pic: Remote Table in SAP Datasphere showing the Databricks schema table.
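If the remote table does not load, testing the exact JDBC URL and credentials outside Datasphere (ideally from the DP Agent host) helps isolate the problem. Below is a sketch using the jaydebeapi package; the driver class and jar path are assumptions that depend on your Databricks JDBC driver version.

# Assumed setup: Databricks JDBC driver jar downloaded locally; pip install jaydebeapi
import jaydebeapi

url = ("jdbc:databricks://adb-<id>.19.azuredatabricks.net:443/default;"
       "transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/<id>;"
       "IgnoreTransactions=1;UseNativeQuery=0;UserAgentEntry=sap+datasphere")

conn = jaydebeapi.connect(
    "com.databricks.client.jdbc.Driver",   # driver class in recent Databricks JDBC drivers
    url,
    ["token", "<personal-access-token>"],  # username is literally the word "token"
    "/path/to/DatabricksJDBC42.jar",       # adjust to where the driver jar lives
)
cur = conn.cursor()
cur.execute("SELECT 1")                    # simple round trip through the warehouse
print(cur.fetchall())
conn.close()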
STEP 4: Create a fact model in SAP Datasphere to join live SAP and non-SAP (Databricks) data into one unified semantic model.
You can see the live query pushdowns happening on the Databricks compute cluster in the Log4j logs when data is previewed in SAP Datasphere models.
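One way to trigger such a pushdown outside the Datasphere UI is to query the deployed view through a Datasphere database user (created under Space Management) with the hdbcli Python client. This is a sketch under assumptions: the host, credentials, and view name below are placeholders, and it presumes the view is exposed for consumption.

# Querying the unified view live; the federated SQL should then show up in the
# Databricks cluster's Log4j logs / query history.
from hdbcli import dbapi

conn = dbapi.connect(
    address="<datasphere-host>.hanacloud.ondemand.com",  # placeholder host
    port=443,
    user="<SPACE>#<DB_USER>",                            # placeholder database user
    password="<password>",
    encrypt="true",
)
cur = conn.cursor()
cur.execute('SELECT TOP 10 * FROM "<SPACE>"."<UNIFIED_VIEW>"')  # placeholder view name
for row in cur.fetchall():
    print(row)
conn.close()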
STEP 5: Connect to this unified analytical data model live from SAP Analytics Cloud and create visualizations that illustrate quick business insights.
For example, the dashboard below shows real-time truck and shipment status for customer shipments. The live IoT data from the Databricks delta lake holding real-time truck data is federated and combined with customer and shipment master data from SAP systems into a unified model used for efficient, real-time analytics.
Pic: SAP Analytics Cloud Story Dashboard - Visualizing live data from Databricks
We hope this quick tutorial helps you in your data journey and in exploring the exciting new features available in SAP Datasphere. We'd love to hear your thoughts and opinions, so please leave us a comment below. And don't forget to give the blog a like if you found it especially useful! Thanks for reading!
Please read our next blog here to learn how the FedML-Databricks library can be used to federate live data from SAP Datasphere's unified semantic data models for machine learning on the Databricks platform.
Many thanks to the Databricks team for their support and collaboration in validating this architecture: Itai Weiss, Awez Syed, Qi Su, Felix Mutzl, and Catherine Fan. Thanks also to the SAP team members who contributed to this architecture: Akash Amarendra, Karishma Kapur, Ran Bian, and Sandesh Shinde, and to Sivakumar N and Anirban Majumdar for their support and guidance.
For more information about this topic or to ask a question, please contact us at paa@sap.com