Technology Blog Posts by SAP
Yogesh__Vijay
Product and Topic Expert

1. Keeping Data in Databricks

Scenario: Some organizations have non-SAP data (e.g., from third-party apps, IoT sources, data lakes) that they want to retain in Databricks rather than moving everything into SAP’s environment. In the new SAP Business Data Cloud (BDC) architecture, SAP offers an integrated “SAP Databricks” service (an OEM version of Databricks) that seamlessly connects with BDC. This setup allows you to:

 

  • Store certain data sets entirely in Databricks’ object store (Delta Lake).
  • Leverage Databricks’ advanced AI/ML capabilities for those data sets.
  • Federate or share data back to SAP Business Data Cloud without physically copying it multiple times.

 


2. Where the Data Actually Resides

In SAP Business Data Cloud, SAP data is stored in HANA Cloud Data Lake files (an object store) and managed via foundation services. For Databricks, the data typically resides in:

 

  • Databricks’ Object Store: This could be AWS S3, Azure Data Lake Storage, or Google Cloud Storage, depending on your chosen cloud provider.
  • Delta Lake Format: Databricks natively uses Delta Lake for structured storage, enabling versioning, ACID transactions, and efficient reads.
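The versioning and ACID guarantees mentioned above come from Delta Lake's transaction log: every change to a table is recorded as a new, ordered JSON entry under `_delta_log/`, and a reader reconstructs any table version by replaying the log up to that point. A toy, standard-library-only sketch of that idea (the real log format carries far more metadata, and the file names here are hypothetical):

```python
# Toy illustration of Delta Lake's core mechanism: a table is data files plus
# an ordered JSON transaction log; the highest log entry defines the current
# version, which is what enables time travel and ACID-style commits.
# (Illustrative only -- the real _delta_log format is much richer.)
import json
import tempfile
from pathlib import Path

table = Path(tempfile.mkdtemp())
log = table / "_delta_log"
log.mkdir()

def commit(version: int, actions: list) -> None:
    # Each commit is a zero-padded JSON file, mirroring Delta's
    # 00000000000000000000.json naming scheme.
    (log / f"{version:020d}.json").write_text(json.dumps(actions))

commit(0, [{"add": {"path": "part-0000.parquet"}}])
commit(1, [{"add": {"path": "part-0001.parquet"}}])

def current_version() -> int:
    return max(int(p.stem) for p in log.glob("*.json"))

def files_as_of(version: int) -> list:
    # "Time travel": replay the log only up to the requested version.
    files = []
    for v in range(version + 1):
        for action in json.loads((log / f"{v:020d}.json").read_text()):
            files.append(action["add"]["path"])
    return files

print(current_version())   # → 1
print(files_as_of(0))      # → ['part-0000.parquet']
```

Because readers only consult the log, a new commit never blocks or corrupts concurrent reads of an older version, which is the property the "versioning, ACID transactions, and efficient reads" claim rests on.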

 

“Zero-Copy” Sharing

A key feature in the BDC–Databricks partnership is Delta Sharing or “zero-copy sharing,” which allows you to provide read access to data without physically replicating it. So if a data set is stored in Databricks, you can make it visible to SAP Business Data Cloud analytics or AI use cases, and vice versa, without having to manage multiple data copies.
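Under the open Delta Sharing protocol, the consuming side authenticates against the provider's sharing endpoint with a small "profile" file. A minimal profile looks like the following (the endpoint URL and token are placeholders; inside SAP Business Data Cloud this wiring is managed by the platform):

```json
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "<redacted-token>"
}
```

A client such as the open-source `delta_sharing` Python library can then read a shared table directly (e.g., `delta_sharing.load_as_pandas("profile.share#<share>.<schema>.<table>")`); the data stays in the provider's Delta Lake storage and is streamed only on read, which is exactly what makes the sharing "zero-copy."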


3. Integration Tools for Non-SAP Sources

When you have non-SAP data that needs to be ingested into Databricks, you can use:

 

  1. Databricks’ Native Ingestion Capabilities (e.g., Auto Loader for incremental file ingestion, or COPY INTO for batch loads)
  2. ETL/ELT Tools & Connectors (e.g., partner tools such as Fivetran or Informatica)
  3. Cloud Provider Services (e.g., Azure Data Factory, AWS Glue, or Google Cloud Dataflow)
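Whichever route you choose, the common pattern is the same: land files in your cloud object store in an ingestible format and let Databricks (for example, Auto Loader) pick them up incrementally. A minimal, standard-library-only sketch of that staging step, converting a hypothetical third-party CSV export to newline-delimited JSON (the column names are invented for illustration):

```python
# Sketch of the "land files for ingestion" pattern: convert a third-party
# CSV export into newline-delimited JSON, the kind of file you would drop
# into S3/ADLS/GCS for an incremental loader to pick up.
import csv
import io
import json

raw = "device_id,temp_c\nsensor-1,21.5\nsensor-2,19.8\n"

def csv_to_jsonl(text: str) -> str:
    # DictReader keeps the header order, so each output line is a
    # self-describing JSON record ready for schema inference on load.
    rows = csv.DictReader(io.StringIO(text))
    return "\n".join(json.dumps(row) for row in rows)

print(csv_to_jsonl(raw))
```

In practice the conversion would run inside your ETL tool or a Databricks job, but the staging contract (one well-formed record per line, dropped into object storage) is the part all three ingestion options share.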

 


Putting It All Together

 

  • SAP Business Data Cloud primarily handles SAP application data and makes it available in a curated, semantically rich “data product” format.
  • Databricks (the OEM “SAP Databricks” version) can be your go-to environment for large-scale data engineering, advanced ML/AI, or non-SAP data sets you want to keep outside of HANA Cloud.
  • Integration between the two is seamless. You can keep some data sets exclusively in Databricks while still exposing them to SAP analytics and AI scenarios through zero-copy sharing or standardized pipelines.

 

In short, if your architecture calls for certain data sets to remain in Databricks, you can do so without losing out on SAP’s built-in analytics, AI, and business semantic features. Data is stored in Delta Lake within Databricks, and you can bring in non-SAP data using any Databricks-supported ingestion method or standard ETL/ELT tools. Once in Databricks, that data can still be surfaced in SAP Business Data Cloud for unified insights, all while retaining the power of Databricks for big data processing and machine learning.
