On February 13th, 2025, we introduced SAP Business Data Cloud (BDC) to help enterprises unlock value from their data, using it to power analytics, AI, and automation. Fundamental to BDC are Data Products: curated datasets that are accessible, self-describing, discoverable, and understandable. The value and usability of any data partly depend on the availability and quality of its metadata, and this is equally true for the Data Products of BDC. This blog post provides a deep dive into the metadata for SAP Data Products.
In summary, SAP uses open metadata standards to describe Data Products: the API and data model of a Data Product is described by the Core Schema Notation Interoperability Specification (CSN Interop), while the high-level information about Data Products, their taxonomy, and their relationships is described by Open Resource Discovery (ORD), with a focus on taxonomy and on connecting Data Products with development, integration, and business-oriented concepts.
SAP developed a metadata discovery standard for describing and publishing various types of APIs and events: Open Resource Discovery (ORD). ORD is part of the Apeiro Reference Architecture and is contributed as open source to the European IPCEI-CIS initiative as a standard for interoperability. It was a natural step to leverage this existing standard rather than create another isolated metadata standard that only covers the Data Product and platform aspects. So, we extended ORD to include the Data Product concept and connected it with other existing developer-focused concepts.
This also helps avoid the typical problem of siloed and fragmented metadata graphs. We believe there is huge value in connecting the various metadata concepts that are relevant to developers, architects, data scientists, and business personas. Our strategy is to bring all these aspects together and aggregate the metadata into central repositories.
In our model, the Data Product is a grouping concept that carries the relevant metadata to describe the dataset and its data qualities. A Data Product itself is not directly consumable, but it exposes data through output ports, which can be implemented as APIs or events. In the first iteration of BDC, the primary Data Products expose a single output port as a Delta Sharing API. In principle, there can be multiple output ports of various API and/or event types. To capture data ingestion, lineage, and dependencies, Data Products can also have input ports, the equivalent of an ORD Integration Dependency that lists the APIs and events used to ingest or process the data. Most of these concepts have relations to Entity Type, which represents business objects / domain objects / ODM entities as semantic taxonomy terms that business experts understand.
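To make this model concrete, here is a minimal sketch of how a Data Product entry in an ORD document could look. This is an illustrative fragment under assumptions, not a normative excerpt: the exact property names and allowed values are defined on the ORD Data Product page, and all `ordId` values below are hypothetical.

```python
# Illustrative sketch of a Data Product entry in an ORD document,
# expressed as a Python dict. Property names follow ORD conventions,
# but this is an assumption-based example; all ordId values are
# hypothetical, not real SAP resources.
data_product = {
    "ordId": "sap.example:dataProduct:Customer:v1",
    "title": "Customer",
    "shortDescription": "Curated customer master data",
    "version": "1.0.0",
    "visibility": "public",
    # The dataset itself is not directly consumable; it is exposed
    # through output ports referencing concrete API resources
    # (here a hypothetical Delta Sharing API).
    "outputPorts": [
        {"ordId": "sap.example:apiResource:CustomerSharing:v1"}
    ],
    # Input ports capture ingestion, lineage, and dependencies,
    # analogous to an ORD Integration Dependency.
    "inputPorts": [
        {"ordId": "sap.example:integrationDependency:CustomerIngest:v1"}
    ],
    # Relation to the business-level taxonomy (Entity Types).
    "entityTypes": ["sap.example:entityType:Customer:v1"],
}

# A consumer discovers the concrete API behind the Data Product
# by following the output port reference:
api_ref = data_product["outputPorts"][0]["ordId"]
```

The key design point mirrored here is the indirection: the Data Product groups metadata, while consumption always happens through the referenced output ports.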
Right now, SAP uses the ORD protocol to streamline our own discovery and publishing processes, most notably to the SAP Business Accelerator Hub, the SAP Business Technology Platform (BTP) System Landscape, SAP Build, and the BDC Cockpit. This all happens behind the scenes, so customers are not exposed to it and can benefit from a better-integrated experience. In our partnership with Databricks, however, we also use the ORD protocol to connect the Databricks Unity Catalog with the Data Products provided by BDC, as well as the other way around, retrieving Data Products created in Databricks into BDC. Leveraging ORD as an open metadata standard is a very important part of our partnership, and we look forward to growing the ecosystem around ORD in our partnerships.
For more information about Data Products in ORD, have a look at the ORD Data Product page.
Data Products exposed by SAP applications will use the industry-standard Delta Sharing protocol, but customers will also be able to expose Data Products via other types of APIs. In Delta Sharing, the Delta Table metadata format provides the basics for wider ecosystem interoperability. For many of our use cases, however, we need even richer metadata: more semantics and a model that is closer to our existing metadata models.
Many SAP products already support the SAP-specific Core Data Services (CDS) metamodel, which may also be known to our customers as the metadata format for ABAP (RAP) and the SAP Cloud Application Programming Model (CAP). Since we have invested in rich metadata in CDS models for many years (sometimes decades), it made sense to leverage these models for Data Products as well, preserving and transferring the rich metadata and semantics we often already have.
As CDS models can be exported in the CSN format in many variants and flavors, we created a public specification that focuses on an interoperable, machine-consumption-optimized format: Core Schema Notation Interoperability Specification (CSN Interop).
With CSN Interop, we describe the API and data model of a Data Product's output port. Most importantly, we describe entities (tables / objects) and elements (fields, properties). By assigning entities to a service, we define (or imply) the protocol and the mapping by which they are exposed as an API with a concrete data serialization format.
Since the CSN format is not well known outside of the SAP ecosystem, here are some interesting aspects:
CSN is well suited to describe “conceptual” models, which are more abstract than, e.g., JSON Schema. The same CSN model can be exposed through different API protocols and data serialization formats (via assignment to a Service). It can also be used as a persistency model for the database. The core entity model in CSN is therefore agnostic of protocol and serialization format. The final API / data interface is then decided by the chosen mapping. In the case of the Delta Sharing APIs, we defined a mapping to the Apache Spark type system.
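To illustrate what such a mapping might involve, here is a small, hypothetical excerpt of type correspondences between CSN core types and the Apache Spark type system. The authoritative mapping is defined by the CSN Interop specification; the pairs below are illustrative assumptions only.

```python
# Hypothetical excerpt of a CSN -> Apache Spark type mapping, as would
# be applied when exposing a CSN-modeled entity through a Delta Sharing
# API. The authoritative table lives in the CSN Interop specification;
# these pairs are simplified assumptions for illustration.
CSN_TO_SPARK = {
    "cds.String": "string",
    "cds.Boolean": "boolean",
    "cds.Integer": "integer",
    "cds.Date": "date",
    "cds.Timestamp": "timestamp",
    "cds.Decimal": "decimal",  # precision/scale carried over from the model
}

def spark_type(csn_type: str) -> str:
    """Resolve a CSN core type to its Spark serialization type."""
    return CSN_TO_SPARK[csn_type]
```

Because the CSN entity model is protocol-agnostic, the same entity could be mapped to a different type system for another output port without changing the model itself.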
Also, CSN already has extensive annotation vocabularies, which can carry a lot of additional information such as business semantics (@Semantics, @ODM), relationships, and metadata that improves the semantic onboarding experience. This means there is no need to rebuild a rich, semantic metamodel from scratch, because we can carry the information over.
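A minimal, hand-written sketch of what a CSN Interop document could look like, with one entity carrying such annotations and a service that implies its exposure. The structure and annotation names here are simplified assumptions modeled on CDS/CSN conventions; the CSN Interop specification and its Primer are the authoritative references.

```python
# Simplified, illustrative CSN Interop document as a Python dict.
# Entity names, annotations, and structure are assumptions based on
# CDS/CSN conventions, not a verbatim excerpt of the specification.
csn_document = {
    "definitions": {
        # A conceptual entity: agnostic of protocol and serialization.
        "sap.example.SalesOrder": {
            "kind": "entity",
            "@EndUserText.label": "Sales Order",
            "@ODM.entityName": "SalesOrder",  # link to semantic taxonomy
            "elements": {
                "id": {"type": "cds.UUID", "key": True},
                "grossAmount": {
                    "type": "cds.Decimal",
                    # Annotation tying the amount to its currency element.
                    "@Semantics.amount.currencyCode": "currency",
                },
                "currency": {"type": "cds.String", "length": 3},
            },
        },
        # Assigning entities to a service defines (or implies) how they
        # are exposed as an API, e.g. via Delta Sharing.
        "sap.example.SalesService": {"kind": "service"},
    }
}

entity = csn_document["definitions"]["sap.example.SalesOrder"]
key_fields = [name for name, el in entity["elements"].items() if el.get("key")]
```

The annotations are the point here: they let semantics that already exist in CDS models travel with the Data Product instead of being re-modeled from scratch.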
For more information, have a look at the CSN Interop page and its Primer.
We plan to extend the current scope of the metadata standards and further improve the Data Product metadata with more annotations and better coverage. We want to establish ORD as a de facto industry standard and an intrinsic piece of our open-ecosystem approach, just as ORD already serves as the solid foundation of our partnership with Databricks.
If you have questions, feedback or just want to get in touch with us, feel free to reach out!