To follow this blog, you should have read my first blog or a comparable publication, so that you are familiar with the four principles of Data Mesh and understand why centralized, monolithic data architectures suffer from some inherent problems.
SAP HANA Cloud - Build a trusted Data Foundation
SAP HANA Cloud is a single database as a service (DBaaS) foundation for modern applications and analytics across all enterprise data, and it is the cloud-based data foundation for SAP Business Technology Platform. Several of its capabilities make SAP HANA Cloud well suited to Data Mesh scenarios. In the following, I will explain the three that are, from my point of view, the most relevant:
Applying Domain-Driven Design strategies to data:
"Each domain can model its data according to its context, share this data and its models with others, and identify how one model can relate and map to others." This statement comes verbatim from the book "Data Mesh - Delivering Data-Driven Value at Scale" by Zhamak Dehghani and expresses that the principles of Domain-Driven Design1) used in software development are now also penetrating the analytical data space. Going into more detail about this concept would go beyond the scope of this article. If you want to know more about adapting it to the world of data and analytics, I recommend reading this article from Martin Fowler.
At this point, it is more important to derive the technology requirements that enable the domains to model their data as efficiently as possible. A data and technology platform must support multiple data models in order to cover the complexity of today's reality and enable smart processing on top. Multi-model data platforms represent the intersection of various data models, such as JSON documents, graph networks, and relational tables, in a single data platform. With a multi-model database, domains can unify various data types and models in a single solution, without needing a separate technology for each specific purpose.
With SAP HANA and SAP HANA Cloud, we are a long-term player in this domain offering a comprehensive solution, infused with additional features like geospatial analysis, enterprise search, machine learning and predictive modelling. SAP HANA Cloud seamlessly blends smart multi-model data to power intelligent data products.
Data virtualization enables domains to quickly implement data products by working with virtual models on any data source. Transparency across data sources increases usability, and data virtualization minimizes data replication, which makes the creation of data products significantly easier than with traditional alternatives. Iterating over multiple data product versions to fully meet all business needs becomes much faster. Developing intelligent data products with embedded machine-learning capabilities becomes quick work in SAP HANA Cloud.
SAP HANA Cloud virtualizes data from full landscapes, including on-premises and cloud applications and third-party sources. SAP HANA Cloud Virtualization and Replication makes data from source systems available in the form of virtual tables, which point to supported remote sources (see SAP Note 2600176). Real-time access allows the domain teams to visualize operational data on the fly and to capture delta changes at the moment they are posted.
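To make the idea of a virtual table more tangible, here is a minimal conceptual sketch in plain Python. It is not SAP HANA Cloud code and does not use any SAP API: the `VirtualTable` class, the `fetch_remote` connector, and the sample orders are all hypothetical and only illustrate that a virtual table stores a pointer to the source instead of a copy, so each read reflects the current state of the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


@dataclass
class VirtualTable:
    """Holds only a pointer to a remote source; no rows are copied or stored."""

    name: str
    fetch_remote: Callable[[], Iterable[Row]]  # hypothetical connector to the source

    def select(self, predicate: Callable[[Row], bool] = lambda row: True) -> Iterator[Row]:
        # Every call reads the source again, so results reflect its current state.
        return (row for row in self.fetch_remote() if predicate(row))


# Stand-in for an operational source system (illustrative data only).
orders_source: List[Row] = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APJ", "amount": 80.0},
]

orders = VirtualTable("V_ORDERS", lambda: orders_source)

emea_before = list(orders.select(lambda r: r["region"] == "EMEA"))

# A new order is posted in the source system; nothing is replicated.
orders_source.append({"order_id": 3, "region": "EMEA", "amount": 45.5})

emea_after = list(orders.select(lambda r: r["region"] == "EMEA"))
print(len(emea_before), len(emea_after))
```

The second query sees the newly posted order without any load or replication step, which is the property that lets a domain team iterate on a data product without first copying the data.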
Real-Time Data Virtualization:
Federate on-premises and cloud data sources
Shifting left security and compliance governance:
By shifting security and compliance governance to the left, Data Mesh makes data more secure and compliant at speed and scale. What exactly does this statement mean? Rather than using insecure data and trying to address compliance requirements after the product is ready, domain teams can address them before they begin to develop. Compared to traditional paradigms, the Data Mesh principle "Data as a Product" reverses the model of responsibility to a certain extent. Instead of being a task of the centralized infrastructure team, these responsibilities move closer to the source of the data. Security and privacy, as well as governance, are shifted to the left, so to speak, and become part of the day-to-day work of decentralized domain teams.
This increases efficiency, since decentralized teams know their data best due to their deep domain knowledge and therefore also know exactly how to govern and protect it. SAP HANA Cloud provides these teams with state-of-the-art methods such as k-anonymity, l-diversity and differential privacy to fully anonymize and effectively protect personally identifiable information. These procedures are summarized as "Real-Time Data Anonymization" because they anonymize data in motion, leaving the sources untouched. This means that different anonymization methods and strengths can be applied to different attributes, which makes it much easier to supply the domain teams with suitable, trustworthy data.
The anonymization is implemented in views over the real data, so that the combinatorics and nesting depth can be determined case by case. By creating anonymization reports, Chief Data Protection Officers can check at any time whether the company's compliance requirements are being met.
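As an illustration of what "different strengths" can mean, the following minimal Python sketch applies the Laplace mechanism, a textbook building block of differential privacy, to a counting query. It is a sketch under assumptions and not SAP HANA Cloud's implementation: the function names, the seed, and the example count are invented for illustration. The privacy parameter `epsilon` controls the strength: a smaller value adds more noise and gives stronger protection.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution centered at 0."""
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise calibrated to epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # A counting query has sensitivity 1: adding or removing one person
    # changes the result by at most 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)  # fixed seed only to make the sketch reproducible
true_count = 128         # illustrative value, e.g. customers in one segment

strict = noisy_count(true_count, epsilon=0.1, rng=rng)   # stronger protection
relaxed = noisy_count(true_count, epsilon=2.0, rng=rng)  # weaker protection
print(f"strict: {strict:.1f}, relaxed: {relaxed:.1f}")
```

A domain team could choose a small `epsilon` for highly sensitive attributes and a larger one where accuracy matters more, while the underlying source data stays unchanged.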
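The view-based approach can be illustrated for k-anonymity with a small Python sketch. It is a conceptual example with invented records and hypothetical helper names, not the SAP HANA Cloud implementation: the "view" is computed from the source records on demand, the quasi-identifiers (age and ZIP code) are generalized, and a simple check plays the role of a compliance report by verifying that every combination of quasi-identifier values occurs at least k times.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Dict[str, object]


def generalize(record: Record) -> Record:
    """Coarsen quasi-identifiers: age becomes a decade, ZIP keeps 3 digits."""
    decade = (int(record["age"]) // 10) * 10
    return {
        "age": f"{decade}-{decade + 9}",
        "zip": str(record["zip"])[:3] + "**",
        "diagnosis": record["diagnosis"],  # sensitive attribute, left unchanged
    }


def smallest_group(records: List[Record], quasi_identifiers: Tuple[str, ...]) -> int:
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def is_k_anonymous(records: List[Record], quasi_identifiers: Tuple[str, ...], k: int) -> bool:
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group(records, quasi_identifiers) >= k


# Invented source records; they are never modified.
patients: List[Record] = [
    {"age": 34, "zip": "69190", "diagnosis": "flu"},
    {"age": 37, "zip": "69115", "diagnosis": "asthma"},
    {"age": 52, "zip": "68159", "diagnosis": "flu"},
    {"age": 58, "zip": "68161", "diagnosis": "diabetes"},
]
QUASI_IDENTIFIERS = ("age", "zip")

# The anonymized "view" is derived on read; the source list stays as it is.
anonymized_view = [generalize(p) for p in patients]

print(is_k_anonymous(patients, QUASI_IDENTIFIERS, k=2))         # False
print(is_k_anonymous(anonymized_view, QUASI_IDENTIFIERS, k=2))  # True
```

In the raw data every person is unique on age and ZIP code, whereas in the generalized view each person shares those values with at least one other person. How coarse the generalization has to be depends on the data and on the chosen k.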
Real-Time Data Anonymization:
Innovate with confidence
The author would like to thank volker.haentjes for the collaboration on this topic and his contributions to this article.
1) Eric Evans, Domain-Driven Design: Tackling Complexity in the Heart of Software (Upper Saddle River, NJ: Addison-Wesley, 2003).