We are thrilled to announce that the latest release of SAP Data Hub, version 2.3, is now available. This release has a fresh new look and an easy-to-follow user interface. You can now rely on SAP Data Hub to manage metadata assets, enable governance, and accelerate data-driven processes across the entire landscape more efficiently.
All components in this version are fully containerized, which means that the underlying components, such as the engines, agents, and the metadata store in SAP HANA, all run in isolated execution environments within Kubernetes. This greatly simplifies the installation process and reduces deployment time.
Now, let’s go through the key features and explain how they benefit you.
- Single entry point for all SAP Data Hub applications
Introducing SAP Data Hub Launchpad, a modern-looking user interface designed to give you a single point of access to all user-facing applications, including System Management, Monitoring, SAP Vora Tools, Connection Management, Metadata Explorer, and the Pipeline Modeler. In addition, this release introduces System Management with lifecycle and repository capabilities.
- Simplified deployment of SAP Data Hub in cloud and on-premise environments
Having a fully containerized architecture allows you to deploy SAP Data Hub on any platform that supports Kubernetes. This includes managed cloud services such as AWS (EKS), GCP (GKE), and Azure (AKS), as well as private cloud or on-premise installations like SUSE CaaSP. Furthermore, we have joined forces with Cisco to provide a turn-key, enterprise-scale solution that fosters a seamless interplay of powerful hardware and sophisticated software. With the Cisco Container Platform running on its hyperconverged hardware solution Cisco HyperFlex, Cisco provides an elastically scalable container cluster with upstream Kubernetes. We complemented it with a Scality RING object store and Avi Networks load balancers to form the foundation for running SAP Data Hub on-premise in enterprise-scale hybrid cloud environments.
Starting with this release, all necessary components including SAP HANA and SAP Vora’s distributed runtime engines are delivered containerized via a Docker registry. This removes the need to install a separate SAP HANA instance for external storage or a Hadoop cluster for Vora's runtime executions.
The major advantage of a fully containerized architecture is that the data processing layer can be separated from, but ideally co-located with, the main data storage. By removing the requirement to install SAP HANA on a separate server, the installation process becomes much leaner and easier. All major cloud storage platforms, HDFS, and on-premise file shares are fully supported.
- Metadata catalog to improve visibility of landscape-wide data assets
We are introducing SAP Data Hub Metadata Explorer to help you govern and manage metadata assets that are spread across diverse systems and disparate sources.
Key functionalities include but are not limited to:
- Connect to data sources with the ability to automatically crawl their metadata structures
- Create references to data (so-called data sets) and store them in the Metadata Catalog
- Search and browse from the Metadata Catalog to find relevant data assets
- Discover and profile data within the landscape to get insights on the data quality
- Out-of-the-box support for SAP HANA, SAP Vora, object stores (S3, GCS, etc.), HDFS, SAP BW, and Oracle
With these new features available in SAP Data Hub 2.3, it is now easier for you to manage metadata and enable data-driven processes within distributed landscapes. SAP Data Hub provides an easy way for all data professionals including Data Designers, Scientists, Engineers, Architects, and Modelers to get insights regardless of where the data is stored (Data Warehouse, Data Lake or Cloud storage, etc.).
- Enhanced Data Integration & Connectivity capabilities
SAP Data Hub provides a broad spectrum of connectivity with a strong focus on “Big Data” components (e.g. Hadoop, cloud stores, machine learning services, and real-time messaging technologies). As the product continues to evolve and is adopted by a broader customer base, we understand the need for native connectivity to a wide range of databases and enterprise applications. We are introducing a new common connectivity framework that serves as the underlying infrastructure, with the goal of rapidly expanding and enhancing native connectivity and integration functionality, especially for structured data sources.
Among many others, SAP Data Hub provides native connectivity to the following sources:
- Relational databases (Oracle, etc.) and enterprise applications (e.g. SAP S/4HANA, SAP BW/4HANA)
- Popular cloud storage platforms such as WASB, S3, and GCS
- Open protocols such as OData and OpenAPI
- Cleansing and enrichment services via integration of SAP Data Quality Management microservices (DQMm) for location data
- Machine Learning Services like SAP Machine Learning Foundation Services
- 3rd Party Services and technologies like Spark, Livy and Google Pub/Sub
Below is a snapshot of the operators that provide native connectivity:
Furthermore, this release improves the ingestion of data streams into SAP Vora and the related persistence settings:
- Native streaming capability into SAP Vora persistent storage
- Support for DML operations (Insert, Update, Delete, Upsert) on streaming tables with the disk engine
- Support for external cloud storage for backup checkpoints
- Real-time data replication directly into SAP Vora tables with SAP LT Replication Server
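To make the four DML operations mentioned above more concrete, here is a minimal sketch in plain Python of how Insert, Update, Delete, and Upsert differ when a stream of change events is applied to a keyed table. This is only an illustration of the general semantics, not the SAP Vora API: the `KeyedTable` class, its methods, and the event tuple layout are all hypothetical.

```python
from typing import Any, Dict, Iterable, Tuple


class KeyedTable:
    """Hypothetical in-memory table keyed by primary key (illustration only)."""

    def __init__(self) -> None:
        self.rows: Dict[Any, Dict[str, Any]] = {}

    def insert(self, key: Any, row: Dict[str, Any]) -> None:
        # Insert fails if the key already exists.
        if key in self.rows:
            raise KeyError(f"duplicate key: {key!r}")
        self.rows[key] = dict(row)

    def update(self, key: Any, changes: Dict[str, Any]) -> None:
        # Update fails if the key does not exist.
        if key not in self.rows:
            raise KeyError(f"unknown key: {key!r}")
        self.rows[key].update(changes)

    def delete(self, key: Any) -> None:
        # In this sketch, deleting a missing key is a no-op.
        self.rows.pop(key, None)

    def upsert(self, key: Any, row: Dict[str, Any]) -> None:
        # Upsert inserts the row if the key is new, otherwise updates it.
        self.rows.setdefault(key, {}).update(row)


def apply_events(table: KeyedTable,
                 events: Iterable[Tuple[str, Any, Dict[str, Any]]]) -> None:
    """Apply (operation, key, payload) events to the table in arrival order."""
    for op, key, payload in events:
        if op == "insert":
            table.insert(key, payload)
        elif op == "update":
            table.update(key, payload)
        elif op == "delete":
            table.delete(key)
        elif op == "upsert":
            table.upsert(key, payload)
        else:
            raise ValueError(f"unsupported operation: {op!r}")


table = KeyedTable()
apply_events(table, [
    ("insert", 1, {"name": "sensor-a", "value": 10}),
    ("upsert", 2, {"name": "sensor-b", "value": 20}),  # key 2 is new: insert
    ("upsert", 1, {"value": 11}),                      # key 1 exists: update
    ("update", 2, {"value": 21}),
    ("delete", 1, {}),
])
```

The practical benefit of Upsert in a streaming context is that the producer does not need to know whether a key has been seen before, which is why it is often paired with replication scenarios.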
- Unified modeling interface with SAP Data Hub Modeler
Finally, we unified and improved the user experience. In this release, all existing modeling capabilities are combined into a single interface, the SAP Data Hub Modeler. The following data operations are now delivered as dedicated operators that are ready to be used in any data pipeline:
- Workflow Pipeline Operators (Data Transfers (HANA/BW), Spark Jobs, etc.)
- Remote Sources Orchestration (SAP BW Process Chain, SAP Data Services Job, SAP HANA Flowgraph)
- Structured Data Transformations (projection, aggregation, join, union, case)
- Data Masking (mask out, numeric generalization, pattern variance, etc.)
- Validation Rules (basic and custom functions)
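As a rough illustration of two of the masking techniques named in the list above, the following Python sketch shows what "mask out" and "numeric generalization" typically mean. It does not reproduce the behavior or configuration of the SAP Data Hub operators; the function names `mask_out` and `generalize_numeric` and their parameters are assumptions made for this example.

```python
from typing import Tuple


def mask_out(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `keep_last` characters with `mask_char`."""
    # Number of leading characters to hide; never negative.
    cut = max(len(value) - keep_last, 0)
    return mask_char * cut + value[cut:]


def generalize_numeric(value: int, bucket_size: int) -> Tuple[int, int]:
    """Replace an exact integer with the inclusive range (bucket) containing it."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    lower = (value // bucket_size) * bucket_size
    return lower, lower + bucket_size - 1


# Hypothetical record used only for this example.
record = {"card_number": "4111111111111111", "age": 37}
masked = {
    "card_number": mask_out(record["card_number"], keep_last=4),
    "age_range": generalize_numeric(record["age"], bucket_size=10),
}
# masked == {"card_number": "************1111", "age_range": (30, 39)}
```

Both techniques trade precision for privacy: the masked record is still useful for matching on the last digits or for age-band analysis, but no longer exposes the exact values.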
In summary, SAP Data Hub 2.3 offers more functionality with a flexible architecture, which ultimately simplifies and speeds up deploying SAP Data Hub, scaling data pipelines, and enforcing governance.
To learn more about the product, please refer to the official documentation site, or check out the SAP Data Hub YouTube channel for more tutorial videos. For hands-on experience with SAP Data Hub, see the following resources:
- SAP Data Hub, Developer Edition
https://blogs.sap.com/2017/12/06/sap-data-hub-developer-edition/
- SAP Data Hub, trial edition
https://blogs.sap.com/2018/04/26/sap-data-hub-trial-edition/
- Sessions / Hands-On Workshops @ SAP TechEd
https://blogs.sap.com/2018/09/18/sap-data-hub-at-teched-las-vegas-barcelona-and-bangalore-2018/