This was an SAP User Group webcast. The abstract is shown below:
Big data scenarios are critical to future data warehouse architectures and require special integration scenarios between the data warehouse and the data lake. This session provides an overview of the SAP Data Hub and illustrates how SAP BW/4HANA can be used to tightly integrate data and data flows, thus enabling completely new analysis scenarios. (Source: SAP)
Source: SAP
Anything said about the future is subject to change
Different data silos
Data not connected
How do you build apps that span different areas?
Challenge: how to bring it all together
Data scientists have their own toolset, decoupled from the SAP S/4HANA world
Operations are tough
Execute "on the go"
Need for governance
Lack of security visibility
Analogy: a control tower over an airport, directing planes as they come and go
Data Hub does not persist the data itself
Take SAP data and combine it with non-SAP data
ABAP integration: cloud or on-premise systems (via RFC)
Cloud Data Integration: communicates with cloud applications such as Fieldglass and Ariba
Orchestration: workflows in Data Hub can trigger BW process chains
SAP HANA integration: database and platform
On the right side: connectors to the external world
Data Hub has three buckets:
1. Data Governance
Metadata is the only data persisted in Data Hub; it is used to build a catalog that can be reused in applications
2. Data Orchestration & Monitoring
Connect to different systems, define schedules and workflows, and remotely orchestrate BW processes
3. Data Pipeline and Processing
Data Hub Modeler for end-to-end processes
The foundation is a distributed runtime running in Docker containers, so it can scale up on the fly
With this, you can build your own data-driven applications
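The orchestration bucket can be pictured as a workflow whose steps run in dependency order. The sketch below is plain Python, not the Data Hub API: the task functions (such as `trigger_bw_process_chain`) are hypothetical stubs, and the ordering relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical task stubs; a real workflow would call out to ERP, BW, HDFS, etc.
def extract_erp() -> str:
    return "extract_erp done"

def trigger_bw_process_chain() -> str:
    # Placeholder for remotely starting a BW process chain (assumption, not a real API).
    return "bw_process_chain done"

def load_to_hdfs() -> str:
    return "load_to_hdfs done"

def run_workflow(tasks: dict[str, Callable[[], str]],
                 depends_on: dict[str, set[str]]) -> list[str]:
    """Run each task only after all of its predecessors; return results in run order."""
    log = []
    for name in TopologicalSorter(depends_on).static_order():
        log.append(tasks[name]())
    return log

tasks = {
    "extract_erp": extract_erp,
    "bw_chain": trigger_bw_process_chain,
    "load_hdfs": load_to_hdfs,
}
# Each key lists the tasks that must finish before it may start.
depends_on = {
    "bw_chain": {"extract_erp"},
    "load_hdfs": {"bw_chain"},
}

if __name__ == "__main__":
    for line in run_workflow(tasks, depends_on):
        print(line)
```

Modeling the workflow as a dependency graph rather than a fixed list means a cycle is rejected up front (`graphlib` raises `CycleError`) instead of deadlocking at run time.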
Product Insight
Covers lifecycle management, applications for execution, application data and data management, bringing data into the catalog to work with it, and an embedded database (SAP Vora)
Connectivity is shown on the right
A Kubernetes cluster is always required
Private cloud: Virtustream
Managed cloud: license from SAP, install Data Hub, and manage it yourself
Full service: SAP Data Intelligence, a fully managed subscription for data scientists covering the full lifecycle; announced at SAPPHIRE NOW, with more details at SAP TechEd
The entry point is the Launchpad
At the center is the Metadata Explorer
In the middle is the Pipeline, which is built from operators
Example pipeline: read the needed files from HDFS, use a Python operator to run sentiment analysis, and write the results to Vora in warm storage
Operators are extensible: they can be enhanced, and you can write your own
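The operator idea can be illustrated with a plain-Python model of the example pipeline (read, score sentiment, write). This is not the Data Hub Modeler or its operator API: the `Operator` base class, the toy word-list sentiment scorer, and the in-memory lists standing in for HDFS and Vora are all hypothetical. Under this model, writing your own operator means subclassing and implementing `process`.

```python
from typing import Callable, Iterable, Iterator

class Operator:
    """Hypothetical base class: transforms a stream of records into another stream."""
    def process(self, records: Iterable[dict]) -> Iterator[dict]:
        raise NotImplementedError

class SentimentOperator(Operator):
    """Toy word-list sentiment scorer, standing in for a real sentiment model."""
    POSITIVE = {"good", "great", "love"}
    NEGATIVE = {"bad", "poor", "hate"}

    def process(self, records):
        for rec in records:
            words = rec["text"].lower().split()
            score = (sum(w in self.POSITIVE for w in words)
                     - sum(w in self.NEGATIVE for w in words))
            yield {**rec, "sentiment": score}

class FilterOperator(Operator):
    """A custom operator: keep only records that match a predicate."""
    def __init__(self, predicate: Callable[[dict], bool]):
        self.predicate = predicate

    def process(self, records):
        return (r for r in records if self.predicate(r))

def run_pipeline(source: Iterable[dict], operators: list[Operator], sink: list) -> list:
    """Chain the operators lazily, then drain the final stream into the sink."""
    stream = iter(source)
    for op in operators:
        stream = op.process(stream)
    sink.extend(stream)
    return sink

if __name__ == "__main__":
    hdfs_files = [{"id": 1, "text": "Great product, love it"},
                  {"id": 2, "text": "Bad support"}]
    warm_storage = []  # stands in for the Vora target
    run_pipeline(hdfs_files,
                 [SentimentOperator(), FilterOperator(lambda r: r["sentiment"] != 0)],
                 warm_storage)
    print(warm_storage)
```

Because each operator consumes and yields a stream, new operators can be inserted anywhere in the chain without changing the others.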
Use cases, including IoT and orchestration
Bottom right: data cataloging to get the full picture of the metadata
Intelligent data warehouse: use Data Hub to connect to the siloed big data world and bring it together
Discover siloed data sources
Bring structured and unstructured data together
Govern the metadata catalog and data lineage
Orchestrate large data sets
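Since only metadata is persisted in Data Hub, a catalog with lineage can be thought of as a small graph of dataset descriptions. The sketch below is a hypothetical in-memory illustration (the `Dataset` fields, dataset names, and the `upstream` helper are assumptions, not the Metadata Explorer's data model): it records where each dataset lives and which datasets it was derived from, then walks the lineage upstream.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Dataset:
    name: str
    system: str               # illustrative labels, e.g. "HDFS", "S3", "BW/4HANA"
    schema: tuple[str, ...]   # column names only; no data is stored

@dataclass
class Catalog:
    datasets: dict[str, Dataset] = field(default_factory=dict)
    derived_from: dict[str, set[str]] = field(default_factory=dict)

    def register(self, ds: Dataset, sources: tuple[str, ...] = ()) -> None:
        # Only metadata is recorded; the data itself stays in its source system.
        self.datasets[ds.name] = ds
        self.derived_from[ds.name] = set(sources)

    def upstream(self, name: str) -> set[str]:
        """All datasets that `name` was derived from, directly or transitively."""
        seen: set[str] = set()
        stack = list(self.derived_from.get(name, ()))
        while stack:
            src = stack.pop()
            if src not in seen:
                seen.add(src)
                stack.extend(self.derived_from.get(src, ()))
        return seen

if __name__ == "__main__":
    cat = Catalog()
    cat.register(Dataset("social_raw", "HDFS", ("user", "text")))
    cat.register(Dataset("social_anon", "Vora", ("region", "text")), sources=("social_raw",))
    cat.register(Dataset("erp_sales", "BW/4HANA", ("product", "revenue")))
    cat.register(Dataset("marketing_view", "BW/4HANA", ("product", "sentiment", "revenue")),
                 sources=("social_anon", "erp_sales"))
    print(sorted(cat.upstream("marketing_view")))
```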
Customer scenario: a customer in the ERP world consumes social media channels for targeted marketing. The data is moved to an HDFS cluster, formatted, anonymized, and then brought into Vora (using a Pipeline in Data Hub)
Combine with BW/4HANA, which is tightly integrated with SAP Vora: the ERP data is brought into BW/4HANA and merged in BW
Visualize the result
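The talk does not say how the social media data was anonymized. As one illustrative option, the sketch below pseudonymizes user handles with a keyed hash (HMAC-SHA256 from Python's standard library) and drops direct identifiers before the data moves on. The record layout and the `SECRET_KEY` value are hypothetical; note that keyed hashing is pseudonymization rather than full anonymization, since whoever holds the key can re-link records.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets store, not source code.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable keyed-hash token (same input gives same token).

    Truncated to 16 hex characters (64 bits) for readability.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def anonymize_post(post: dict) -> dict:
    """Drop direct identifiers (handle, email); keep fields needed for marketing analysis."""
    return {
        "user_token": pseudonymize(post["handle"]),
        "region": post["region"],
        "product": post["product"],
        "text": post["text"],
    }

if __name__ == "__main__":
    raw = {"handle": "@alice", "email": "alice@example.com",
           "region": "EMEA", "product": "P1", "text": "love it"}
    print(anonymize_post(raw))
```

A stable token still lets analysts count posts per (pseudonymous) user without ever storing the handle itself.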
Pipeline example
Data stored in S3 is merged with data in BW
Data Hub is the "glue" between systems
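The merge step amounts to a join on a shared key. Below is a minimal in-memory left join in plain Python, assuming hypothetical record layouts: BW rows with revenue per `product`, and S3 rows carrying an aggregated sentiment per `product`. In the scenario described, the merge happens in BW rather than in client code; the sketch only shows the logic.

```python
def left_join(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Keep every left row and add the matching right row's columns (None if no match).

    Assumes `key` is unique within `right`; on a column-name clash the right value wins.
    """
    index = {row[key]: row for row in right}
    right_cols = {c for row in right for c in row if c != key}
    merged = []
    for row in left:
        match = index.get(row[key], {})
        extra = {c: match.get(c) for c in right_cols}
        merged.append({**row, **extra})
    return merged

if __name__ == "__main__":
    bw_rows = [{"product": "P1", "revenue": 100}, {"product": "P2", "revenue": 50}]
    s3_rows = [{"product": "P1", "avg_sentiment": 0.8}]
    for row in left_join(bw_rows, s3_rows, "product"):
        print(row)
```

A left join keeps every BW record even when no external data exists for it, which makes gaps in the S3 data visible as `None` instead of silently dropping rows.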
Architecture: at design time in Data Hub, create a workflow that uses the data transfer operator, then select the source (Query, Dataset) and the target (such as HDFS)
At run time, the data transfer task reads from BW and hands the data to the flow agent writer, which sends it to HDFS
InA can be used (for small amounts of data; works on all BW systems)
With BW/4HANA, the transfer can go direct via the calculation view in HANA, which is 10x faster
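The two read paths and the run-time hand-off from reader to writer can be summarized in a small sketch. Everything here is hypothetical: `TransferTask`, `ReadPath`, `choose_read_path`, and `run_transfer` are illustrative names, not Data Hub operator parameters, and the batch size is an arbitrary assumption rather than a documented default.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import islice
from typing import Callable, Iterable, Iterator

class ReadPath(Enum):
    INA = "InA"                            # works on all BW systems; suited to small volumes
    CALCULATION_VIEW = "calculation view"  # BW/4HANA only; reads directly in HANA

@dataclass
class TransferTask:
    source: str       # hypothetical BW query or dataset name
    target: str       # hypothetical target, e.g. an HDFS path
    is_bw4hana: bool

def choose_read_path(task: TransferTask) -> ReadPath:
    # BW/4HANA can be read directly via the HANA calculation view;
    # other BW systems fall back to InA.
    return ReadPath.CALCULATION_VIEW if task.is_bw4hana else ReadPath.INA

def batches(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Split a row stream into lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

def run_transfer(rows: Iterable[dict], write_batch: Callable[[list[dict]], None],
                 size: int = 1000) -> int:
    """Stream rows from the reader to the writer in batches; return rows written."""
    total = 0
    for chunk in batches(rows, size):
        write_batch(chunk)
        total += len(chunk)
    return total

if __name__ == "__main__":
    task = TransferTask("Q_SALES", "/data/sales", is_bw4hana=True)
    print(choose_read_path(task).value)
    print(run_transfer(({"i": i} for i in range(2500)), lambda batch: None))
```

Batching keeps memory bounded on the reader side, which matters most on the InA path that the talk describes as suited to smaller data volumes.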
Question & Answer
Q: The license metric of Data Hub is Data Hub Units. Should HANA be additionally licensed, too?
A: No, it is not mandatory
Q: What is the advantage of SAP Data Hub over the combination of SAP BW/4HANA + SAP HANA Vora? Or does the customer need to go with SAP Data Hub to use HANA Vora?
A: There is no longer a product called SAP HANA Vora; SAP Vora is embedded in Data Hub and is no longer available separately
Q: Does the customer need a separate BW box to use SAP Data Hub? If the customer has only an S/4HANA system with embedded analytics, can it implement SAP Data Hub?
A: The speaker did not know whether Data Hub will work against embedded analytics; SAP is looking at ODP transfer
SAP is also looking at an ABAP CDS Reader for Data Hub later this year
Recording/Slide Links:
Recording