At the end of 2017, we delivered SAP Data Hub, developer edition. This week we updated it to our newest release. And here it is:
SAP Data Hub, developer edition 2.4.
SAP Data Hub is a data sharing, pipelining, and orchestration solution that helps companies accelerate and expand the flow of data across their modern, diverse data landscapes (for more details, take a look at Marc's excellent FAQ blog post).
The architecture of SAP Data Hub leverages modern container technology and, simply put, looks like this:
The main (technical) components of SAP Data Hub are:
- SAP Data Hub Foundation (mandatory component, installed on Kubernetes)
- SAP Data Hub Spark Extensions (optional component, installed on Hadoop)
SAP Data Hub, developer edition
For the developer edition, we looked for a way to run SAP Data Hub on your local computer. We took the parts of SAP Data Hub that are, in our opinion, of most interest to developers and packaged them, together with HDFS, Spark, and Livy, into a single Docker container image. This container image can be used with different start options. Depending on the start option, it runs either the SAP Vora Database, the SAP Vora Tools, the SAP Data Hub Modeler, or HDFS, Spark, and Livy (which are required for some example pipelines and tutorials).
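The workflow sketched above boils down to two Docker commands. The image name, tag, and port mapping below are illustrative assumptions, not the exact values; the installation tutorial linked further down gives the real commands and options:

```shell
# Build the container image locally (one-time step; needs a stable
# internet connection). Image name "datahub" is a placeholder.
docker build -t datahub .

# Start a container from the image with one of the start options.
# Port and start-option name are assumptions for illustration only;
# see the installation tutorial for the actual values.
docker run -ti -p 127.0.0.1:8090:8090 datahub
```

Once the image is built, starting and stopping containers from it takes well under a minute and works without network connectivity.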
Now, what are the advantages of this approach?
- You can easily run SAP Data Hub, developer edition on your local computer (be it Windows, Linux, or macOS).
- Building the container image locally typically takes a few minutes. During this time, you need a stable internet connection. Once the container image is built, you can start a container based on it in less than a minute and without network connectivity.
- You can build powerful data pipelines (and they can interact with all kinds of other technologies, e.g. SAP HANA, the SAP API Business Hub, Kafka, or any web service).
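To give a flavor of the last point: pipelines in the SAP Data Hub Modeler are built from operators, and custom operators can be scripted in Python against the runtime-provided `api` object (with calls such as `api.set_port_callback` and `api.send`). The sketch below illustrates that callback pattern; the `_MockApi` class is our own stand-in, defined only so the snippet is self-contained and runnable outside the Modeler:

```python
class _MockApi:
    """Minimal stand-in for the `api` object the pipeline runtime injects.

    Only for illustration -- inside the Modeler you would use `api`
    directly and not define this class.
    """
    def __init__(self):
        self._callbacks = {}
        self.sent = []

    def set_port_callback(self, port, callback):
        # Register a handler for messages arriving on an input port.
        self._callbacks[port] = callback

    def send(self, port, message):
        # Emit a message on an output port (recorded here for inspection).
        self.sent.append((port, message))

    def deliver(self, port, message):
        # Test helper, not part of the real API: simulate the runtime
        # delivering a message to an input port.
        self._callbacks[port](message)


api = _MockApi()

def on_input(data):
    # Transform the incoming message and forward it to the output port.
    api.send("output", data.upper())

# Wire the handler to the operator's input port, as an operator script would.
api.set_port_callback("input", on_input)

# Simulate one message flowing through the operator:
api.deliver("input", "hello vora")
print(api.sent)  # [('output', 'HELLO VORA')]
```

In a real pipeline, such an operator would sit between, say, a Kafka consumer operator and an HDFS writer, with the runtime moving messages between ports.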
Of course, there are also some drawbacks:
- SAP Data Hub, developer edition currently does not allow you to use the data governance and workflow features of SAP Data Hub.
- Unfortunately, you cannot observe how SAP Data Hub usually containerizes and deploys data-driven applications onto Kubernetes.
- Some of the data pipeline operators (i.e., the reusable and configurable components which you combine to build data pipelines) will not work inside the container. Most notably, the operators related to machine learning (leveraging TensorFlow) and image processing (leveraging OpenCV) currently cannot be used, at least not "out of the box".
How to get started?
To give SAP Data Hub, developer edition a try, visit our Tutorial Navigator. Currently, the following tutorials are available:
- Install and explore SAP Data Hub, developer edition
- Build your first pipeline with SAP Data Hub, developer edition
The tutorials give you a first idea of how to build data-driven applications with SAP Data Hub. You will learn how to create your first pipeline, using a message broker, HDFS, and SAP Vora.
If you have questions, problems, or suggestions in the meantime, feel free to post them as comments on this blog or in the SAP Community. We will try to answer them in a timely manner and collect frequently asked questions here.