This tutorial explains how to develop a custom Pipeline Operator in the SAP Data Hub Pipeline Modeler by extending a predefined Base Operator.
This is the first article in a series of tutorials.
Prerequisites
Before you start with this series of tutorials, please make sure that:
- You have access to the SAP Data Hub Pipeline Modeler, which is part of SAP Data Hub 1.2. If you don't have an SAP Data Hub installation available, you can also use the SAP Data Hub Developer Edition, which allows you to test data pipeline scenarios on your own desktop free of charge.
Getting Started
The Pipeline Modeler
The SAP Data Hub Pipeline Modeler tool is based on a pipeline engine that uses a flow-based programming paradigm to create data processing pipelines which are modeled as computation graphs and executed in a containerized environment that runs on Kubernetes.
Please note that the current version of the SAP Data Hub Developer Edition runs and executes pipelines in a single Docker image/container and therefore does not require a Kubernetes installation.
What are Pipelines (Graphs)?
A Pipeline or graph is a network of operators that are connected to each other using typed input ports and output ports for exchanging data. The pipelines (graphs) and the operators are stored in a folder structure within the SAP Vora Repository. The repository content, as well as the available pipelines (graphs), can be accessed within the SAP Data Hub Pipeline Modeler by selecting the respective tabs in the navigation pane:
When you click on a specific pipeline (graph) in the navigation pane, the tool opens the Pipeline Editor from where you can examine, modify and execute the pipeline:
The example above shows a simple pipeline. The first part generates random data and stores it in binary format in HDFS (Hadoop Distributed File System). The second part reads the same data back from HDFS and prints the content to a browser terminal.
What are Operators?
An Operator represents a vertex of a graph and reacts to events from the environment. An event from the environment is a message delivered to the operator through its Input ports. The operator can interact with the environment through its Output ports. The following image shows a sample operator along with the input ports and output ports. Each port is associated with a Port Type, and the tool uses color codes to identify compatible port types:
Operators require a certain runtime environment for their execution. For example, if an operator executes some JavaScript code, it requires an environment with a JavaScript engine. The SAP Data Hub Data Pipelines tool provides certain predefined environments for operators, and these environments are made available to users as a library of Docker files.
When you execute a pipeline (graph), the tool translates each operator in the graph into processes. It then searches the Docker files for an environment suitable for the operator execution and instantiates a Docker image. The Docker image with the environment and the operator process is executed on a Kubernetes cluster.
The pre-shipped operators can be parameterized, and some of them can be supplied with custom scripts. This makes it possible to implement a variety of scenarios without any lower-level programming.
Operator Extensibility Concept
There are use cases where either the existing operators are not sufficient to implement a certain scenario or where customized operators should be made available for reuse in other pipelines. To support such use cases, the SAP Data Hub Pipeline tool provides an advanced extensibility option for operators, which allows both the creation of new operators and the wrapping of virtually any code into operators:
- Custom operators can be defined by deriving from existing Base Operators provided by the engine itself. For example, a JavaScript, Golang or Python operator can be implemented with a custom script and enriched with additional parameters and metadata. This approach works as long as the environment provided by the base operator meets the needs of the custom operator.
- Additional libraries, specific environments or logic written in other programming languages can be integrated into the SAP Data Hub by defining your own Docker files that contain all the commands to assemble a Docker image. The custom Docker images provide the environment in which your operators are executed; the link between the two is established by tagging the Docker files and operators accordingly.
This tutorial explains how to create a derived operator (the first option above) without defining your own Docker file. The creation and usage of your own Docker files in operators will be explained in an upcoming tutorial of this series.
Create a Weather Sensor Operator
As an example, we create a weather sensor simulator which repeatedly sends measurements such as temperature and humidity. The operator extends the pre-shipped JavaScript operator and needs nothing beyond the environment that this operator provides.
1. Create a Folder Structure
As a first step, we create our own folder structure for our virtual company “acme”. Subsequently, we create our weather sensor operator in a subfolder of this folder:
- Start the SAP Data Hub Pipeline Modeler in a web browser.
- In the navigation pane on the left side, choose the Repository tab:
The tool displays all Graphs, Operators, and Docker Files that are available in your Repository.
- Right-click the Operators section and choose Create Folder to create a new folder in the Repository, in which you will later create your operator:
- Provide the name “acme” for the root folder and choose OK:
- Right-click the folder “acme” and repeat the previous steps to create a subfolder called “generators”.
The resulting folder structure in the repository should now look as follows:
2. Create the Operator
Next, we create a custom Operator in our folder.
- Right-click the folder “generators” and choose the Create Operator menu option:
- In the Name text field, provide the name “weather_sensor”.
- In the Display Name text field, provide a display name, e.g. “Weather Sensor” (the display name is used when searching for operators).
- In the Base Operator dropdown list, select the “Javascript Operator”:
The tool opens the Operator Editor window. The operator editor is a form-based editor, where you can define the details of your operator.
3. Define the Operator
The definition of an operator consists of metadata in JSON format comprised of an ID and Description, Input Ports and Output Ports, an Operator Configuration and a required execution environment provided by a Docker image.
3.1. Define the Input Ports and Output Ports:
- Our operator generates data but does not read any data from the input. Therefore, it does not require any Input Ports.
- In the Output Ports section, choose + and define a new output port with Name “output” and Type “string”:
The Name merely gives the port a meaningful label, while the Type plays a crucial role when connecting the operator with other operators: only ports of the same Type can be connected. Beyond that, the value of the Type does not necessarily enforce a data type validation by the tool.
Please refer to the SAP Data Hub Developer Guide for a list of the currently supported Port Types:
https://help.sap.com/viewer/29ff74dc606c41acad117003f6034ac7/1.2.latest/en-US/67899991ea294f5fa8967e...
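The connection rule described above can be pictured with a small sketch. The port objects and the `canConnect` function below are hypothetical and only model the rule that two ports must share the same Type; they are not part of the Pipeline Modeler API, and the operator and port names are made up for illustration.

```javascript
// Hypothetical port descriptions; names are illustrative only and are
// not taken from the SAP Data Hub API.
var sensorOut = { operator: "weather_sensor", name: "output", type: "string" };
var terminalIn = { operator: "terminal", name: "in1", type: "string" };
var blobIn = { operator: "writer", name: "inFile", type: "blob" };

// Two ports can be connected only when their types are identical.
// The port name plays no role in this check.
function canConnect(outPort, inPort) {
    return outPort.type === inPort.type;
}

console.log(canConnect(sensorOut, terminalIn)); // true
console.log(canConnect(sensorOut, blobIn));     // false
```

Note that the check compares only the type labels; as stated above, a matching label does not mean that the transported data is validated.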
3.2. (Optional) Define Tags:
Tags describe the runtime requirements of the operator and allow you to force the execution in a specific Docker image instance whose Docker file was annotated with the same Tag and Version. In cases where multiple operators are grouped together, the tool chooses the Docker file that provides all tags that are set in the operators of the group.
- For our weather sensor operator, we can skip the Tags section as we just want to write JavaScript code which can run in the environment that was inherited from the Javascript Operator.
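To illustrate how tags relate operators to Docker files, here is a minimal sketch of the selection rule described above: pick a Docker file that provides every tag required by the operators of a group. The data layout, the tag names and the `findDockerfile` function are assumptions made for illustration. The actual matching logic of the tool may differ, for example in how versions are compared.

```javascript
// Hypothetical Docker file descriptions: each one provides a set of
// tag -> version pairs. Names and tags are made up for this sketch.
var dockerfiles = [
    { name: "base-node", tags: { node: "8" } },
    { name: "node-python", tags: { node: "8", python36: "" } }
];

// Merge the tags required by all operators of a group into one object.
function requiredTags(operators) {
    var required = {};
    operators.forEach(function (op) {
        Object.keys(op.tags).forEach(function (tag) {
            required[tag] = op.tags[tag];
        });
    });
    return required;
}

// Return the first Docker file that provides every required tag with the
// same version, or null if none qualifies. Exact version equality is a
// simplification chosen for this sketch.
function findDockerfile(operators, candidates) {
    var required = requiredTags(operators);
    for (var i = 0; i < candidates.length; i++) {
        var provided = candidates[i].tags;
        var ok = Object.keys(required).every(function (tag) {
            return provided.hasOwnProperty(tag) && provided[tag] === required[tag];
        });
        if (ok) {
            return candidates[i];
        }
    }
    return null;
}

var group = [
    { name: "opA", tags: { node: "8" } },
    { name: "opB", tags: { python36: "" } }
];
console.log(findDockerfile(group, dockerfiles).name); // node-python
```

In this sketch, "base-node" is rejected for the group because it lacks the "python36" tag, even though it would satisfy "opA" on its own.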
3.3. Define the Operator Configuration:
It is possible to define an Operator Configuration Type, which enables semantic and type validation of the configuration parameters as well as the definition of UI conditions. For now, we skip this part and jump to the definition of the Parameters.
- You can add different runtime Parameters to the Operator. There is already one parameter “codelanguage” that was inherited from the Javascript base operator. The parameter value is used by the UI for syntax highlighting in the Script Editor which is shown when editing the operator.
- We add one additional parameter “sendInterval” of type “String” with a default value “1000” that allows us to control the interval (in milliseconds) at which the measurements are sent:
- In the Script section, we add the following Javascript code which generates random values that shall represent the measurements of a weather station:
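// Returns a random floating-point number in the range [min, max).
var getRandom = function(min, max) {
    return Math.random() * (max - min) + min;
};

// Returns a random integer between min and max (both inclusive).
var getRandomInt = function(min, max) {
    return Math.floor(Math.random() * (max - min + 1)) + min;
};

// Generates a random identifier in the format of a version 4 UUID.
var getUUID = function() {
    return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
        var r = Math.random() * 16 | 0, v = c == 'x' ? r : (r & 0x3 | 0x8);
        return v.toString(16);
    });
};

var counter = 0;
var sendInterval = $.config.getInt("sendInterval");
var deviceID = getUUID();

// Builds one measurement as a comma-separated string.
var generateData = function() {
    var payload = deviceID + ",";          // DeviceID
    payload += getRandom(-30, 50) + ",";   // Temperature
    payload += getRandom(40, 100);         // Humidity
    return payload;
};

// Call doTick every sendInterval milliseconds.
$.addTimer(sendInterval + "ms", doTick);

// Sends the running counter followed by a new measurement to the output port.
function doTick(ctx) {
    $.output(counter + "," + generateData());
    counter++;
}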
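Each call to doTick emits one comma-separated line of the form counter,deviceID,temperature,humidity. As a sketch of how a downstream consumer might take such a line apart again, the hypothetical helper `parseMeasurement` below splits it into named fields. The function, the field names and the sample line are illustrative assumptions and are not part of the operator itself.

```javascript
// Hypothetical helper: parse one line emitted by the weather sensor,
// i.e. "counter,deviceID,temperature,humidity".
function parseMeasurement(line) {
    var parts = line.split(",");
    if (parts.length !== 4) {
        throw new Error("Expected 4 fields but got " + parts.length);
    }
    return {
        counter: parseInt(parts[0], 10),
        deviceID: parts[1],
        temperature: parseFloat(parts[2]),
        humidity: parseFloat(parts[3])
    };
}

// Made-up sample line in the format produced by the script.
var sample = "0,3f2b8c1e-7a4d-4c9b-8e21-5d6f7a8b9c0d,21.5,63.25";
var m = parseMeasurement(sample);
console.log(m.counter, m.deviceID, m.temperature, m.humidity);
```

Splitting on commas is safe here because neither the UUID nor the numeric values can contain a comma.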
Under the hood of the Javascript Operator, there is a lightweight JavaScript parser and interpreter which complies with ECMAScript 5. Integration functions, e.g. for reading the configuration parameters, are available via the $-object. Please refer to the documentation of the Javascript Operator for a list of available functions and examples.
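For quick experiments outside the Pipeline Modeler, the three integration functions used in the script above ($.config.getInt, $.addTimer and $.output) can be replaced by a small stand-in. The stub below is a sketch for Node.js built on several assumptions and is not the engine's real implementation: it only understands durations given in "ms", it collects outputs in an array instead of sending them to a port, and the helper `createStub` as well as the sample payload are hypothetical.

```javascript
// Hypothetical stand-in for the engine's $-object, for local experiments only.
function createStub(config) {
    var outputs = [];
    var timers = [];
    var stub = {
        config: {
            // In this stub, configuration values are kept as strings, e.g. "1000".
            getInt: function (name) { return parseInt(config[name], 10); }
        },
        // Only durations such as "1000ms" are understood by this stub.
        addTimer: function (duration, callback) {
            var match = /^(\d+)ms$/.exec(duration);
            if (!match) {
                throw new Error("Unsupported duration: " + duration);
            }
            timers.push({ intervalMs: parseInt(match[1], 10), callback: callback });
        },
        // Collect the data instead of sending it to an output port.
        output: function (data) { outputs.push(data); }
    };
    return {
        $: stub,
        outputs: outputs,
        timers: timers,
        // Fire every registered timer callback once, without waiting.
        tick: function () {
            timers.forEach(function (t) { t.callback({}); });
        }
    };
}

// Usage: run a reduced version of the sensor logic against the stub.
var env = createStub({ sendInterval: "1000" });
var $ = env.$;
var counter = 0;
$.addTimer($.config.getInt("sendInterval") + "ms", function (ctx) {
    $.output(counter + ",device-1,20.0,50.0");
    counter++;
});
env.tick();
env.tick();
console.log(env.outputs);
```

Firing the timers manually through `tick` keeps the experiment deterministic; a real timer based on `setInterval` could be added to the stub if wall-clock behavior matters.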
3.4. Modify the Operator Display Icon:
When you create an operator, the tool uses a default operator display icon. You can change the icon to other icons that come with the tool or upload your own icons in SVG (Scalable Vector Graphics) format.
- In the operator editor, click the operator's default icon:
- If you want to use any of the icons that the tool provides, select an Icon from the dropdown list. In our case, we choose the “cloud” icon:
- If you want to upload an SVG file, click the upload icon and browse to the desired *.svg file.
The tool uses the new icon for the operator when it displays it in the graph editor.
3.5. Create or maintain Documentation for the Operator:
You can provide your own documentation for operators in Markdown language. The documentation can include, for example, information about the operator, its configuration parameters, input ports, output ports, and more. Other users working with this operator can refer to this documentation in the Documentation Tab of the operator.
- In the operator editor toolbar, click the documentation icon:
- Provide the required documentation. The documentation can be written in Markdown language, e.g.:
Weather Sensor
===========
This is an extension of the base operator com.sap.system.jsengine which generates random values that shall represent the signals of different weather stations sending data every second.
Configuration parameters
------------
* **sendInterval** (type string): Interval in milliseconds at which the measurements are sent
Input
------------
* None
Output
------------
* **output** (type string): generated data
3.6. (Optional) Upload Auxiliary Files:
Uploading auxiliary files is useful when the operator needs to access or execute a file during runtime. Any kind of file can be uploaded, for example a binary executable such as a JAR file that is executed when the pipeline runs the operator. In our case, we do not need any additional files and therefore skip this step.
3.7. Save the Operator:
- In the editor toolbar, click the Save-icon to save the operator:
Congratulations, you can now access the operator from the Operators tab (under the Others section) in the navigation pane, and use it for creating your own pipelines:
In the next tutorial, you will learn how to use your custom operator in a pipeline.