architectSAP, SAP Mentor
Inspired by andreas.forster’s excellent blog SAP Data Intelligence: Create your first ML Scenario and encouraged by karim.mohraz’s incredibly helpful blog Train and Deploy a Tensorflow Pipeline in SAP Data Intelligence, in this blog I combine both approaches to demonstrate how to create an SAP Data Intelligence machine learning scenario with TensorFlow that is as plain vanilla as possible.

To start with, I create a Data Workspace and a corresponding Data Collection in the SAP Data Intelligence ML Data Manager and upload Andreas’s training data there:


From my Jupyter Lab Data Manager, I get to my Data Collection and copy the code to load my training data:
import pandas as pd
import sapdi

# Read the training data from the Data Collection in the workspace
ws = sapdi.get_workspace(name='architectSAP')
dc = ws.get_datacollection(name='architectSAP')
with dc.open('RunningTimes.csv').get_reader() as reader:
    df = pd.read_csv(reader, sep=';')
df.head()

   ID  HALFMARATHON_MINUTES  MARATHON_MINUTES
0   1                    73               149
1   2                    74               154
2   3                    78               158
3   4                    73               165
4   5                    74               172
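Before building the dataset, a quick sanity check of the loaded data (my addition, not part of the original notebook) confirms the number of rows and the value ranges:
# Quick sanity check: shape and summary statistics of the loaded data
print(df.shape)
df.describe()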

On that basis, I build my data set:
x = df[['HALFMARATHON_MINUTES']]
y_true = df[['MARATHON_MINUTES']]
import tensorflow as tf

# Build a tf.data.Dataset from features and targets, one sample per batch
dataset = tf.data.Dataset.from_tensor_slices((x.values, y_true.values))
dataset = dataset.batch(1)
for feat, targ in dataset.take(5):
    print('Features: {}, Target: {}'.format(feat, targ))
print(x.shape)
print(y_true.shape)

Features: [[73]], Target: [[149]]
Features: [[74]], Target: [[154]]
Features: [[78]], Target: [[158]]
Features: [[73]], Target: [[165]]
Features: [[74]], Target: [[172]]
(117, 1)
(117, 1)
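The samples are kept in file order here; if you would rather present them in random order each epoch, a minimal variation (my addition, not part of the original) could be:
# Optional: shuffle the samples every epoch before batching
dataset = tf.data.Dataset.from_tensor_slices((x.values, y_true.values))
dataset = dataset.shuffle(buffer_size=len(x), reshuffle_each_iteration=True)
dataset = dataset.batch(1)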

To create, compile and train my model:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, name='hidden', batch_size=1, input_shape=(1,)),
    tf.keras.layers.Dense(1, name='output')
])
model.compile(tf.keras.optimizers.Adam(), tf.keras.losses.MeanSquaredError())
model.summary()
history = model.fit(dataset, epochs=8)

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
hidden (Dense) (1, 4) 8
_________________________________________________________________
output (Dense) (1, 1) 5
=================================================================
Total params: 13
Trainable params: 13
Non-trainable params: 0
_________________________________________________________________
Epoch 1/8
117/117 [==============================] - 0s 2ms/step - loss: 80740.5312
Epoch 2/8
117/117 [==============================] - 0s 3ms/step - loss: 57319.2930
Epoch 3/8
117/117 [==============================] - 0s 3ms/step - loss: 36689.4297
Epoch 4/8
117/117 [==============================] - 0s 3ms/step - loss: 18188.9629
Epoch 5/8
117/117 [==============================] - 0s 3ms/step - loss: 6463.0679
Epoch 6/8
117/117 [==============================] - 0s 3ms/step - loss: 1668.1143
Epoch 7/8
117/117 [==============================] - 0s 3ms/step - loss: 459.1681
Epoch 8/8
117/117 [==============================] - 0s 4ms/step - loss: 292.9158
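To see how quickly the optimizer converges, the loss per epoch recorded in the History object can also be plotted (my addition):
import matplotlib.pyplot as plot
# Plot the per-epoch training loss recorded by model.fit
plot.plot(history.history['loss']);
plot.xlabel("Epoch");
plot.ylabel("Loss (MSE)");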


The results are of course very similar to Andreas’s, shown here as the model’s predictions (red) versus the MSE optimum (green), but this time leveraging TensorFlow Keras:
import matplotlib.pyplot as plot
import numpy as np
m, b = np.polyfit(np.squeeze(x), y_true, 1)
plot.scatter(x, y_true);
plot.plot(x, model.predict(x), color = 'red');
plot.plot(x, m*x + b, color = 'green');
plot.xlabel("Actual Minutes Half-Marathon");
plot.ylabel("Actual Minutes Marathon");


And I check that there is no autocorrelation by scatter-plotting the residuals to verify their randomness:
plot.scatter(x, y_true - model.predict(x), color="orange");
plot.xlabel("Actual Minutes Half-Marathon");
plot.ylabel("Residuals");


To leverage these results, I add a Python Producer in the SAP Data Intelligence ML Scenario Manager to create, compile, train and store my model. Since I want to stay as plain vanilla as possible, I only add a few lines to the template and stick with its naming conventions:
import pandas as pd
import tensorflow as tf
import io
import json
import h5py

# Example Python script to perform training on input data & generate Metrics & Model Blob
def on_input(data):
    # to send metrics to the Submit Metrics operator, create a Python dictionary of key-value pairs
    df = pd.read_csv(io.StringIO(data), sep=';')
    x = df[['HALFMARATHON_MINUTES']]
    y_true = df[['MARATHON_MINUTES']]
    dataset = tf.data.Dataset.from_tensor_slices((x.values, y_true.values))
    dataset = dataset.batch(1)
    model = tf.keras.Sequential([tf.keras.layers.Dense(4, batch_size=1, input_shape=(1,)), tf.keras.layers.Dense(1)])
    model.compile(tf.keras.optimizers.Adam(), tf.keras.losses.MeanSquaredError())
    history = model.fit(dataset, epochs=8)
    # metrics_dict = {"kpi1": "1"}
    metrics_dict = json.dumps({'loss': str(history.history['loss'][-1])})

    # send the metrics to the output port - Submit Metrics operator will use this to persist the metrics
    api.send("metrics", api.Message(metrics_dict))

    # create & send the model blob to the output port - Artifact Producer operator will use this to persist the model and create an artifact ID
    f = h5py.File('blob', driver='core', backing_store=False)
    model.save(f)
    f.flush()
    # model_blob = bytes("example", 'utf-8')
    model_blob = f.id.get_file_image()
    api.send("modelBlob", model_blob)

api.set_port_callback("input", on_input)
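This script cannot run standalone, because the api object is injected by the SAP Data Intelligence runtime. For a quick local sanity check, one could define a minimal stub before pasting the script above (entirely my sketch, not part of the template):
# Minimal local stand-in for the Data Intelligence `api` object (assumption, offline testing only)
class api:
    class Message:
        def __init__(self, body, attributes=None):
            self.body = body
            self.attributes = attributes

    @staticmethod
    def send(port, data):
        print('would send to port', port, '->', type(data))

    @staticmethod
    def set_port_callback(port, callback):
        print('registered callback for port', port)

# After pasting the operator script above, feed it the training CSV as a string,
# just like the Read File operator would:
with open('RunningTimes.csv') as csv_file:
    on_input(csv_file.read())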

Since I use the TensorFlow Python libraries, I need to add a Group with a tag to identify my Docker image:
FROM $com.sap.sles.ml.python
RUN python3.6 -m pip --no-cache-dir install --user --upgrade pip
RUN python3.6 -m pip --no-cache-dir install --user tensorflow


I then execute my Python Producer with these parameters after changing the Connection in the Read File Operator:



To obtain my Metrics, Models and Datasets:


To consume these, in the SAP Data Intelligence ML Scenario Manager, I add a Python Consumer. Since I still want to stay as plain vanilla as possible, I only add a few lines to the template to apply my model and obtain my results:
# apply your model
blob = io.BytesIO(model)
f = h5py.File(blob, 'r')
architectSAP = tf.keras.models.load_model(f)
blob.close()
# obtain your results
prediction = architectSAP.predict([[json.loads(user_data)['half_marathon_minutes']]])
success = True

I also pass the successful response back to the user:
# apply carried out successfully, send a response to the user
# msg.body = json.dumps({'Results': 'Model applied to input data successfully.'})
msg.body = json.dumps({'marathon_minutes_prediction': str(prediction[0])})
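Outside SAP Data Intelligence, the same apply logic can be tried against a locally saved copy of the model artifact, roughly like this (the file name and input value are my placeholders):
import io
import json
import h5py
import tensorflow as tf

# Hypothetical local copy of the model blob written by the Artifact Producer
with open('model_blob.h5', 'rb') as blob_file:
    model = blob_file.read()

blob = io.BytesIO(model)
f = h5py.File(blob, 'r')
architectSAP = tf.keras.models.load_model(f)
blob.close()

user_data = json.dumps({'half_marathon_minutes': 90})
prediction = architectSAP.predict([[json.loads(user_data)['half_marathon_minutes']]])
print(json.dumps({'marathon_minutes_prediction': str(prediction[0])}))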

I then deploy my Python Consumer after adding a Group for my Docker image again and passing in my model:



Once my Deployment is successful:


I retrieve my prediction e.g. with Postman:
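An equivalent request from a short Python script could look roughly like this; the deployment URL, the path suffix and the credential format are placeholders or assumptions based on the standard template:
import requests

# Placeholders: take the actual deployment URL from the ML Scenario Manager.
# The '/v1/uploadjson/' suffix and the 'tenant\user' login format are assumptions
# based on the standard Python Consumer template.
url = '<deployment-url>/v1/uploadjson/'
response = requests.post(
    url,
    auth=('<tenant>\\<user>', '<password>'),
    headers={'X-Requested-With': 'XMLHttpRequest'},
    json={'half_marathon_minutes': 90},
)
print(response.status_code, response.text)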


I tried to keep this blog as plain vanilla as possible to help you understand the underlying basic concepts. However, there is of course nothing wrong with a bit more sophistication, such as building a custom TensorFlow Operator to make your Graphs more efficient and easier to read:
