
Solution Diagram
Train and deploy the model using the FedML Databricks library:
Pre-requisites:
1. Install the FedML Databricks library.
%pip install fedml-databricks --no-cache-dir --upgrade --force-reinstall
Import the necessary libraries:
from fedml_databricks import DbConnection, predict
It may also be useful to import the following libraries if you use them in your notebook:
import numpy as np
import pandas as pd
import json
2. Create a secure connection to SAP Datasphere and retrieve the data.
Create a Databricks secret scope by referring to the article Create a Databricks-backed secret scope on the Databricks website. Then, create a Databricks secret containing the SAP Datasphere connection details in JSON form, as described in the article. The SAP Datasphere JSON connection credentials can be obtained using the method described in the GitHub documentation for the DbConnection class.
config_str=dbutils.secrets.get('<secret-scope>','<secret-key>')
config=json.loads(config_str)
Now, create a DbConnection instance to connect to SAP Datasphere:
dsp = DbConnection(dict_obj=config)
We can now retrieve the data. There are multiple ways to retrieve data from SAP Datasphere; the following code fetches it as a pandas DataFrame. Enter the appropriate schema and view name below:
df = dsp.execute_query('SELECT * FROM "<schema>"."<view>"')
df
3. Train the ML model using MLflow.
Import the MLflow library:
import mlflow
Here is a sample linear regression model being trained using MLflow:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def train_model(x_train, x_test, y_train, y_test, experiment_name, model_name):
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run() as run:
        model = LinearRegression().fit(x_train, y_train)
        score = model.score(x_test, y_test)
        mlflow.log_param("score", score)
        mlflow.sklearn.log_model(model, model_name,
                                 registered_model_name=model_name)
        run_id = run.info.run_id
    return run_id

# Split the data retrieved from SAP Datasphere into training and test sets
# (X holds the feature columns and y the target column derived from df)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

experiment_name, model_name = '/Users/<user>/<experiment-name>', '<model_name>'
run_id = train_model(x_train, x_test, y_train, y_test, experiment_name, model_name)
model_uri = f"runs:/{run_id}/{model_name}"
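The split above assumes a feature matrix X and a target vector y derived from the DataFrame df retrieved in step 2. A minimal sketch of that derivation, assuming a hypothetical target column named TARGET (replace it with the actual target column of your view):
# Assumption (illustrative only): the view exposes a numeric target column named "TARGET"
y = df["TARGET"]
X = df.drop(columns=["TARGET"])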
4. Deploy the ML model as a webservice endpoint and perform inference on the deployed model.
Option 1: Deploy the trained MLflow model to Databricks:
Executing the notebook inside the Databricks workspace registers the model in the managed MLflow. If you trained the model outside of Databricks, you can register it in the MLflow Model Registry as follows:
import time
model_version = mlflow.register_model(model_uri=model_uri,name=model_name)
# Registering the model takes a few seconds, so add a small delay
time.sleep(15)
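Alternatively, instead of a fixed delay, you can poll the registry until the new model version is ready. A minimal sketch using the MLflow client (the retry count is an arbitrary choice):
from mlflow.tracking import MlflowClient
import time

client = MlflowClient()
# Poll the Model Registry until the newly registered version reports READY status
for _ in range(30):
    status = client.get_model_version(name=model_name, version=model_version.version).status
    if status == "READY":
        break
    time.sleep(1)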
Transition the model to Production:
from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name=model_name,
    version=model_version.version,
    stage="Production",
)
You can use MLflow to deploy models for batch or streaming inference, or to set up a REST endpoint that serves the model. To run batch inference with the MLflow model deployed in Databricks:
model = mlflow.pyfunc.load_model(f"models:/{model_name}/production")
inference_result = model.predict(<test_data>)
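If you instead expose the model through a Databricks model-serving REST endpoint, it could be invoked from Python roughly as follows. This is a minimal sketch only: it assumes serving is enabled for the registered model, that DATABRICKS_TOKEN holds a valid personal access token, and that test_df is a pandas DataFrame with the model's input columns; the exact URL and payload schema depend on your workspace and MLflow version:
import requests

# Assumptions (illustrative only): serving is enabled for the registered model,
# DATABRICKS_TOKEN is a valid personal access token, and test_df is a pandas
# DataFrame containing the same columns the model was trained on.
workspace_url = spark.conf.get("spark.databricks.workspaceUrl")
serving_url = f"https://{workspace_url}/model/{model_name}/Production/invocations"
response = requests.post(
    serving_url,
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json={"dataframe_split": test_df.to_dict(orient="split")},
)
response.raise_for_status()
print(response.json())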
Option 2: Deploy the MLflow model to SAP BTP, Kyma runtime:
First, note the Databricks workspace URL and the model URI, which are needed for the Databricks configuration file used in the deployment steps below:
print("The DATABRICKS_URL is 'https://{}'".format(spark.conf.get("spark.databricks.workspaceUrl")))
print("The MODEL_URI is '{}'".format(model_uri))
For ease of use, steps 4.2.3 and 4.2.4 can also be performed in the hyperscaler Jupyter notebook (AzureML notebook or SageMaker notebook).
4.2.4. Deploy the Databricks MLflow model to the SAP BTP, Kubernetes environment using the method below. The 'databricks_config_path' parameter refers to the path of the Databricks configuration file created in the previous step:
from fedml_databricks import deploy_to_kyma
endpoint_url=deploy_to_kyma(databricks_config_path='<databricks-config-json-file-path>')
print("The kyma endpoint url is '{}'".format(endpoint_url))
Take note of the SAP BTP, Kubernetes environment endpoint.
Run inference on the MLflow model deployed in the SAP BTP, Kubernetes environment from within the Databricks notebook as follows:
inference_dataframe = predict(endpoint_url=<kyma-endpoint>, content_type=<content-type>, data=<test-data>)
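For illustration only, the payload might be assembled like this, assuming the deployed endpoint accepts a JSON array of records; check the FedML Databricks documentation for the exact content type and payload shape your deployment expects:
# Assumption (illustrative only): the endpoint accepts a JSON array of records;
# adjust the content type and payload shape to your deployment.
test_payload = x_test.to_json(orient="records")
inference_dataframe = predict(
    endpoint_url=endpoint_url,        # Kyma endpoint returned by deploy_to_kyma
    content_type="application/json",  # assumption; use the content type your endpoint expects
    data=test_payload,
)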
5. The FedML Databricks library allows bi-directional data access. You can store the inference result in SAP Datasphere for further use and analysis.
5.1 Create a table in SAP Datasphere:
dsp.create_table("CREATE TABLE <table_name> (ID INTEGER PRIMARY KEY, <column_name> <data_type>,..)")
5.2 You can now restructure the data into the desired format and write it back to SAP Datasphere by inserting it into the table:
dsp.insert_into_table('<table_name>',<pandas_dataframe_containing_datasphere_data>)
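For example, the predictions could be shaped into a DataFrame that matches the table definition before the insert. A minimal sketch, assuming a hypothetical table INFERENCE_RESULTS with columns ID and PREDICTION, and that inference_result holds the predictions from step 4:
import pandas as pd

# Assumption (illustrative only): the table was created as
# CREATE TABLE INFERENCE_RESULTS (ID INTEGER PRIMARY KEY, PREDICTION DOUBLE)
result_df = pd.DataFrame({
    "ID": range(1, len(inference_result) + 1),
    "PREDICTION": inference_result,
})
dsp.insert_into_table('INFERENCE_RESULTS', result_df)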
Now that the data is inserted into the local table in SAP Datasphere, you can create a view on it and deploy the view in SAP Datasphere. You can then use the view for further analysis in SAP Analytics Cloud.
More information on the use of the library, along with end-to-end sample notebooks, can be found in our GitHub repo.
In summary, the FedML Databricks library provides an effective and convenient way to federate data from multiple SAP and non-SAP source systems, without the overhead of data migration or replication. It enables data scientists to model SAP and non-SAP data in real time for ML experimentation. It also provides the capabilities to deploy models to SAP BTP, Kyma runtime, perform inference on the deployed webservice, and store the inference results back in SAP Datasphere for further use and analysis.