This blog post introduces key model lifecycle management features in the SAP HANA Predictive Analysis Library (PAL) through the Python Machine Learning Client for SAP HANA (hana_ml). PAL provides rich in-database machine learning capabilities, while hana_ml exposes these capabilities in Python so that users can build, execute, and operate model workflows directly on SAP HANA data with minimal data movement.
Model lifecycle development usually starts with a fit task, followed by one or more scoring tasks that evaluate the resulting model. After repeated cycles of fitting and scoring produce an acceptable model, the model can be delivered for recurring prediction tasks and optional drift detection tasks. Drift detection answers whether the currently deployed model still performs acceptably, since patterns in prediction data may change over time. Once drift is confirmed, a new fit task on the latest data is required, and a new cycle of model development begins.
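The loop described above can be pictured as a small state machine. The sketch below is a hypothetical illustration only: the stage names and the `next_stage` helper are invented for this example and do not correspond to any hana_ml or PAL API.

```python
# Hypothetical state machine for the fit -> score -> predict -> drift loop.
# Stage names and transition rules are illustrative, not a hana_ml/PAL API.

def next_stage(stage: str, acceptable: bool = False, drift: bool = False) -> str:
    """Return the stage that follows `stage` in the lifecycle loop."""
    if stage == "fit":
        return "score"
    if stage == "score":
        # Keep refitting until the scored model is acceptable.
        return "predict" if acceptable else "fit"
    if stage == "predict":
        return "drift_check"
    if stage == "drift_check":
        # Confirmed drift starts a new development cycle on the latest data.
        return "fit" if drift else "predict"
    raise ValueError(f"unknown stage: {stage}")


steps = [
    {},                     # fit -> score
    {"acceptable": False},  # score -> fit (model rejected)
    {},                     # fit -> score
    {"acceptable": True},   # score -> predict (model accepted)
    {},                     # predict -> drift_check
    {"drift": True},        # drift_check -> fit (drift confirmed)
]
trace = ["fit"]
for outcome in steps:
    trace.append(next_stage(trace[-1], **outcome))
print(trace)
```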
Experiment tracking automatically traces and persists information about the inputs and outputs of each PAL procedure execution. Each PAL procedure call is identified by a tracking ID. The tracking ID consists of two parts, an experiment ID and a run ID, which identify one experiment and one run respectively. Logically, experiments and runs have a one-to-many relationship. One experiment usually targets a single task, such as fitting, predicting, or scoring, with a selected algorithm. Users may have multiple runs for the same experiment while exploring different hyperparameters and/or datasets.
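To make the experiment/run relationship concrete, here is a minimal sketch that treats a tracking ID as a pair of identifiers and groups runs by experiment. The `TrackingId` type and the IDs are hypothetical; the actual format PAL uses to store tracking IDs is not assumed.

```python
# Hypothetical sketch: a tracking ID modeled as an (experiment ID, run ID) pair.
# This does not reflect how PAL actually encodes tracking IDs.
from collections import defaultdict
from typing import NamedTuple


class TrackingId(NamedTuple):
    experiment_id: str
    run_id: str


calls = [
    TrackingId("DIABETES_FIT", "run-1"),    # first hyperparameter setting
    TrackingId("DIABETES_FIT", "run-2"),    # second hyperparameter setting
    TrackingId("DIABETES_SCORE", "run-1"),  # a different task, a different experiment
]

# One experiment groups many runs (one-to-many).
runs_by_experiment = defaultdict(list)
for tid in calls:
    runs_by_experiment[tid.experiment_id].append(tid.run_id)

print(dict(runs_by_experiment))
```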
The persisted information for experiment tracking consists of two parts: tracking metadata and tracking logs. Different APIs are available to retrieve them. Tracking metadata records information about the invoked PAL procedure and its current running status. Tracking logs are persisted in chronological order, and each entry has a log type, a log value, and a log timestamp.
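Tracking logs can be thought of as timestamped records that are read back in order. This sketch assumes a simple record with the three fields named above; the log type names and values are made up and are not PAL's actual log type list or output.

```python
# Hypothetical log records with a type, a value, and a timestamp.
# Type names and values are invented for illustration.
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    log_type: str
    log_value: str
    log_timestamp: datetime


entries = [
    LogEntry("METRIC", "auc: 0.90", datetime(2024, 5, 20, 8, 0, 5)),
    LogEntry("PARAMETER", "learning_rate: 0.1", datetime(2024, 5, 20, 8, 0, 1)),
    LogEntry("METRIC", "accuracy: 0.80", datetime(2024, 5, 20, 8, 0, 4)),
]

# Restore chronological order, then filter by log type.
timeline = sorted(entries, key=lambda e: e.log_timestamp)
metrics = [e.log_value for e in timeline if e.log_type == "METRIC"]
print([e.log_type for e in timeline])
print(metrics)
```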
The available log types include:
Reproducing the exact same result with a given dataset and configuration is a fundamental requirement in model development. It can be used for validating findings, debugging models, and ensuring consistent behavior over time. The logs of input parameters, dataset metadata, and tracking metadata are essential information for achieving this purpose.
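One way to use logged parameters and dataset metadata for reproducibility checks is to compare a stable fingerprint of them across runs. `run_fingerprint` is a hypothetical helper, not a hana_ml feature. Note that the dataset metadata here (a name and a table name) does not capture table contents, so a matching fingerprint only shows that the configuration matches.

```python
# Hypothetical helper: fingerprint a run's logged parameters and dataset metadata
# so two runs can be checked for an identical configuration.
import hashlib
import json


def run_fingerprint(params: dict, dataset: dict) -> str:
    """SHA-256 over parameters and dataset metadata; key order does not matter."""
    payload = json.dumps({"params": params, "dataset": dataset}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


dataset = {"name": "diabetes", "source": "PIMA_INDIANS_DIABETES_TRAIN_TBL"}
run_a = run_fingerprint({"learning_rate": 0.1, "n_estimators": 10}, dataset)
run_b = run_fingerprint({"n_estimators": 10, "learning_rate": 0.1}, dataset)
run_c = run_fingerprint({"learning_rate": 0.4, "n_estimators": 10}, dataset)

print(run_a == run_b)  # True: same configuration, different key order
print(run_a == run_c)  # False: one parameter changed
```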
The model signature log can guide users in executing prediction and scoring tasks with a derived model. Metric and figure logs can be used for analyzing model performance.
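As an illustration of how a signature can guide prediction, the sketch below checks an input's columns and types against a recorded signature before scoring. The signature layout (a column-to-type mapping), the column names, and `check_input` are assumptions for this example, not the format of PAL's model signature log.

```python
# Hypothetical signature check: compare input columns/types with a recorded
# signature. Signature format and column names are illustrative assumptions.

signature = {"ID": "INTEGER", "GLUCOSE": "DOUBLE", "BMI": "DOUBLE", "AGE": "INTEGER"}


def check_input(columns: dict, signature: dict) -> list:
    """Return a list of problems; an empty list means the input matches.

    Extra input columns (for example a label column when scoring) are allowed.
    """
    problems = []
    for name, dtype in signature.items():
        if name not in columns:
            problems.append(f"missing column {name}")
        elif columns[name] != dtype:
            problems.append(f"{name}: expected {dtype}, got {columns[name]}")
    return problems


good = {"ID": "INTEGER", "GLUCOSE": "DOUBLE", "BMI": "DOUBLE", "AGE": "INTEGER", "CLASS": "INTEGER"}
bad = {"ID": "INTEGER", "GLUCOSE": "NVARCHAR", "AGE": "INTEGER"}
print(check_input(good, signature))
print(check_input(bad, signature))
```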
Model storage is another important feature for model development. Each model is identified by a model name and a model version. Model storage serves as the standard source of machine learning models for prediction, scoring, and evaluation tasks. In addition, newly generated models can be pushed into model storage under a chosen model name and version.
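The name-and-version addressing can be illustrated with a toy in-memory registry. `ToyModelRegistry` is hypothetical: its `if_exists` argument only mirrors the spirit of the `save_model` call used later, and loading the highest version when none is given is an assumption of this sketch, not a statement about ModelStorage.

```python
# Hypothetical in-memory registry keyed by (name, version).
# It illustrates the idea of model storage; it is not hana_ml's ModelStorage.


class ToyModelRegistry:
    def __init__(self):
        self._models = {}

    def save(self, name, version, model, if_exists="fail"):
        key = (name, version)
        if key in self._models and if_exists != "replace":
            raise ValueError(f"model {name} v{version} already exists")
        self._models[key] = model

    def load(self, name, version=None):
        """Load a specific version, or the highest version when none is given."""
        versions = [v for (n, v) in self._models if n == name]
        if not versions:
            raise KeyError(name)
        return self._models[(name, version if version is not None else max(versions))]


registry = ToyModelRegistry()
registry.save("BLOG_DIAB_HGBT", 1, "model-v1")
registry.save("BLOG_DIAB_HGBT", 2, "model-v2")
# Same name and version again: rejected unless if_exists="replace".
registry.save("BLOG_DIAB_HGBT", 2, "model-v2-retrained", if_exists="replace")
print(registry.load("BLOG_DIAB_HGBT"))             # model-v2-retrained
print(registry.load("BLOG_DIAB_HGBT", version=1))  # model-v1
```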
Drift detection is a turning point between model development and runtime operations: it alerts users that model degradation may happen or has already happened. Drift can be detected by combining the drift detection features with the metric and figure log entries described above. Currently, hana_ml provides visual charts that compare metrics across experiment runs, which helps verify whether drift has occurred.
Finally, automation drives the workflow across model development activities. With the Scheduled Execution feature, users can automate not only single tasks such as model fitting, prediction, and scoring, but also orchestrate related tasks in a defined order or in parallel. All automated tasks can be scheduled as background jobs, either immediately or on a recurring schedule.
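Running tasks in a defined order or in parallel amounts to scheduling a dependency graph. The sketch below uses Python's standard-library `graphlib` (Python 3.9+) to compute which hypothetical tasks can run together. It illustrates the idea only and is not how ScheduledExecution orchestrates tasks.

```python
# Hypothetical task graph: each task maps to the set of tasks it depends on.
# Task names are illustrative; this is not the hana_ml ScheduledExecution API.
from graphlib import TopologicalSorter

dependencies = {
    "fit": set(),
    "evaluate": {"fit"},
    "save_model": {"evaluate"},
    "predict_batch": {"save_model"},
    "score_batch": {"save_model"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()
waves = []  # each wave holds tasks that could run in parallel
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
    waves.append(ready)
    sorter.done(*ready)

print(waves)
```

Tasks in the same wave have no dependency on each other, so they are candidates for parallel execution; consecutive waves must run in order.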
With all these features in hana_ml, users can construct flexible model development workflows, including model fitting, model testing, model storage, model serving, and drift detection.
This use case shows a compact end-to-end model lifecycle workflow based on the public Pima Indians Diabetes dataset. This example is intended for demonstration and teaching purposes only. It uses SAP HANA Predictive Analysis Library (PAL) together with the Python Machine Learning Client for SAP HANA (hana_ml) to illustrate how a model is trained, tracked, stored, operationalized, and monitored over time. The notebook example for this use case can be downloaded from the ml-lifecycle-examples folder.
Lifecycle phases covered:
Dataset note: this example uses the Pima Indians Diabetes dataset, a binary classification task based on clinical measurements such as glucose concentration, BMI, and age.
We start by importing the core libraries and creating a connection to SAP HANA. This connection context is the entry point for loading data, running PAL algorithms, and managing lifecycle artifacts in the database. It is the required first step because all later actions reuse it.
from hana_ml import dataframe
from hana_ml.algorithms.pal.utility import DataSets
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
from hana_ml.artifacts.tracking.tracking import MLExperiments, delete_experiment_log, get_tracking_log
from hana_ml.visualizers.tracking import ExperimentMonitor, ScheduledTaskMonitor
from hana_ml.model_storage import ModelStorage
# Establish a connection to SAP HANA
conn = dataframe.ConnectionContext(address='<host>', port=30015, user='<user>', password='<pwd>')  # example port; use your instance's SQL port
Replace '<host>', the port number, '<user>', and '<pwd>' with your SAP HANA instance details.
Next, use train_test_val_split to create the main working subsets: df_train for baseline fitting and df_score for holdout evaluation. The inference set df_inference is then derived from df_score by dropping the label column, for use in scheduled batch prediction. This keeps training, evaluation, and operational inference clearly separated.
from hana_ml.algorithms.pal.partition import train_test_val_split
df_full, _, _, _ = DataSets.load_diabetes_data(conn)
df_train, df_score, _ = train_test_val_split(data=df_full, partition_method='random', random_seed=23,
                                             training_percentage=0.8, testing_percentage=0.2,
                                             validation_percentage=0.0, id_column="ID")
# Save tables in HANA
df_train.save("PIMA_INDIANS_DIABETES_TRAIN_TBL")
df_score.save("PIMA_INDIANS_DIABETES_SCORE_TBL")
print("Train sample shape : ", df_train.shape)
print("The first 3 rows: ")
print(df_train.head(3).collect())
print("The first 3 rows of inference sample (no label column):")
df_inference = df_score.deselect("CLASS")
print(df_inference.head(3).collect())
Fig. 1 Data samples
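The fixed `random_seed=23` is what makes the partition repeatable. The pure-Python sketch below shows the same idea on a hypothetical list of 768 IDs; it illustrates seeded random partitioning in general and is not PAL's internal implementation of train_test_val_split.

```python
# Hypothetical seeded 80/20 split on IDs: the same seed always yields the same split.
# Not PAL's implementation; it only demonstrates why a fixed seed is reproducible.
import random


def split_ids(ids, train_fraction=0.8, seed=23):
    rng = random.Random(seed)  # isolated generator with a fixed seed
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return sorted(shuffled[:cut]), sorted(shuffled[cut:])


ids = range(1, 769)  # hypothetical table with 768 rows
train_a, test_a = split_ids(ids)
train_b, test_b = split_ids(ids)
print(len(train_a), len(test_a))             # 614 154
print(train_a == train_b, test_a == test_b)  # True True
```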
To mimic a simple monitoring scenario, we then construct weekly batches with different label distributions. These batches keep the label column so later scoring runs can be compared for drift.
# Simulated weekly batches for drift observation (different label mix).
week20_hdf = df_score.filter('ID < 100')
week21_hdf = df_score.filter('CLASS = 1')
week22_hdf = df_score.filter('CLASS = 0')
week20_hdf.save('DIABETES_WEEK20')
week21_hdf.save('DIABETES_WEEK21')
week22_hdf.save('DIABETES_WEEK22')
print(week20_hdf.shape)
print(week21_hdf.shape)
print(week22_hdf.shape)
Fig. 2 Shape of simulated data
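Because the weekly batches are built from filters on `ID` and `CLASS`, their label mixes differ sharply. The sketch below applies the same three filter conditions to a few made-up rows (not the Pima data) and counts labels per batch.

```python
# Made-up (ID, CLASS) rows for illustration only; the filters mirror the
# three filter expressions used to build the weekly batches.
from collections import Counter

rows = [{"ID": i, "CLASS": c} for i, c in [(12, 0), (57, 1), (103, 1), (140, 0), (188, 1), (250, 0)]]

batches = {
    "week20": [r for r in rows if r["ID"] < 100],    # mirrors 'ID < 100'
    "week21": [r for r in rows if r["CLASS"] == 1],  # mirrors 'CLASS = 1'
    "week22": [r for r in rows if r["CLASS"] == 0],  # mirrors 'CLASS = 0'
}
mix = {name: Counter(r["CLASS"] for r in batch) for name, batch in batches.items()}
for name, counts in mix.items():
    print(name, dict(counts))
```

Week 21 ends up with positive cases only and Week 22 with negative cases only, which is worth keeping in mind when their metrics are compared later.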
This step establishes the initial reference model. We train a Hybrid Gradient Boosting Tree (HGBT) classifier and use MLExperiments to log parameters and metrics automatically, so training and tracking happen in the same step and the run history stays auditable.
We begin by creating a dedicated tracking session identified by a unique EXPERIMENT_ID. Within that experiment, training and scoring are recorded as separate runs.
# Constants for the workflow
EXPERIMENT_ID = "BLOG_DIABETES_TRACKING"
MODEL_NAME = "BLOG_DIAB_HGBT"
TASK_ID = "DIABETES_WEEKLY_PREDICT"
# Optional: clear previous tracking logs so each run starts from a clean state.
delete_experiment_log(conn, EXPERIMENT_ID)
# Initialize the experiment tracker
tracker = MLExperiments(
    connection_context=conn,
    experiment_id=EXPERIMENT_ID,
    experiment_description="diabetes experiment")
# Define the hyperparameter grid for model search.
param_values = {
    "learning_rate": [0.1, 0.4],
    "n_estimators": [5, 10],
    "split_threshold": [0.1, 0.3]}
We configure the model with grid search and cross-validation, then enable autologging. This captures the hyperparameters, source dataset, and generated artifacts under the run name “Diagnosis_classifier-fit”. The key value is reproducibility: the tracked baseline can later be compared with monitoring runs.
uhgbt = UnifiedClassification(
    func="HybridGradientBoostingTree",
    param_search_strategy="grid",
    resampling_method="cv",
    evaluation_metric="error_rate",
    ref_metric=["auc"],
    fold_num=5,
    random_state=123,
    param_values=param_values)
# Enable automatic tracking of parameters, metrics, and artifacts.
tracker.autologging(
    model=uhgbt,
    run_name="Diagnosis_classifier-fit",
    dataset_name="diabetes",
    dataset_source="PIMA_INDIANS_DIABETES_TRAIN_TBL")
# Train the model using stratified partitioning
uhgbt.fit(data=df_train, key="ID", label="CLASS", partition_method="stratified",
          partition_random_state=5, stratified_column="CLASS")
# Log a separate scoring run
tracker.autologging(
    model=uhgbt,
    run_name="Diagnosis_classifier-score",
    dataset_name="diabetes",
    dataset_source="PIMA_INDIANS_DIABETES_SCORE_TBL")
score_pred, score_stats, score_cm, score_metrics = uhgbt.score(
    data=df_score, key="ID", label="CLASS")
After execution, we can retrieve and inspect the tracking artifacts for the current run:
tracking_id = tracker.get_current_tracking_id()
print(f"tracking id: {tracking_id}")
print(tracker.get_tracking_metadata_for_current_run().collect())
print(get_tracking_log(conn, tracking_id).head(5).collect())
Fig. 3 Tracking metadata and log
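The fit run tracked above searched a small hyperparameter grid. As a rough sizing aid, the sketch below counts the combinations in `param_values` and the resulting cross-validation fits, assuming an exhaustive grid evaluated with `fold_num=5` for every combination. It does not account for any final refit PAL may perform on the full training data.

```python
# Size of the grid search configured for uhgbt (param_values copied from the example).
# Assumes every combination is evaluated with 5-fold cross-validation.
from itertools import product

param_values = {
    "learning_rate": [0.1, 0.4],
    "n_estimators": [5, 10],
    "split_threshold": [0.1, 0.3],
}
fold_num = 5

combinations = [dict(zip(param_values, values)) for values in product(*param_values.values())]
print(len(combinations))             # 8
print(len(combinations) * fold_num)  # 40
print(combinations[0])               # {'learning_rate': 0.1, 'n_estimators': 5, 'split_threshold': 0.1}
```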
We also provide a dashboard view of tracked runs, metrics, and artifacts.
experiment_monitor = ExperimentMonitor(connection_context=conn, experiment_ids=[EXPERIMENT_ID])
experiment_monitor.start()
Fig. 4 Experiment dashboard
In the Experiment Monitor, the experiment named “BLOG_DIABETES_TRACKING” contains two runs. Opening a run shows its tracked details, including parameters, metrics, and visual artifacts. For example, the run labeled “Diagnosis_classifier-score” shows charts such as ROC and cumulative gains plots.
Fig. 5 Continuous figure
In this step, we persist the selected model in SAP HANA. This creates a versioned model artifact that can be loaded for later use. The ModelStorage class handles save, list, load, and delete operations. In the notebook, this step shows how the tracked baseline is promoted into a reusable deployment artifact.
# Example decision: choose uhgbt as the baseline operational model.
candidate_model = uhgbt
# Assign model identity fields before persisting.
candidate_model.name = MODEL_NAME
candidate_model.version = 1
model_storage = ModelStorage(connection_context=conn)
# if_exists='replace' overwrites an existing model with the same name/version.
model_storage.save_model(model=candidate_model, if_exists="replace")
# List the models
model_storage.list_models(name=MODEL_NAME)
deployed_model = model_storage.load_model(name=MODEL_NAME)
print(f"Deployed model from storage: {MODEL_NAME}")
Fig. 6 Model list
The following command is optional cleanup for demo artifacts. Run it only after you have finished the remaining steps, since they work with the model loaded from storage.
model_storage.delete_models(name=MODEL_NAME)
This step moves the stored model into an operational workflow by creating a scheduled inference task. The scheduler runs predictions on a defined cadence, such as weekly, without manual execution. In other words, deployment is not only about storing a model but also about defining how it runs repeatedly.
# Import and initialize the scheduler
from hana_ml.algorithms.pal.scheduler import ScheduledExecution
sexec = ScheduledExecution(conn)
# Define a prediction task using the deployed model
sexec.create_predict_task(
    obj=deployed_model,
    predict_params={"data": df_inference, "key": "ID"},
    task_id=TASK_ID,
    force=True)
# Schedule the task to run automatically.
# SAP HANA scheduler cron fields: <year> <month> <date> <weekday> <hour> <minute> <second>,
# so this expression fires every Monday at 08:00:00.
weekly_cron = "* * * mon 8 0 0"
schedule_info = sexec.create_task_schedule(
    task_id=TASK_ID,
    cron=weekly_cron,
    force=True)
schedule_info.collect()
Fig. 7 Schedule information
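The cron string above can be checked with a small parser. This sketch assumes the seven-field SAP HANA scheduler layout (year, month, date, weekday, hour, minute, second) and handles only `*` and single literal values, not ranges, lists, steps, or zero-padded numbers. `parse_cron` and `matches` are hypothetical helpers, not part of hana_ml.

```python
# Hypothetical cron reader for the assumed seven-field layout:
# <year> <month> <date> <weekday> <hour> <minute> <second>
# Only "*" and single literal values are supported.
from datetime import datetime

FIELDS = ("year", "month", "date", "weekday", "hour", "minute", "second")
WEEKDAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")


def parse_cron(expr: str) -> dict:
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def matches(expr: str, when: datetime) -> bool:
    """True if `when` satisfies every field of the cron expression."""
    actual = {
        "year": str(when.year),
        "month": str(when.month),
        "date": str(when.day),
        "weekday": WEEKDAYS[when.weekday()],
        "hour": str(when.hour),
        "minute": str(when.minute),
        "second": str(when.second),
    }
    return all(v == "*" or v.lower() == actual[k] for k, v in parse_cron(expr).items())


weekly_cron = "* * * mon 8 0 0"
print(parse_cron(weekly_cron))
print(matches(weekly_cron, datetime(2024, 5, 20, 8, 0, 0)))  # True: a Monday at 08:00:00
print(matches(weekly_cron, datetime(2024, 5, 21, 8, 0, 0)))  # False: a Tuesday
```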
# Launch the scheduler monitoring dashboard
scheduled_task_monitor = ScheduledTaskMonitor(connection_context=conn, task_ids=[TASK_ID])
scheduled_task_monitor.start()
Fig. 8 Scheduled Task Monitor
The following commands are included for reference only. In the notebook, they are optional inspection and validation utilities rather than required steps.
This final step closes the model lifecycle loop by simulating production monitoring. We reuse the weekly batches prepared earlier, score them with the deployed model, and compare whether the tracked metrics remain stable or start to drift. The main idea is not the exact metric values in this toy example, but the monitoring pattern: repeated scoring of later batches against the same deployed model.
# Optional
WEEKLY_EXPERIMENT_ID = "WEEKLY_HGBT_TRACK"
delete_experiment_log(conn, WEEKLY_EXPERIMENT_ID)
# Create a dedicated experiment for production-like weekly monitoring.
MLModel_weekly_tracking = MLExperiments(
    connection_context=conn,
    experiment_id=WEEKLY_EXPERIMENT_ID,
    experiment_description="Monitor of HGBT model weekly"
)
# Reuse the weekly slices prepared earlier and log one score run per week.
weekly_batches = [
    ("week20-score", week20_hdf, "diabetes_week20", "DIABETES_WEEK20"),
    ("week21-score", week21_hdf, "diabetes_week21", "DIABETES_WEEK21"),
    ("week22-score", week22_hdf, "diabetes_week22", "DIABETES_WEEK22"),
]
for run_name, weekly_batch, dataset_name, dataset_source in weekly_batches:
    MLModel_weekly_tracking.autologging(
        model=deployed_model,
        run_name=run_name,
        dataset_name=dataset_name,
        dataset_source=dataset_source
    )
    score_pred, score_stats, score_cm, score_metrics = deployed_model.score(
        weekly_batch,
        key="ID",
        label="CLASS"
    )
You can visualize these trends, such as accuracy or AUC, in the Experiment Monitor dashboard. For example, in the simulated scenarios for Weeks 20, 21, and 22, you can select the accuracy metric, choose the three corresponding weekly runs, and click Compare to open a detailed comparison view.
Fig. 9 Weekly monitor
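In the figure below, you can observe fluctuations in accuracy across the weeks, for example from 0.78 in Week 20 to 0.61 in Week 21 and then to 0.81 in Week 22. A sharp shift such as the drop in Week 21 would be a reasonable drift signal. In this toy setup the batches differ mainly in label mix: Week 21 contains only CLASS = 1 rows, so its accuracy is the model's recall on positive cases, and Week 22's accuracy is its specificity on negative cases. In a real workflow, such a signal would trigger a review and could lead to actions such as retraining the model.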
Fig. 10 Model Drift
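The review trigger described above can be expressed as a simple rule. The sketch below flags week-over-week accuracy drops larger than a tolerance, using the three accuracy values quoted above. The 0.10 tolerance and the `flag_drops` helper are hypothetical choices for illustration, not a hana_ml drift detection API.

```python
# Hypothetical drift rule: flag runs whose accuracy drops by more than TOLERANCE
# compared with the previous run. The tolerance is arbitrary, for illustration only.

weekly_accuracy = {"week20-score": 0.78, "week21-score": 0.61, "week22-score": 0.81}
TOLERANCE = 0.10


def flag_drops(metrics: dict, tolerance: float) -> list:
    """Return (run_name, drop) pairs for week-over-week drops above tolerance."""
    runs = list(metrics)
    flagged = []
    for prev, curr in zip(runs, runs[1:]):
        drop = metrics[prev] - metrics[curr]
        if drop > tolerance:
            flagged.append((curr, round(drop, 2)))
    return flagged


print(flag_drops(weekly_accuracy, TOLERANCE))  # [('week21-score', 0.17)]
```

A fixed baseline, such as the holdout score from the tracked fit run, is a common alternative reference point: week-over-week comparison reacts to sudden changes but can miss slow degradation.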
This blog post presented the key features and an end-to-end model lifecycle use case built with SAP HANA PAL and hana_ml. Conceptually, model lifecycle management in SAP HANA PAL with hana_ml is a closed-loop process that connects model development, model registration, model operations, and post-deployment monitoring in one traceable workflow. In this article, we used a compact scenario to walk through that progression from baseline training to operational monitoring.
Notebook download folder: ml-lifecycle-examples
Product Documentation:
Most Relevant Blog Posts:
Other Related Blog Posts: