With the SAP HANA Cloud 2025 Q2 release, several new embedded Machine Learning / AI functions have been released with the SAP HANA Cloud Predictive Analysis Library (PAL) and the Automated Predictive Library (APL).
Key new capabilities to be highlighted include
An enhancement summary is available in the What’s new document for SAP HANA Cloud database 2025.14 (QRC 2/2025).
Text tokenization
A new text tokenization function has been released that splits text into tokens, a fundamental preparation step in natural language processing (NLP) tasks such as text analysis, and in many downstream text processing tasks such as text embedding or vector similarity search. The new function supports key tokenization capabilities like
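To make the idea of tokenization concrete, here is a minimal sketch in plain Python. It is not the PAL tokenization function and does not reflect its parameters; the `tokenize` helper and its `lowercase` and `keep_punctuation` options are hypothetical names chosen for illustration only.

```python
import re

# Hypothetical helper for illustration; this is NOT the PAL tokenization API.
# A token is either a run of word characters or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize(text, lowercase=True, keep_punctuation=False):
    """Split text into tokens, optionally lowercasing and keeping punctuation."""
    if lowercase:
        text = text.lower()
    tokens = TOKEN_PATTERN.findall(text)
    if not keep_punctuation:
        # Drop tokens that consist of a punctuation character only.
        tokens = [t for t in tokens if re.match(r"\w", t)]
    return tokens

print(tokenize("SAP HANA Cloud stores text, vectors and tables."))
print(tokenize("Hello, world!", keep_punctuation=True))
```

The resulting token lists are what downstream steps such as embedding or keyword analysis would typically consume.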
Text embedding
A new text embedding model version (SAP_GXY.20250407) is now available. It has been fine-tuned from a roberta-base-encoder model with more training data for improved retrieval accuracy, short-text scenarios, and extended multilingual and cross-lingual retrieval scenarios. Newly supported languages include Chinese (CH), Japanese (JP) and Italian (IT). The default token length for embedding functions has been increased to 512.
Extended embedding vector processing by Machine Learning functions
Unlocking the semantic understanding of your text data stored in SAP HANA Cloud, not only for use cases like similarity search but also for machine learning scenarios like document/text classification and clustering, has now been extended to even more PAL Machine Learning functions
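As a rough illustration of how embedding vectors support similarity search, the following sketch ranks documents by cosine similarity to a query vector. It is plain Python, not a PAL or SQL vector function, and the three-dimensional vectors and document names are made-up toy data; real embedding vectors have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, documents):
    """Return (doc_id, score) pairs sorted by descending similarity to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for text embeddings (assumed values).
documents = {
    "invoice": [0.9, 0.1, 0.0],
    "payment": [0.8, 0.3, 0.1],
    "holiday": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

ranking = most_similar(query, documents)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

The same vectors can serve as numeric features for classification or clustering, which is the scenario the extended PAL functions address.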
Outlier detection enhancements using Isolation Forests
The Isolation Forest function for outlier detection has been enhanced with
Isolation Forest is a strong and popular function for outlier detection. It can be applied to any data for outlier analysis inside the database, which makes it especially suitable for use cases where data must not leave the system or is too large to be copied out for analysis, such as detecting outliers in financial accounting data like the universal journal (ACDOCA).
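The intuition behind Isolation Forests is that outliers are separated from the rest of the data by fewer random splits than regular points. The sketch below illustrates that idea in plain Python under simplifying assumptions: it is not the PAL implementation, it uses no subsampling, and for each scored point it grows only the branch of each random tree that contains that point. The toy data (ten similar rows and one extreme row) is invented.

```python
import math
import random

def _c(n):
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def _path_length(point, sample, depth, max_depth, rng):
    """Number of random axis-parallel splits needed to isolate `point`."""
    if depth >= max_depth or len(sample) <= 1:
        return depth + _c(len(sample))
    dim = rng.randrange(len(point))
    lo = min(row[dim] for row in sample)
    hi = max(row[dim] for row in sample)
    if lo == hi:
        # Simplification: stop if the chosen dimension cannot be split.
        return depth + _c(len(sample))
    split = rng.uniform(lo, hi)
    # Keep only the rows that fall on the same side of the split as the point.
    same_side = [row for row in sample if (row[dim] < split) == (point[dim] < split)]
    return _path_length(point, same_side, depth + 1, max_depth, rng)

def anomaly_scores(data, n_trees=100, seed=0):
    """Score each row in (0, 1]; values close to 1 indicate likely outliers."""
    if len(data) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(data)))
    scores = []
    for point in data:
        mean_path = sum(
            _path_length(point, data, 0, max_depth, rng) for _ in range(n_trees)
        ) / n_trees
        scores.append(2.0 ** (-mean_path / _c(len(data))))
    return scores

# Ten similar rows plus one extreme row (assumed toy data).
data = [(100 + i, 5 + i % 3) for i in range(10)] + [(5000, 40)]
scores = anomaly_scores(data)
print("most anomalous row:", data[scores.index(max(scores))])
```

In this toy data the extreme row is almost always isolated by the first split, so it receives the highest score.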
Constrained clustering
A new constrained clustering function is introduced, an advanced form of clustering that incorporates domain-specific constraints or prior information to guide the clustering process, ensuring more accurate and meaningful results tailored to specific analytical needs. Traditional clustering methods generally cannot incorporate prior knowledge about the data, which often makes it difficult to achieve meaningful or contextually relevant groupings.
Prior contexts can be included in the clustering process as
A detailed introduction to the new function is given in the following blog post https://community.sap.com/t5/technology-blog-posts-by-sap/clustering-text-documents-using-constraine...
Further enhanced machine learning algorithms for Tabular AI scenarios
The recently introduced Multi-task MLP (multi-layer perceptron) neural network function, which enables predictions for multiple targets / labels with a single model, has now been enhanced with an improved built-in model evaluation and parameter search interface, making it faster and more productive to achieve better prediction outcomes.
The new optimization can be leveraged by calling the function directly or through the Unified Classification/Regression functions. It supports searching for and selecting optimal values for the following neural network model parameters
In the domain of time series analysis and forecasting, the ARIMA forecasting function now supports keeping the context of the time horizon index and interval
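To illustrate what keeping the time index and interval means for a forecast, the following sketch labels forecast steps with real timestamps instead of a plain step counter. It is a plain Python illustration, not the PAL ARIMA interface; the `forecast_index` helper, the start date and the daily interval are assumptions.

```python
from datetime import datetime, timedelta

def forecast_index(last_observed, interval, horizon):
    """Timestamps of the next `horizon` forecast steps after the last observation."""
    return [last_observed + interval * step for step in range(1, horizon + 1)]

# Assumed example: daily series whose last observation is 2025-06-30.
last_observed = datetime(2025, 6, 30)
stamps = forecast_index(last_observed, timedelta(days=1), horizon=3)

for step, stamp in enumerate(stamps, start=1):
    print(step, stamp.date())
```

With the index carried through, forecast rows can be joined directly back to the original time series by timestamp.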
Experiment tracking and monitoring for PAL ML models
Machine learning experimentation requires robust tracking capabilities to ensure reproducibility, comparison, and auditability of models. SAP HANA Cloud's new ML tracking feature provides seamless integration with Predictive Analysis Library (PAL) procedures, enabling automatic logging of critical experiment artifacts. This end-to-end tracking solution captures parameters, datasets, models, metrics, and visualizations in a structured way, transforming how data scientists manage ML workflows.
The new execution tracking of PAL procedures supports
A more detailed introduction is provided in the following blog post https://community.sap.com/t5/technology-blog-posts-by-sap/comprehensive-guide-to-mltrack-in-sap-hana...
Task scheduling for PAL procedures
A new PAL task scheduling capability allows you to run SQLScript procedures (calling PAL procedures) asynchronously through the cron-based SAP HANA Cloud scheduler. The targeted SQLScript (PAL) procedure calls are mapped to a defined task with a task ID, task description, owner, etc. A task can be scheduled for execution; a job is an instance of a scheduled task.
The Python ML client adds interfaces and methods that make it easy for experts developing HANA ML scenarios to leverage the new capabilities.
The new data drift detector in the APL helps you spot changes or deviations between a given dataset and a reference. Reference data could be a version in the past, or a particular segment of customers or employees, or an expected distribution (e.g. Benford). Use cases of data comparison are:
This feature from APL (Automated Predictive Library) is available for both Python and SQL. For more details, see the blog post on the HANA ML Data Drift Detector.
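The comparison against an expected distribution such as Benford can be sketched in a few lines of plain Python. This is not the APL data drift detector and makes no claim about how APL measures drift; the helper names and the use of total variation distance as the deviation measure are assumptions made for illustration.

```python
import math
from collections import Counter

def benford_expected():
    """Expected first-digit probabilities under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_distribution(values):
    """Observed share of each leading digit 1..9 among the non-zero values."""
    # Scientific notation puts the leading digit first, e.g. '8.03e+59'.
    digits = [int(f"{abs(v):.15e}"[0]) for v in values if v != 0]
    if not digits:
        raise ValueError("no non-zero values to analyse")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

def total_variation(p, q):
    """Total variation distance between two distributions over the same keys."""
    return 0.5 * sum(abs(p[d] - q[d]) for d in p)

reference = benford_expected()

# Powers of two are known to follow Benford's law closely.
benford_like = [2.0 ** k for k in range(1, 200)]
# Three-digit numbers have uniformly distributed leading digits.
uniform_like = list(range(100, 1000))

drift_small = total_variation(first_digit_distribution(benford_like), reference)
drift_large = total_variation(first_digit_distribution(uniform_like), reference)
print(round(drift_small, 3), round(drift_large, 3))
```

A small distance suggests the dataset is consistent with the reference; a large one flags a deviation worth investigating.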
The full list of new methods and enhancements with hana_ml 2.25 is summarized in the changelog for hana-ml 2.25 as part of the documentation. The key enhancements in this release include
Text and vector processing enhancements
Classification / regression function enhancements
AutoML and pipeline modeling improvements
You can find an example notebook illustrating the highlighted feature enhancements here: 25QRC02_2.25.ipynb.