
With the SAP HANA Cloud 2025 Q1 release, several new embedded Machine Learning / AI functions have been released with the SAP HANA Cloud Predictive Analysis Library (PAL). Key new capabilities include the introduction of text analysis, text vectorization, and vector data as input to many more Machine Learning functions, such as classification and regression, in the SAP HANA Cloud database. An enhancement summary is available in the What’s new document for SAP HANA Cloud database 2025.02 (QRC 1/2025).
Text extraction from binary files
A new Document Filter Library (DFL) has been released, which provides functions to extract plain text from binary files, such as PDFs, stored as BLOBs in SAP HANA Cloud tables. Using the DFL_EXTRACT_TEXT_FROM_DOCUMENTS procedure, application developers and data scientists can unlock the valuable insights from text hidden in those files.
Extracted text is made available for text preprocessing tasks like chunking, NLP (e.g. entity extraction, sentiment detection), text vectorization using SAP HANA Cloud embedding models and further downstream analysis tasks based on text vectors such as similarity search, text mining, clustering and classification.
This enhancement not only widens the scope of your data analysis but also makes it easier to leverage text-based insights in your applications.
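To make the chunking step mentioned above more concrete, here is a minimal sketch in plain Python. It is not the SAP HANA Cloud implementation and does not call DFL or PAL; the function name `chunk_text` and the parameters `max_words` and `overlap` are hypothetical and chosen only for illustration. It assumes the text has already been extracted and simply splits it into overlapping word windows, as is commonly done before computing embeddings.

```python
def chunk_text(text, max_words=50, overlap=10):
    """Split text into word-based chunks that overlap by `overlap` words.

    Hypothetical helper for illustration only; not part of DFL or PAL.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Example: 12 words, windows of 5 words that share 2 words with the previous one
sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_text(sample, max_words=5, overlap=2):
    print(chunk)
```

The overlap keeps sentences that straddle a chunk boundary visible in both neighbouring chunks, which tends to help later similarity search.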
The Document Filter Library (DFL) is provided as an SAP HANA Cloud plugin. The setup steps are documented in the SAP HANA Cloud administration guide.
Note further that the Document Filter Library in SAP HANA Cloud provides text extraction capabilities for binary files comparable to the text analysis and full-text indexing features for binary files in SAP HANA Platform. For reference, see the text analysis SQL API for extracting text from binary documents in the Text Analysis Developer Guide, or the SAP HANA Search Developer Guide for the document MIME type reference.
Keyword-based text search enhancements
Information retrieval from text typically starts with text search techniques. The newly released keyword-based BM25 search function (SEARCH_DOCS_BY_KEYWORDS) has been further enhanced to support complete natural sentences as input.
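For readers unfamiliar with BM25, the following is a minimal sketch of textbook Okapi BM25 scoring in plain Python, using a full natural sentence as the query. It does not call SEARCH_DOCS_BY_KEYWORDS and is not how SAP HANA Cloud implements the function; the names `tokenize` and `bm25_scores`, the sample documents, and the defaults `k1=1.2` and `b=0.75` are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (illustrative only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one textbook Okapi BM25 score per document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for d in tokenized:
        df.update(set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized via b
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "HANA Cloud supports vector search",
    "Gradient boosting trees for regression",
    "Keyword search with BM25 ranks documents",
]
# A complete sentence as query; terms absent from a document add nothing
scores = bm25_scores("How does keyword search rank documents?", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])
```

Rare query terms get a higher inverse document frequency weight, so a sentence-style query is dominated by its distinctive words rather than by filler words that match nothing.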
Hybrid Gradient Boosting Trees (HGBT) enhancements
HGBT now supports multi-target regression and multi-label classification models which, like MT_MLP, predict multiple columns using a single model.
Because the prediction output includes values for multiple targets or classes, a new function, PAL_HGBT_MULTI_TASK_PREDICT, has been introduced to generate the extended output.
HGBT regression models with trend extrapolation for Time Series
Tree-based models like Hybrid Gradient Boosting Trees are great at capturing patterns from the training data. However, they falter when projecting these patterns into the future, which matters especially when models like HGBT regressors are applied to time series forecasting. A regular tree-based model has no means to infer trends or patterns beyond the bounds of the training data, making true extrapolation impossible. For a detailed discussion, see for example this blog post: overcoming the limitations of tree-based models in time series forecasting.
HGBT now introduces a linear component to the tree-building process, allowing the model to capture linear patterns and thus extrapolate trends.
With this key enhancement, PAL Hybrid Gradient Boosting Tree (HGBT) now allows for building superior regression models, with improved trend extrapolation to values outside the range of the training data.
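The extrapolation problem can be illustrated with a small, self-contained Python sketch. It is not the PAL HGBT algorithm: a piecewise-constant fit over equal-width bins stands in for a regression tree, and the "linear component" is a simplified two-step variant (fit a least-squares line first, then fit the piecewise-constant model to the residuals). All function names and the synthetic data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def fit_steps(xs, ys, n_bins=4):
    """Piecewise-constant fit (mean of y per equal-width bin of x).

    A crude stand-in for a regression tree: outside the training range
    it can only return the value of the nearest bin.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        i = min(int((x - lo) / width), n_bins - 1)
        sums[i] += y
        counts[i] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]

    def predict(x):
        i = max(0, min(int((x - lo) / width), n_bins - 1))  # clamp to known bins
        return means[i]

    return predict


def fit_trend_plus_steps(xs, ys, n_bins=4):
    """Linear trend plus piecewise-constant model on the residuals."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    step_predict = fit_steps(xs, residuals, n_bins)
    return lambda x: slope * x + intercept + step_predict(x)


# Synthetic series: upward trend 2*x + 1 with a small alternating pattern
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]

steps_only = fit_steps(xs, ys)
with_trend = fit_trend_plus_steps(xs, ys)

x_future = 30  # well beyond the training range 0..19
print("steps only :", steps_only(x_future))
print("with trend :", with_trend(x_future))
print("true value :", 2 * x_future + 1 + 0.5)
```

The step-only model is stuck at the level of its last bin, while the variant with a linear component continues the upward trend beyond the training range.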
Unified classification / regression enhancements
The modernized and much enhanced Multi-task MLP Neural Network modeling function can now be used with the Unified Classification/Regression function.
AutoML for time series optimization enhancements
AutoML introduces an improved hyperparameter optimization for time series models.
The full list of new methods and enhancements with hana-ml 2.24 is summarized in the changelog for hana-ml 2.24 as part of the documentation. The key enhancements in this release include:
Dataframe enhancements
AutoML and pipeline modeling improvements
Text processing enhancements
Further misc. enhancements
An example notebook illustrating the highlighted feature enhancements is available here: 25QRC01_2.24.ipynb.
Introduction
The new generative AI-toolkit for SAP HANA Cloud (hana-ai) is an extension of the existing Python ML client for SAP HANA (hana-ml). It focuses mainly on generative AI-assisted development of machine learning scenarios using hana-ml, thus streamlining the embedding of ML capabilities into SAP BTP Cloud Application Programming (CAP) apps.
It builds upon many leading-edge generative AI-related open source Python libraries (e.g. langchain) and provides seamless integration with SAP HANA Cloud, the HANA vector engine, and other Python libraries like the SAP GenAI Hub SDK.
Key capabilities for AI-assisted HANA ML development and code generation
Highlighted features
Using sample code knowledge stores
Using a new set of agent tools targeted at simplified use of HANA ML
Using a conversational agent for HANA ML
The new generative AI-toolkit unlocks and simplifies embedded AI development for SAP BTP applications with natural language assistance, providing a faster getting-started experience, automated code generation based on template code samples, and hana-ml tools for a conversational agent.
For further details see the Introduction at github.com/SAP/generative-ai-toolkit-for-sap-hana-cloud and the documentation at sap.github.io/generative-ai-toolkit-for-sap-hana-cloud.
A sample Python notebook using the conversational agent can be found at github.com/SAP-samples/hana-ml-samples/.../Generative-AI-toolkit-SAPHANACloud-Demo-SalesRefunds-Fore...
Setup instructions for the generative AI-toolkit (hana-ai)
Overall, this SAP HANA Cloud release again brings a great set of Machine Learning and AI capability enhancements!