Technology Blog Posts by SAP
ChristophMorgen
Product and Topic Expert

With the SAP HANA Cloud 2025 Q1 release, several new embedded Machine Learning / AI functions have been added to the SAP HANA Cloud Predictive Analysis Library (PAL). Key new capabilities include text analysis, text vectorization, and support for vector data as input to many more Machine Learning functions, such as classification and regression, in the SAP HANA Cloud database. An enhancement summary is available in the What’s new document for SAP HANA Cloud database 2025.02 (QRC 1/2025).

Text extraction and text search enhancements

Text extraction from binary files

A new Document Filter Library (DFL) has been released, which provides functions to extract plain text from binary files, such as PDFs, stored as BLOBs in SAP HANA Cloud tables. With the DFL_EXTRACT_TEXT_FROM_DOCUMENTS procedure, application developers and data scientists can unlock the insights from text hidden in those files.

ChristophMorgen_0-1744875447906.png

Extracted text is made available for text preprocessing tasks like chunking, NLP (e.g. entity extraction, sentiment detection), text vectorization using SAP HANA Cloud embedding models and further downstream analysis tasks based on text vectors such as similarity search, text mining, clustering and classification.
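To make the chunking step mentioned above a bit more concrete, here is a minimal sketch in plain Python. It is not the SAP HANA Cloud implementation; the function name chunk_text and its parameters are hypothetical and only illustrate the idea of splitting extracted text into overlapping, word-based chunks before vectorization.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks that share `overlap` words with their neighbour."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Stand-in for text extracted from a document (hypothetical sample data).
extracted = " ".join(f"w{i}" for i in range(100))
chunks = chunk_text(extracted, chunk_size=40, overlap=10)
print(len(chunks))
```

The overlap keeps sentences that straddle a chunk boundary visible in both neighbouring chunks, which tends to help downstream similarity search.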

This enhancement not only widens the scope of your data analysis but also makes it easier to leverage text-based insights in your applications.

The Document Filter Library (DFL) is provided as an SAP HANA Cloud plugin. The setup instructions are documented in the SAP HANA Cloud administration guide and include the following steps:

  • the plugin needs to be installed and Script Server activated
  • SQL user privileges need to be granted

ChristophMorgen_1-1744875447909.png

Note also that the Document Filter Library in SAP HANA Cloud provides capabilities for extracting text from binary files that are comparable to the text analysis and full-text indexing features for binary files in SAP HANA Platform. For reference, see the text analysis SQL API for extracting text from binary documents in the Text Analysis Developer Guide, or the SAP HANA Search Developer Guide for the document MIME-type reference.

 

Keyword-based text search enhancements

Information retrieval from text typically starts with text search techniques. The newly released keyword-based search function BM25 search (SEARCH_DOCS_BY_KEYWORDS) has been further enhanced to support complete natural sentences as input to the BM25 search:

ChristophMorgen_2-1744875940641.png
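For readers unfamiliar with BM25, the following is a minimal sketch of the standard Okapi BM25 scoring formula in plain Python, applied to a full natural-language sentence as the query. It does not show how SEARCH_DOCS_BY_KEYWORDS is implemented or called; the helper names, sample documents, and the k1 and b defaults are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical sample documents and a complete sentence as the query.
docs = [
    "Invoice for laptop repair and replacement battery",
    "Quarterly sales report for the retail division",
    "How to reset the password of a database user",
]
query = "How do I reset my database password?"
scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Only the terms that the sentence shares with a document contribute to its score, which is why a full sentence can be used as input without extracting keywords first. This sketch does no stop-word handling; a production search function may treat such words differently.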

 

Classification, regression, time series and AutoML model enhancements

Hybrid Gradient Boosting Trees (HGBT) enhancements

HGBT now supports multi-target regression and multi-label classification models. Like MT_MLP, it can model predictions for multiple columns using a single model.

  • Multiple DEPENDENT_VARIABLE values can be set as parameters to enable this
  • In addition, REAL_VECTOR columns are supported as the target, in which case a multi-target regression model is applied
  • The objective functions supported for multi-target/label models are
    • Regression OBJ_FUNC = 0 (Squared error)
    • Classification OBJ_FUNC = 7 (Softmax)
  • The new parameter USE_VEC_LEAF controls whether each tree leaf holds a vector value as its predictor (default is no)

Because the prediction output includes values for multiple targets or classes, a new function, PAL_HGBT_MULTI_TASK_PREDICT, has been introduced to generate the extended output.
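As a rough sketch of how the parameters above could be assembled, the following plain Python builds rows in the usual PAL parameter-table layout (PARAM_NAME, INT_VALUE, DOUBLE_VALUE, STRING_VALUE). The helper function and the target column names are hypothetical, and passing USE_VEC_LEAF as an integer flag is an assumption, so please check the PAL reference for the exact value types.

```python
def hgbt_multi_target_params(targets, obj_func=0, use_vec_leaf=None):
    """Build PAL-style parameter rows: (PARAM_NAME, INT_VALUE, DOUBLE_VALUE, STRING_VALUE)."""
    if len(targets) < 2:
        raise ValueError("a multi-target model needs at least two target columns")
    # One DEPENDENT_VARIABLE row per target column.
    rows = [("DEPENDENT_VARIABLE", None, None, name) for name in targets]
    # OBJ_FUNC = 0 (squared error) for regression, 7 (softmax) for classification.
    rows.append(("OBJ_FUNC", obj_func, None, None))
    if use_vec_leaf is not None:
        # Assumption: USE_VEC_LEAF is passed as an integer flag.
        rows.append(("USE_VEC_LEAF", int(use_vec_leaf), None, None))
    return rows


# Hypothetical target columns for a multi-target regression.
params = hgbt_multi_target_params(["TEMP_NEXT_DAY", "HUMIDITY_NEXT_DAY"], obj_func=0, use_vec_leaf=True)
for row in params:
    print(row)
```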

ChristophMorgen_3-1744875979136.png

 

HGBT regression models with trend extrapolation for Time Series

Tree-based models like Hybrid Gradient Boosting Trees are great at capturing patterns from the training data. However, they falter when projecting these patterns into the future, which matters especially when models like HGBT regressors are applied to time series forecasting. A regular tree-based model has no means to infer trends or patterns beyond the bounds of the training data, making true extrapolation impossible. For a detailed discussion see for example this blog post: overcoming the limitations of tree-based models in time series forecasting.

HGBT now introduces a linear component to the tree-building process, allowing the model to capture linear patterns and thus extrapolate trends.

  • New parameter MODEL_TREE: when set to linear, the model tries to fit a linear model at each leaf node.

With this key enhancement, PAL Hybrid Gradient Boosting Tree (HGBT) can build regression models with improved trend extrapolation for values outside the range of the training data.
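The difference between a constant leaf and a linear leaf can be illustrated with a tiny sketch in plain Python. This is not the HGBT algorithm: it uses a single hand-picked split on a hypothetical linear series and compares the leaf mean with a least-squares line fitted inside the same leaf.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical training series with a pure linear trend: y = 2t + 1 for t = 0..9.
ts = list(range(10))
ys = [2.0 * t + 1.0 for t in ts]

# A single split at t >= 5; the right-hand leaf holds the most recent observations.
split = 5
leaf_t = [t for t in ts if t >= split]
leaf_y = [y for t, y in zip(ts, ys) if t >= split]

# Constant leaf: every future point falls into this leaf and gets the leaf mean.
pred_constant = mean(leaf_y)

# Linear leaf: fit a line inside the leaf and evaluate it at the future point.
slope, intercept = fit_line(leaf_t, leaf_y)
t_future = 15
pred_linear = slope * t_future + intercept

print(pred_constant, pred_linear)  # 15.0 31.0
```

The constant leaf is stuck at 15.0 for any future t, while the linear leaf continues the trend and returns 31.0 at t = 15, which matches the true value 2 * 15 + 1.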

ChristophMorgen_4-1744875979141.png

 

Unified classification / regression enhancements

The modernized and much enhanced Multi-task MLP Neural Network modeling function can now be used with the Unified Classification/Regression function.

  • This unlocks local explainability for Multi-task MLP models: predictions provide reason codes with Shapley explanations (using parameters BACKGROUND_SIZE, BACKGROUND_SAMPLING_SEED)
  • However, in the current update the models are constrained to a single target/label when used via Unified Classification/Regression
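To give an intuition for what Shapley-based reason codes express, here is a minimal brute-force sketch in plain Python that computes exact Shapley values for one prediction against a small background data set. It is not how PAL computes its explanations, and it does not show how BACKGROUND_SIZE or BACKGROUND_SAMPLING_SEED are applied internally; the toy model, the data, and the function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values of predict(x) relative to the mean prediction on the background rows."""
    n = len(x)

    def value(subset):
        # Expected prediction when the features in `subset` are fixed to x
        # and all other features are taken from the background rows.
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def predict(row):
    """Hypothetical toy model standing in for a trained network."""
    return 3.0 * row[0] + 2.0 * row[1] - 1.0 * row[2]


background = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
x = [3.0, 1.0, 4.0]
phis = shapley_values(predict, x, background)
base_value = sum(predict(row) for row in background) / len(background)
print(phis, base_value)
```

The contributions add up to the difference between the prediction for x and the average prediction on the background rows, which is the property that makes them usable as per-feature reason codes. The brute-force enumeration grows exponentially with the number of features, so practical implementations rely on sampling or approximations.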

 

AutoML for time series optimization enhancements

AutoML introduces improved hyperparameter optimization for time series models:

  • Random-search optimization now utilizes Hyperband
    • A new parameter WITH_HYPERBAND enables the enhancement in conjunction with SEARCH_METHOD set to "random"
    • The optimization now more efficiently allocates resources to different hyperparameter configurations, and significantly speeds up the optimization process
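The resource-allocation idea behind Hyperband can be sketched with successive halving, the building block that Hyperband runs in several brackets with different budget trade-offs. The following plain Python sketch is not the PAL AutoML implementation; the toy loss function, the candidate values, and the function names are hypothetical.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configurations each round while growing the budget."""
    survivors = list(configs)
    budget = min_budget
    evaluations = 0
    while len(survivors) > 1:
        # Evaluate every surviving configuration with the current (small) budget.
        scored = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        evaluations += len(survivors)
        # Keep only the best fraction and give the survivors a larger budget.
        survivors = scored[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0], evaluations


def toy_loss(alpha, budget):
    """Hypothetical loss: best at alpha = 0.3, with an error term that shrinks as budget grows."""
    return (alpha - 0.3) ** 2 + 1.0 / budget


# Hypothetical candidate values for a single smoothing hyperparameter.
candidates = [0.05, 0.1, 0.2, 0.28, 0.35, 0.5, 0.6, 0.8, 0.95]
best, evaluations = successive_halving(candidates, toy_loss)
print(best, evaluations)  # 0.28 12
```

Nine candidates are evaluated cheaply in the first round, three survive to the second round with a larger budget, and one remains. Poor configurations are discarded early, so most of the budget goes to the promising ones.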

 

Python ML client (hana-ml) enhancements

The full list of new methods and enhancements in hana-ml 2.24 is summarized in the changelog for hana-ml 2.24, which is part of the documentation. The key enhancements in this release include:

Dataframe enhancements

  • New Vector- / vector-index management methods
  • New sort by vector-similarity
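To illustrate what sorting by vector similarity means, here is a small plain-Python sketch that ranks rows by cosine similarity to a query vector. It does not use the hana-ml dataframe methods, whose names and signatures are not shown in this post; the row identifiers and vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical rows: (identifier, embedding vector).
rows = [
    ("doc_a", [1.0, 0.0, 0.0]),
    ("doc_b", [0.0, 1.0, 0.0]),
    ("doc_c", [0.9, 0.1, 0.0]),
]
query = [1.0, 0.0, 0.0]

# Most similar rows first.
ranked = sorted(rows, key=lambda row: cosine_similarity(row[1], query), reverse=True)
ids = [row[0] for row in ranked]
print(ids)  # ['doc_a', 'doc_c', 'doc_b']
```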

AutoML and pipeline modeling improvements

  • Data-parallel/massive AutoML
    progress monitor enhanced
  • New AutoML regression
    outlier detection

Text processing enhancements

  • New Text Analysis, POS, NER, Sentiment Analysis
  • Enhanced BM25 search by doc, sentence

Further misc. enhancements

  • New Tree Debriefing to text for decision tree
  • Enhanced CAP generation for AdditiveModelForecast, AutoML and Unified APIs with respect to use of input/output signatures

ChristophMorgen_5-1744876082041.png

An example notebook illustrating the highlighted feature enhancements is available here: 25QRC01_2.24.ipynb

 

Generative AI-toolkit for SAP HANA Cloud (hana-ai)

Introduction

The new generative AI-toolkit for SAP HANA Cloud (hana-ai) is an extension of the existing Python ML client for SAP HANA (hana-ml). It focuses on generative AI-assisted development of machine learning scenarios with hana-ml, streamlining the embedding of ML capabilities in SAP BTP Cloud Application Programming (CAP) apps.

It builds upon many leading-edge, generative AI related open source Python libraries (e.g. langchain) and provides seamless integration with SAP HANA Cloud, the HANA vector engine, and other Python libraries such as the SAP GenAI Hub SDK.

Key capabilities for AI-assisted HANA ML development and code generation

  • Use code-template knowledge stores and new hana-ml tools
    • Tools for code generation as well as executable tools for agentic-use of HANA ML for Forecasting
  • Conversational agent for HANA ML
    • Generative AI agent, utilizing executable tools targeted for HANA ML
    • Unlocking agentic, AI-assisted HANA ML development
  • Further conversational interfaces and agents
    • HANA dataframe agent for invoking code template-based tasks using hana-ml in Python
    • SQL agent for invoking code template-based and langchain-db tool tasks via SQL generation and execution
    • Conversational SmartDataFrame interface, for applying standard dataframe methods as well as code template-based tasks to a dataframe

Highlighted features

Using sample code knowledge stores

  • Leverage hana-ml samples as knowledge store
    • Python | SQL default samples for PAL functions
    • Can be augmented with additional content, created with custom / best practice templates

ChristophMorgen_6-1744876123179.png

 

Using new set of agent-tools targeted for simplified use of HANA ML

  • Library of executable hana-ml tools
    • Tools execute independently and generate results, beyond code generation
    • Initial set of tools focused on Time Series Forecasting
    • Time series analysis tools applied to the time series data,
      determining guidance about which time series algorithms to apply
    • Can be augmented with additional, custom created tools
    • Utilized for automated and agentic-, generative AI-assisted
      HANA ML development or via direct tool-usage

ChristophMorgen_7-1744876123185.png

 

Using a conversational agent for HANA ML

  • New Agent for HANA ML
    • Ready to use agent to build HANA Cloud PAL forecast models
    • Conversational agent session incl. chat history, based on langchain
    • Leverages executable tools specifically targeted for HANA ML
    • HANA ML-tools: initially focused on Time Series Forecasting like
      automatic time series data analysis, forecast algorithm proposal
      and fitting, or CAP project artifact generation

ChristophMorgen_8-1744876123201.png

 

The new generative AI-toolkit unlocks and simplifies embedded AI development for SAP BTP applications with natural language assistance. It provides a faster getting-started experience, automated code generation based on template code samples, and hana-ml tools for a conversational agent.

For further details see the Introduction at github.com/SAP/generative-ai-toolkit-for-sap-hana-cloud and the documentation at sap.github.io/generative-ai-toolkit-for-sap-hana-cloud.

A sample python notebook using the conversational agent can be found at github.com/SAP-samples/hana-ml-samples/.../Generative-AI-toolkit-SAPHANACloud-Demo-SalesRefunds-Fore...

Setup instructions for the generative AI-toolkit (hana-ai)

  • Download the hana-ai package from github releases and install it using _pip install generative-ai-toolkit-for-sap-hana-cloud-x.x.x.zip_, or use _pip install hana-ai_ to install from pypi.org/hana-ai.

  • Setup of the SAP generative AI hub SDK
    • Install the required Python packages with pip install "generative-ai-hub-sdk[all]".
    • Create a deployment for a generative AI model in SAP BTP; for instructions, see the SAP AI Core documentation section Create a Deployment for a Generative AI Model.
    • The generative AI hub SDK reuses configuration settings from the AI-core-SDK. These include client ID, client secret, authentication URL, base URL, and resource group. You can set these values as environment variables or via a config file. For detailed instructions see this SAP Learning unit on using Generative-AI-Hub-SDK.
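Before starting a notebook session, it can help to check that the five settings listed above are present. The following plain-Python sketch does this for environment variables. The AICORE_* variable names are an assumption based on the SDK's commonly documented naming convention, so please verify them against the SDK documentation; the helper function is hypothetical.

```python
import os

# Assumed environment variable names for the five settings listed above.
REQUIRED_VARS = [
    "AICORE_CLIENT_ID",
    "AICORE_CLIENT_SECRET",
    "AICORE_AUTH_URL",
    "AICORE_BASE_URL",
    "AICORE_RESOURCE_GROUP",
]


def missing_settings(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All generative AI hub settings are present.")
```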

Overall, this SAP HANA Cloud release again delivers a great set of Machine Learning and AI enhancements!