ChristophMorgen
Product and Topic Expert
With the SAP HANA 2.0 SPS 08 release, several new features have been added to the SAP HANA Cloud Predictive Analysis Library (PAL). An enhancement summary is available in the What's New document, Predictive Analysis Library (PAL) enhancements overview with SAP HANA Platform 2.0 SPS 08.

The feature highlights of this release are described in more detail below.

Classification and regression enhancements

A new multi-task multilayer perceptron (MLP) function is introduced, enabling multi-label classification and multi-target regression scenarios. Using a multi-task learning neural network, a single ML model can predict multiple related target columns at once, as it captures both features that are common across tasks and task-specific information within the same prediction model.

  • It leverages the commonalities between related tasks to improve performance, generalization, and training efficiency, and it enables more efficient use of data, better feature extraction, knowledge transfer, regularization, and end-to-end learning.
  • Furthermore, the function supports early stopping using validation data to avoid overfitting.
  • Users of the new multi-task MLP also benefit from shorter MLP training times and models with improved accuracy.

     The new function enables unique prediction model capabilities, for example in scenarios such as

  • automated multi-field value proposals or pre-filling of forms (e.g. Sales Order Automation),
  • or predicting multiple price or sales targets (average, minimum, etc.) in a single model. A minimal usage sketch is shown below.
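
As a rough illustration of how such a multi-target model could be trained from the Python machine learning client, the sketch below assumes a hana-ml wrapper for the new function; the class name MLPMultiTask, its import path, and its parameters are illustrative assumptions to be checked against the hana-ml changelog, and the table and column names are placeholders.

```python
# Minimal sketch, not confirmed API: MLPMultiTask, its module path and parameters
# are assumptions; table and column names are placeholders.
from hana_ml.dataframe import ConnectionContext
from hana_ml.algorithms.pal.neural_network import MLPMultiTask  # assumed class/module

conn = ConnectionContext(address='<hana-host>', port=443, user='<user>', password='<password>')
df_train = conn.table('<SALES_ORDERS_TRAIN>')   # placeholder training table

mt_mlp = MLPMultiTask(hidden_layer_size=(64, 64))           # assumed parameter
mt_mlp.fit(data=df_train,
           key='ORDER_ID',
           label=['PAYMENT_TERM', 'SHIPPING_TYPE'])         # several related targets in one model
df_pred = mt_mlp.predict(data=conn.table('<SALES_ORDERS_NEW>'), key='ORDER_ID')
```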


 

Unified regression and classification now support permutation feature importance,

  • a new and popular method in global explainability for evaluating the contribution of individual features to the overall predictive power of a model.
  • This is achieved by measuring the decrease in a model’s performance when a feature’s values are randomly shuffled. A detailed explanation and examples are given in the blog Global Explanation Capabilities in SAP HANA Machine Learning; a minimal usage sketch is shown below.
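
A minimal sketch of how this could look with the hana-ml UnifiedClassification wrapper; the permutation_importance flag on score() is an assumption derived from the feature description and should be verified in the hana-ml documentation, and the data frames are placeholders.

```python
# Minimal sketch; permutation_importance on score() is an assumed parameter name.
from hana_ml.dataframe import ConnectionContext
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

conn = ConnectionContext(address='<hana-host>', port=443, user='<user>', password='<password>')
df_train = conn.table('<TRAIN_TABLE>')   # placeholder tables
df_test = conn.table('<TEST_TABLE>')

uc = UnifiedClassification(func='HybridGradientBoostingTree')
uc.fit(data=df_train, key='ID', label='CLASS')

# Score on a hold-out set and request permutation-based feature importance
res = uc.score(data=df_test, key='ID', label='CLASS',
               permutation_importance=True)   # assumed parameter
```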

Image: Classic feature importance vs. permutation feature importance reports

 

The unified regression function is enhanced with prediction and confidence intervals

  • Interval estimates complement the point estimates of response values in the unified regression prediction and scoring procedures. The prediction interval describes the distribution of an individual future observation of the target, while the confidence interval describes the mean of the future target population.
  • This is now supported with Generalised Linear Models (GLM), Multiple Linear Regression (MLR), Random Decision Trees (RDT), and Hybrid Gradient Boosting Tree (HGBT).
  • Complementing this, a new interval quality function is added, which helps evaluate the quality of an interval estimate. A minimal usage sketch is shown below.
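
The sketch below indicates how interval estimates might be requested through the hana-ml UnifiedRegression wrapper; the interval-related predict() parameters are assumptions inferred from the feature description, not confirmed API, and the tables are placeholders.

```python
# Minimal sketch; interval and significance_level on predict() are assumed parameters.
from hana_ml.dataframe import ConnectionContext
from hana_ml.algorithms.pal.unified_regression import UnifiedRegression

conn = ConnectionContext(address='<hana-host>', port=443, user='<user>', password='<password>')
df_train = conn.table('<TRAIN_TABLE>')   # placeholder tables
df_new = conn.table('<NEW_DATA_TABLE>')

ur = UnifiedRegression(func='GLM')       # intervals are described for GLM, MLR, RDT and HGBT
ur.fit(data=df_train, key='ID', label='PRICE')

df_pred = ur.predict(data=df_new, key='ID',
                     interval='prediction',       # assumed: 'prediction' or 'confidence'
                     significance_level=0.05)     # assumed: 95% interval
```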

 

A new outlier detection for regression function is added,

  • providing the ability to detect point outliers in the data used for training linear or tree-based regression models with MLR and HGBT regressors.
  • Outliers in the training data are identified based on residual analysis and outlier-score evaluation; they can then be excluded from model training, allowing improved regression models and predictions to be built.


 

A new FairML classification and regression function is added

  • The new function helps mitigate bias and potential unfairness in ML model predictions with respect to affected groups of individuals.
  • Binary classification and regression models based on Hybrid Gradient Boosting Tree (HGBT) are supported.
  • Fairness evaluation is aided by metrics like demographic parity, equalized odds, and more.
  • For a detailed use case and function example, see the blog https://blogs.sap.com/2023/12/08/fairness-in-machine-learning-a-new-feature-in-sap-hana-pal/; a minimal usage sketch is shown below.
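
A minimal sketch of what a FairML call could look like from hana-ml; the FairMLClassification class, its module path, and the fairness-related parameters are assumptions inferred from the feature description and the linked blog, so verify the exact names in the documentation.

```python
# Minimal sketch; class, module path and fairness parameters are assumptions.
from hana_ml.dataframe import ConnectionContext
from hana_ml.algorithms.pal.fair_ml import FairMLClassification   # assumed module/class

conn = ConnectionContext(address='<hana-host>', port=443, user='<user>', password='<password>')
df_train = conn.table('<LOAN_APPLICATIONS>')   # placeholder tables
df_new = conn.table('<NEW_APPLICATIONS>')

fair_clf = FairMLClassification(fair_constraint='demographic_parity')  # assumed parameter
fair_clf.fit(data=df_train, key='ID', label='APPROVED',
             fair_sensitive_variable='GENDER')   # assumed: protected attribute column
df_pred = fair_clf.predict(data=df_new, key='ID')
```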

 

The Hybrid Gradient Boosting Tree (HGBT) function has been improved

  • New objective functions, Huber loss and reweighted square, for more robust regression models
  • Improved classification models through support of F1, recall, and precision as cross-validation metrics, and through weight scaling of target values to address imbalanced classes or weights reflecting different costs
  • For improved early stopping during model optimization, the validation metric for early stopping can now be set explicitly.

Further regression and classification enhancements include

  • Improved KNN search via a matrix-enabled search method
  • Option to limit results to the top N classes in multi-class predictions with unified classification
  • Enhanced one-hot encoding of categorical features, supporting the aggregation of infrequent values into a single output per feature (SVM, MLP, MLinR, MCLogR)
  • Data-parallel / massive execution support for Multiple Linear, Exponential, Bi-Variate Natural Logarithmic, Bi-Variate Geometric, and Polynomial Regression

 

Time series analysis and forecasting enhancements

New outlier detection function in time series analysis

  • Automatic or manual time series residual analysis, including intermittent time series detection, missing value handling, and automatic smoothing of the time series
  • Automatic selection of the outlier detection algorithm applied to the residual series, including Z1, Z2, inter-quartile range, mean absolute deviation, Isolation Forest, and DBSCAN
  • Support of a voting ensemble across multiple outlier scoring metrics; a minimal usage sketch is shown below
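
A minimal sketch of time series outlier detection via the Python client; hana-ml provides an OutlierDetectionTS wrapper in hana_ml.algorithms.pal.tsa, but the parameter controlling automatic residual extraction and the exact output layout are assumptions to be checked against the changelog, and the table is a placeholder.

```python
# Minimal sketch; the auto parameter and the result layout are assumptions.
from hana_ml.dataframe import ConnectionContext
from hana_ml.algorithms.pal.tsa.outlier_detection import OutlierDetectionTS

conn = ConnectionContext(address='<hana-host>', port=443, user='<user>', password='<password>')
df_ts = conn.table('<SALES_TIME_SERIES>')   # placeholder time series table

od = OutlierDetectionTS(auto=True)   # assumed: automatic residual analysis and method selection
res = od.fit_predict(data=df_ts, key='TIMESTAMP', endog='SALES')
# res is expected to contain the residual series, an outlier score and an outlier flag per point
```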

 

Improved seasonality and trend decomposition in time series analysis

  • Seasonal and trend decomposition using the STL method (season, trend, and low-pass windows), based on LOESS (locally estimated scatterplot smoothing) and a super-smoother function
  • This allows for seasonality tests with a more robust decomposition that is less sensitive to outliers

Permutation-based external feature importance evaluation in time series analysis

  • a new and popular method in global explainability for evaluating the contribution of individual external features to the overall forecast model accuracy.
  • This is achieved by measuring the decrease in a model’s performance when a feature’s values are randomly shuffled. A detailed explanation and examples are given in the blog Global Explanation Capabilities in SAP HANA Machine Learning.
  • Supported with the following time series algorithms: ARIMAX, Bayesian Structural Time Series (BSTS), Long-term time series forecasting (LTSF), and Additive Model Time Series Analysis (AMTSA)

Forecast prediction interval evaluation

  • a new interval quality function is added, which helps evaluate the quality of the forecasted prediction interval

Long-term time series forecasting (LTSF) enhancements

  • New neural network model types (NLinear, DLinear, XLinear) for improved speed and forecast accuracy
  • Rigorous decomposition of neural network time series models into TREND, SEASONAL, and EXOGENOUS_CONTRIBUTIONS parts


 

New data-parallel, massive-execution time series function support

  • Online BCPD, Forecast Accuracy Measure, time series outlier detection, Single Exponential Smoothing, Double Exponential Smoothing, Triple Exponential Smoothing, Brown Exponential Smoothing, Croston's Method, Linear Regression with Damped Trend and Seasonal Adjust, White Noise Test, Seasonality Test, Trend Test, Change-Point Detection, and Intermittent Time Series Forecast (ITSF)
  • This allows segmented time series analysis and forecasting, delivering fast results when modeling thousands of time series in parallel, identified by a segmentation (grouping) column; a minimal usage sketch is shown below
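
A minimal sketch of segmented (massive) forecasting through hana-ml, assuming the massive=True / group_key pattern used by other massive-enabled functions; availability and exact parameter names for the smoothing classes in this release should be verified, and the table is a placeholder.

```python
# Minimal sketch; massive and group_key are assumed to follow the common
# massive-execution pattern in hana-ml.
from hana_ml.dataframe import ConnectionContext
from hana_ml.algorithms.pal.tsa.exponential_smoothing import SingleExponentialSmoothing

conn = ConnectionContext(address='<hana-host>', port=443, user='<user>', password='<password>')
df_segmented_ts = conn.table('<DEMAND_BY_STORE>')   # many series stacked in one table

ses = SingleExponentialSmoothing(massive=True)      # assumed massive switch
res = ses.fit_predict(data=df_segmented_ts,
                      key='TIMESTAMP',
                      endog='DEMAND',
                      group_key='STORE_ID')          # segmentation / grouping column
```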


 

 

AutoML and ML pipelines enhancements

General AutoML configuration and optimization improvements

  • Optimized pipeline operator selection, considering operator sequence and importance using operator connection constraints
  • A fine-tuning phase for the best pipeline after the initial AutoML optimization, which further evaluates the remaining parameter combinations without changing the selected pipeline operators themselves
  • Support of random-search optimization for small AutoML configurations and faster results, for example with simple time series scenarios
  • Faster interpretability of AutoML pipeline predictions based on a more lightweight Shapley explainer model (SHAP global surrogate). More details can be found in the blog post Demystifying Pipeline Explanation for Time Series Data.
  • AutoML scenario configuration verification option
  • Pipeline predictions supporting more algorithm-specific parameters
  • Score function for hold-out-sample-based classification / regression model evaluation
  • Improved AutoML progress logging in the Python ML client, with full log information

AutoML Time Series enhancements

  • Support for auto exponential smoothing, polynomial feature generation, and HGBT / MLinR regression operators for time series
  • SPEC is supported as a new evaluation metric

New data-parallel AutoML and ML pipeline modeling functions allow developing and executing multiple models in parallel by

  • building AutoML classification, regression, or time series models in parallel on data subsets identified by a grouping column,
  • using the new massive AutoML and pipeline SQL procedures.

This enables very fast parallel AutoML modeling, such as segmented AutoML time series forecasting (see the sketch below). At the same time, SAP HANA Cloud’s database workload management can be applied to control and limit the compute resources consumed.
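
The sketch below outlines segmented AutoML time series modeling via hana-ml's AutomaticTimeSeries; the massive and group_key options are assumptions based on the description above, so check the hana-ml changelog for the exact API, and the table is a placeholder.

```python
# Minimal sketch; massive and group_key on AutomaticTimeSeries are assumed options.
from hana_ml.dataframe import ConnectionContext
from hana_ml.algorithms.pal.auto_ml import AutomaticTimeSeries

conn = ConnectionContext(address='<hana-host>', port=443, user='<user>', password='<password>')
df_segmented_ts = conn.table('<DEMAND_BY_STORE>')   # placeholder segmented time series table

auto_ts = AutomaticTimeSeries(generations=5, population_size=20,
                              massive=True)                    # assumed massive switch
auto_ts.fit(data=df_segmented_ts, key='TIMESTAMP', endog='DEMAND',
            group_key='STORE_ID')                              # one model per data segment
```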


 

 

Further AI function enhancements

New Quantile Transform data preprocessing

  • The quantile transform automatically applies a non-linear transformation that maps numeric input columns of varying data distributions into a uniform or normal target distribution, making them more suitable as input for many ML models.
  • It preserves the rank order of the data and reduces the impact of outliers, making the data less sensitive to extreme values; a minimal usage sketch is shown below.
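
A minimal sketch of how the quantile transform might be invoked from hana-ml; the QuantileTransform class name, its import path, and the parameters shown are assumptions inferred from the feature description, and the table is a placeholder.

```python
# Minimal sketch; class name, module path and parameters are assumptions.
from hana_ml.dataframe import ConnectionContext
from hana_ml.algorithms.pal.preprocessing import QuantileTransform   # assumed class

conn = ConnectionContext(address='<hana-host>', port=443, user='<user>', password='<password>')
df_train = conn.table('<TRAIN_TABLE>')   # placeholder table

qt = QuantileTransform(output_distribution='uniform')   # assumed: 'uniform' or 'normal'
df_transformed = qt.fit_transform(data=df_train, key='ID')
```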

New financial data analysis functions

  • Benford analysis, a trending algorithm used to detect anomalies in financial data
    • One of the (not so) well-known statistical observations is the fact that in many datasets the leading significant digits are not equally distributed. If all digits were represented equally, each would appear 11.1 percent (1/9th) of the time. However, analyzing real-world datasets reveals that the distribution of the leading digits follows Benford’s law, also known as the first-digit law.
    • It is now very easy to validate whether a dataset obeys Benford’s law or not. Such a first check, very commonly applied in financial applications, helps detect unexpected value distributions and, for example, potentially fraudulent transaction data; the expected digit frequencies are illustrated in the sketch after this list.
  • Hull-White model, a trending method used to model the time evolution of interest rates, which are required for price estimation of financial instruments based on interest rate derivatives.
    • To apply the Hull-White model, it first needs to be adapted to match existing market conditions (interest rates). This is achieved by providing the values of the drift term of the Hull-White model as a time series in an input table. The simulation then provides the mean value for a given number of simulation paths (also specified as an input parameter), their variance, as well as the upper and lower bounds.
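
As a quick, self-contained illustration of the statistics behind the Benford analysis (independent of PAL), the expected leading-digit frequency under Benford's law, P(d) = log10(1 + 1/d), can be compared with the uniform 1/9 baseline:

```python
# Expected leading-digit frequencies: Benford's law vs. a uniform distribution.
import math

for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"digit {d}: Benford {benford:6.1%}   uniform {1/9:6.1%}")
# Digit 1 occurs ~30.1% of the time under Benford's law, digit 9 only ~4.6%,
# while a uniform distribution would give ~11.1% for every digit.
```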

New MLP Recommender function

  • A powerful neural network function to predict binary targets such as click-through rates, a fundamental part of recommender systems.
  • The MLP recommender function does not require specific user and item details and can include extra data features, including unrelated features without any user connection.
  • The algorithm employs a dual-stream MLP framework, capable of handling larger-scale and “complicated” input data and designed to boost recommendation performance.

Text processing enhancements

Improved Clustering

Data-parallel / massive execution support

  • Isolation Forest, Variance Test, Inter-Quartile Range, Fast Fourier Transform, K-Optimal Rule Discovery (KORD), K-Means, FP-Growth

 

Automated Predictive Library

Automated Predictive Library (APL) 2425 release enhancements

  • Execution time reduced by around 15% for binary classification and regression models (gradient boosting), and by around 35% for multi-class classification.
  • Better estimates and faster calculation of variable interactions (see What's New 2311)
  • Local explanations for time series (see What's New 2403)

 

Python machine learning client enhancements

The full list of new methods and enhancements within the hana_ml releases 2.17-2.22 is summarized in the changelog for hana-ml as part of the documentation.

Selected key enhancements include

AutoML and pipeline modeling improvements

  • Visual editor support for the AutoML scenario configuration


 

  • Massive, data parallel AutoML support 


  • PipelineProgressMonitor support for a Python runtime_platform specification (jupyter, vscode, bas, console); a minimal usage sketch is shown below
  • AutoML optimization enhancement with connection constraints and connection visualizations
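
A minimal sketch of AutoML progress monitoring in the Python client; hana-ml ships a PipelineProgressStatusMonitor in hana_ml.visualizers.automl_progress (referred to above as PipelineProgressMonitor), and the placement of the runtime_platform argument is an assumption based on the description, so verify the exact signature in the hana-ml documentation.

```python
# Minimal sketch; the runtime_platform argument placement is an assumption.
from hana_ml.dataframe import ConnectionContext
from hana_ml.algorithms.pal.auto_ml import AutomaticClassification
from hana_ml.visualizers.automl_progress import PipelineProgressStatusMonitor

conn = ConnectionContext(address='<hana-host>', port=443, user='<user>', password='<password>')
df_train = conn.table('<TRAIN_TABLE>')   # placeholder table

auto_c = AutomaticClassification(generations=5, population_size=20)
monitor = PipelineProgressStatusMonitor(connection_context=conn,
                                        automatic_obj=auto_c,
                                        runtime_platform='vscode')  # jupyter, vscode, bas or console
monitor.start()
auto_c.fit(data=df_train, key='ID', label='CLASS')
```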

     

More detailed references to the described enhancements can also be found in the New Machine Learning features in SAP HANA Cloud - SAP Community blog posts.

Furthermore, you can find example notebooks illustrating the highlighted feature enhancements under hana-ml-samples/Python-API/pal/notebooks at main · SAP-samples/hana-ml-samples · GitHub, for example H2SPS08_AutoML.ipynb, H2SPS08_RegressionClassification.ipynb, H2SPS08_TimeSeries.ipynb, H2SPS08_MiscEnhancements.ipynb, and H2SPS08_Design-time code generation.ipynb.

Explore and enjoy all the new capabilities!