
With the 2024 Q1 database release, several new features have been released in the SAP HANA Cloud Predictive Analysis Library (PAL). An enhancement summary is available in the What’s new document for SAP HANA Cloud database 2024.02 (QRC 1/2024).
The feature highlights for the current release are described in more detail below.
Unified Regression, along with Unified Classification and Time Series, now supports permutation feature importance, a new and trending method in global explainability that evaluates the contribution of individual features to the overall predictive power of a model. This is achieved by measuring the decrease in a model’s performance when a feature’s values are randomly shuffled. A detailed explanation and examples are also given in the blog Global Explanation Capabilities in SAP HANA Machine Learning.
Classic feature importance vs permutation feature importance reports (see blog for details)
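To make the shuffling idea concrete, here is a minimal, library-independent sketch of permutation feature importance in plain Python. It is not the PAL implementation: the function names, the toy model, and the choice of mean squared error as the performance measure are all assumptions made for illustration.

```python
import random

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, y_true, n_repeats=10, seed=0):
    """Average increase in error when one feature column is shuffled.

    predict: callable taking one row (list of floats), returning a prediction.
    Hypothetical helper for illustration, not a PAL or hana-ml API.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y_true, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j; all other columns stay untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            score = mean_squared_error(y_true, [predict(r) for r in shuffled])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy model (assumption): strong dependence on feature 0,
# weak dependence on feature 1, none on feature 2.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]

data_rng = random.Random(42)
rows = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, y)
for j, imp in enumerate(importances):
    print(f"feature {j}: importance {imp:.4f}")
```

In this toy setup the unused feature gets an importance of zero, because shuffling it cannot change any prediction, while the feature with the largest coefficient shows the largest increase in error.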
The Hybrid Gradient Boosting Tree (HGBT) now supports F1-score, recall and precision as cross-validation metrics for improved, more targeted classification models. Furthermore, weight scaling of target values in classification is now supported, either to address imbalanced classes or to weight target values in relation to, for example, the different costs associated with the different class values.
A new and trending regression objective function, “reweighted square”, has been introduced, which helps to achieve more robust and regularized regression models.
For improved early stopping during model optimization, the validation metric for early stopping can now be explicitly set.
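The following sketch illustrates why precision, recall and F1-score can be more informative cross-validation metrics than plain accuracy when classes are imbalanced. It is a generic, hand-written computation in plain Python; the function name and the example labels are assumptions and do not reflect how HGBT computes these metrics internally.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1-score for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Imbalanced toy labels (assumption): 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95

# Model A always predicts the majority class.
pred_majority = [0] * 100
# Model B finds 4 of 5 positives at the cost of 4 false alarms.
pred_targeted = [1, 1, 1, 1, 0] + [1] * 4 + [0] * 91

for name, pred in [("majority", pred_majority), ("targeted", pred_targeted)]:
    p, r, f1 = precision_recall_f1(y_true, pred)
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f} "
          f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Both toy models reach the same accuracy, yet only the second one detects any positive cases, which is what the F1-score reflects.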
The recently introduced multi-layer perceptron (MLP) recommender function now supports multiclass classification and regression recommender scenarios. This allows the recommendation task to be reformulated as a classification or regression problem. The implementation employs a dual-stream framework in which two sets of features, representing for example user and item features respectively, are fed into a feature selection module. The outputs are streamed into MLP neural networks and combined in a bilinear aggregation layer. This new and trending neural network framework can handle large-scale data volumes in recommendation scenarios very effectively.
The K-Nearest Neighbor (KNN) classification and regression functions have been enhanced with a new similarity search method. In addition to brute-force and KD-tree searching, a matrix-enabled search method has been introduced, allowing for much faster similarity search, especially with high-dimensional numeric feature data.
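The release notes do not describe how the matrix-enabled search works internally. One common way to express nearest-neighbor search in matrix form, shown here purely as an assumption, uses the identity ‖q − x‖² = ‖q‖² + ‖x‖² − 2 q·x, so that all query-to-point distances follow from a single matrix product. The sketch below uses plain Python lists for self-containment; in practice the product would be delegated to an optimized linear algebra routine, which is where the speed-up comes from. All function names are hypothetical.

```python
def matmul_transposed(a, b):
    """Return the matrix product a @ b.T for lists of row vectors."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]

def knn_matrix_search(queries, points, k):
    """Indices of the k nearest points per query via one matrix product."""
    gram = matmul_transposed(queries, points)      # all dot products q.x
    q_norms = [sum(x * x for x in q) for q in queries]
    p_norms = [sum(x * x for x in p) for p in points]
    neighbors = []
    for i, row in enumerate(gram):
        # Squared Euclidean distance from query i to every point.
        sq_dist = [q_norms[i] + p_norms[j] - 2.0 * g for j, g in enumerate(row)]
        order = sorted(range(len(points)), key=sq_dist.__getitem__)
        neighbors.append(order[:k])
    return neighbors

def knn_brute_force(queries, points, k):
    """Reference implementation: explicit distance per query/point pair."""
    neighbors = []
    for q in queries:
        sq_dist = [sum((a - b) ** 2 for a, b in zip(q, p)) for p in points]
        order = sorted(range(len(points)), key=sq_dist.__getitem__)
        neighbors.append(order[:k])
    return neighbors

points = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
queries = [[0.9, 0.1], [4.0, 4.0]]

print(knn_matrix_search(queries, points, k=2))
print(knn_brute_force(queries, points, k=2))
```

Both functions return the same neighbors; the matrix form simply reorganizes the arithmetic so that the bulk of the work is one dense product.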
The Auto-ML functions for the Predictive Analysis Library (PAL) have been enhanced with
Multiple documents (here IDs 0 and 5) are searched in parallel for related documents
The newly implemented single-factor Hull-White procedure can be used to model the time evolution of interest rates, which are required for price estimation of financial instruments based on interest rate derivatives.
To apply the Hull-White model, it first needs to be adapted to match existing market conditions (interest rates). This is achieved by providing the values of the drift term of the Hull-White model as a time series in an input table. The simulation then provides the mean value for a given number of simulation paths (also specified as an input parameter), their variance, as well as the upper and lower bounds.
The chart above depicts the initial dataset used to calibrate the model, together with the mean and confidence interval of the Hull-White simulation.
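As a rough illustration of what such a simulation computes, the sketch below simulates the standard one-factor Hull-White dynamics dr = (θ(t) − a·r) dt + σ dW with a simple Euler discretization in plain Python. It is not the PAL procedure: the function name, parameter names, the example drift values, and the use of mean ± 1.96 standard deviations as upper and lower bounds are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def simulate_hull_white(drift, r0, a, sigma, dt, n_paths, seed=0):
    """Euler simulation of dr = (theta(t) - a*r) dt + sigma dW.

    drift: list of theta(t) values, one per time step (the drift time series).
    Returns one (mean, variance, lower, upper) tuple per time step.
    """
    rng = random.Random(seed)
    rates = [r0] * n_paths
    summary = []
    for theta in drift:
        rates = [
            r + (theta - a * r) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for r in rates
        ]
        mean = statistics.fmean(rates)
        var = statistics.variance(rates)
        half_width = 1.96 * math.sqrt(var)   # assumed 95% band
        summary.append((mean, var, mean - half_width, mean + half_width))
    return summary

# Hypothetical drift time series: slowly increasing theta(t).
drift = [0.002 + 0.0001 * i for i in range(20)]
summary = simulate_hull_white(drift, r0=0.03, a=0.1, sigma=0.01,
                              dt=0.25, n_paths=500, seed=1)

for step, (mean, var, lower, upper) in enumerate(summary[:5], start=1):
    print(f"step {step}: mean={mean:.4f} var={var:.6f} "
          f"bounds=[{lower:.4f}, {upper:.4f}]")
```

The spread between the bounds widens over the first steps as the random shocks accumulate, while the mean is pulled by the drift term and the mean-reversion speed a.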
PAL now includes a new Benford’s Law function, a trending algorithm used to detect anomalies in numerical datasets such as financial transactions.
One of the (not so) well-known statistical observations is the fact that in many datasets the leading significant digits are not equally distributed. If all digits were represented equally, each would appear about 11.1 percent (1/9) of the time. However, when analyzing real-world datasets, e.g. the population totals of the US census data, it is revealed that the distribution of the leading digits follows Benford’s law, also known as the first-digit law, under which the digit 1 leads about 30.1 percent of the time.
With the help of PAL’s new BENFORD analysis function, it is now very easy to validate whether or not a dataset obeys Benford’s law. This check is very commonly used as a first step in financial applications to detect unexpected value distributions and, for example, potentially fraudulent transaction data.
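The idea behind such a check can be sketched in a few lines of plain Python. Benford’s law gives the expected share of leading digit d as log10(1 + 1/d); the sketch compares this with the observed shares and summarizes the deviation with a chi-square statistic. This is an illustration only, not the PAL BENFORD function: the function names, the choice of chi-square as the deviation measure, and the two toy datasets are assumptions.

```python
import math
from collections import Counter

def leading_digit(value):
    """First significant digit of a nonzero int or float."""
    digits = str(abs(value)).lstrip("0.")
    return int(digits[0])

def benford_report(values):
    """Observed vs. expected leading-digit shares and a chi-square statistic."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    report = {}
    chi_square = 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)    # Benford's expected share
        observed = counts[d] / n
        chi_square += n * (observed - expected) ** 2 / expected
        report[d] = (observed, expected)
    return report, chi_square

# Powers of two are a classic sequence that follows Benford's law closely.
powers_of_two = [2 ** k for k in range(1, 1001)]
# The integers 1..999 have perfectly uniform leading digits.
uniform_like = list(range(1, 1000))

report_pow, chi_pow = benford_report(powers_of_two)
report_uni, chi_uni = benford_report(uniform_like)

for d in range(1, 10):
    obs_pow, expected = report_pow[d]
    obs_uni, _ = report_uni[d]
    print(f"digit {d}: expected={expected:.3f} "
          f"powers_of_two={obs_pow:.3f} uniform={obs_uni:.3f}")
print(f"chi-square: powers_of_two={chi_pow:.2f} uniform={chi_uni:.2f}")
```

A dataset that conforms to Benford’s law yields a small statistic, while the uniform dataset deviates strongly; in a fraud-screening context, a large deviation would be a reason to look at the data more closely rather than proof of manipulation.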
The full list of new methods and enhancements in hana-ml 2.20 is summarized in the changelog for hana-ml 2.20.240319 as part of the documentation. The key enhancements in this release include:
Time series analysis and forecasting methods
Auto-ML configuration and methods enhancements
Exploratory data analysis and visualization enhancements
You can find an example notebook illustrating the highlighted feature enhancements here: 24QRC01_2.20.ipynb.