More and more organizations recognize the value of advanced analytics. Advanced analytics is all about delivering live intelligence so that key decision makers can act with confidence and achieve better business outcomes. So, what is advanced analytics? According to Gartner:
“Advanced Analytics is the autonomous or semi-autonomous examination of data or content using sophisticated techniques and tools, typically beyond those of traditional business intelligence (BI), to discover deeper insights, make predictions, or generate recommendations. Advanced analytic techniques include those such as data/text mining, machine learning, pattern matching, forecasting, visualization, semantic analysis, sentiment analysis, network, and cluster analysis, multivariate statistics, graph analysis, simulation, complex event processing, neural networks.”
So, why do you need advanced analytics?
Let’s take commercial aircraft, which we are all familiar with; most of us have an arsenal of horror stories about being trapped on the tarmac due to a maintenance or operational issue. Air carriers lose huge amounts of money when aircraft are grounded for maintenance or operational issues. “Predict before they occur” is a common theme, and every airline is striving to achieve it. This use case requires advanced analytics, and most advanced analytics use cases require multiple technologies such as predictive and machine learning, spatial, text and search, graph, streaming, time series, and document store. SAP HANA Advanced Analytics provides all of those capabilities.
So, what is new in Predictive & Machine Learning?
Python API for Machine Learning for data scientists
- Python users can use a Python ML library to create a dataframe and apply machine learning algorithms. Customers work in a Python environment while the processing happens in HANA. This approach provides a Python interface for selecting algorithms in PAL (Predictive Analysis Library) and APL (Automated Predictive Library).
- Data scientists increasingly prefer to code in Python for machine learning. Providing a Python API enables HANA to appeal to these data scientists.
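To make the "work in Python, process in the database" idea concrete, here is a minimal sketch. It does not use the HANA Python API: the `LazyFrame` class, the `flights` table, and its columns are hypothetical names invented for this illustration, and SQLite stands in for the database. The point is only that each dataframe operation builds SQL, and nothing runs until the result is requested.

```python
import sqlite3


class LazyFrame:
    """Hypothetical stand-in for a server-side dataframe: it holds SQL, not rows."""

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql

    def filter(self, condition):
        # Wrap the current query in a new one; nothing is executed yet.
        return LazyFrame(self.conn, f"SELECT * FROM ({self.sql}) WHERE {condition}")

    def agg_mean(self, column, by):
        # The aggregation is also expressed as SQL so the database does the work.
        return LazyFrame(
            self.conn,
            f"SELECT {by}, AVG({column}) AS mean_{column} "
            f"FROM ({self.sql}) GROUP BY {by} ORDER BY {by}",
        )

    def collect(self):
        # Only here does the database run the query and return the (small) result.
        return self.conn.execute(self.sql).fetchall()


# SQLite in memory plays the role of the database server (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (tail TEXT, delay_min REAL)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("A1", 10.0), ("A1", 30.0), ("B2", 5.0), ("B2", 0.0), ("B2", 100.0)],
)

flights = LazyFrame(conn, "SELECT * FROM flights")
delayed = flights.filter("delay_min > 0").agg_mean("delay_min", by="tail")
print(delayed.sql)        # the single SQL statement that will be sent
result = delayed.collect()
print(result)
```

Only the aggregated rows travel back to the Python process; the raw table never leaves the database, which is the benefit the bullets above describe.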
R API for Machine Learning
- For data scientists working in R, this package lets them work in RStudio, use the provided machine learning functions to create their models, and generate R dataframes.
- Data scientists who prefer R want to do machine learning without pulling the data back to the client and processing it there. This approach lets them work with large data sets while execution happens on the server, which enables excellent performance.
New “Hybrid Gradient Boosting Tree” regression and classification
- Gradient boosting tree (GBT) is a popular ensemble machine learning technique for regression and classification problems, known for its performance and model quality. PAL now provides an enhanced implementation.
- Gradient boosting tree models combine multiple trees into a robust composite model focused on minimizing prediction errors.
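For readers new to the technique, the following is a minimal sketch of plain gradient boosting for regression, written in pure Python. It is not the PAL implementation and says nothing about what makes the PAL version "hybrid"; it assumes a single numeric feature, squared-error loss, and one-split trees (stumps), and all function names and data values are made up for illustration. Each new tree is fitted to the errors left by the trees before it.

```python
def fit_stump(xs, residuals):
    """Find the one-threshold split that minimizes squared error on the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbt(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_gbt(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict_gbt(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (invented): a jump between x=4 and x=5 that a single stump cannot fully capture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.1, 3.9, 8.1, 9.0, 9.8, 11.2]

one_tree = fit_gbt(xs, ys, n_trees=1)
many_trees = fit_gbt(xs, ys, n_trees=100)
print("training MSE, 1 tree:   ", mse(one_tree, xs, ys))
print("training MSE, 100 trees:", mse(many_trees, xs, ys))
```

The training error shrinks as trees are added, which is the "composite model focused on minimizing prediction errors" in miniature. A production implementation adds deeper trees, regularization, and validation to avoid overfitting.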
TensorFlow Cloud environment and model training integration enhancements
- Support for the new CLOUDHOOK protocol allows HANA to use a TensorFlow environment for model training and scoring, with the trained models kept in HANA.
- Supporting both training and scoring with TensorFlow completes this feature, as only scoring was supported previously. It also provides flexibility in where the model runs, as the trained model can now be kept in HANA and used for scoring.
Automated Predictive Library (APL) is now part of SPS 04 and allows data scientists, developers, and business analysts to automate machine learning.
What’s new in Streaming Analytics:
- R Data Service for Continuous Computation Language (CCL) allows R machine learning scripts to be called directly from streaming projects.
- Support for the LEARN BY clause in DenStream clustering for larger numbers of objects with unique behavior patterns.
- Graphical CCL Editor in Web IDE: in addition to the existing Text Editor and Streaming Runtime Tooling, HANA streaming analytics now provides a Graphical Editor.
- Integration with SAP Data Hub enables HANA streaming analytics projects to be built, deployed, and run as part of Data Pipelines within SAP Data Hub.
With all the new innovations in HANA 2.0 SPS04, it is time to upgrade to SPS04 and simplify the journey to an intelligent enterprise.