Key Takeaways
Integrating machine learning capabilities within a database reduces data movement, enhances security, simplifies governance, and broadens accessibility to machine learning for users of all levels, addressing the shortage of skilled data scientists.
Machine learning in SAP HANA Cloud covers key scenarios like classification, regression, and time series forecasting. With libraries like Predictive Analysis Library (PAL) and Automated Predictive Library (APL), it caters to both experts and non-experts, allowing data scientists and developers to collaborate effectively to build intelligent data applications.
Introduction
Artificial Intelligence and Machine Learning rank among the top technology trends of 2023. [1] This blog post takes a closer look at how SAP HANA Cloud uses these technologies to enable advanced analysis within Intelligent Data Applications.
Please note that this blog post includes a 7-minute interview with SAP HANA Cloud Machine Learning expert Christoph Morgen, who shares valuable insights into the topics of Artificial Intelligence, Machine Learning, and SAP HANA Cloud.
Interview with Christoph Morgen on Machine Learning powering Intelligent Data Applications.
Artificial Intelligence & Machine Learning
What is the difference between Artificial Intelligence (AI) and Machine Learning (ML)? AI refers to a machine’s capacity to execute tasks that have traditionally relied on human intelligence. Machine Learning is a subfield of AI. ML models acquire insight through training on large numbers of example data points, and ML algorithms identify patterns in that data in order to make predictions and recommendations. These algorithms are also adaptable: their effectiveness can improve as they are exposed to new data and experiences. [2]
Empowering Insights: The Significance of Built-In Machine Learning in Databases
Data is valuable, but only if the right insights can be gained from analyzing it. Machine learning capabilities that are built into the database therefore offer important advantages in several respects.
With embedded ML, as in SAP HANA Cloud, data movement is significantly reduced because the data already resides inside the database. Working on local data allows models to be trained on massive data sets, with the assurance that the system will scale. [3] It also reduces complexity, since external ML engines normally need to be installed, configured, and managed separately. [4]
Security concerns often arise when talking about Machine Learning. With a built-in engine, access is controlled directly by the database’s security practices. [3] Security and governance are therefore tightly connected with the built-in Machine Learning capabilities. User management can be done with simple SQL statements, in the same way one would grant table access. In addition, model governance becomes simpler than with external Machine Learning: SQL statements or multi-model engines, such as a graph engine, can be used to check model performance.
Moreover, the use of a common interface drives adoption among users of all levels. While specific problems continue to require data scientists, basic tasks can be solved using SQL, which opens up the world of ML without the need to become an R or Python expert. This is an important benefit, keeping in mind that skilled data scientists are rare and expensive. [3]
Common Machine Learning Use Cases
The three most frequently used scenarios within ML are classification, regression, and time series forecasting.
Classification describes the ability of an algorithm to predict a class for a given data point (example). Examples include classifying whether an email is spam or not, or whether recent customer behavior indicates churn or not. Both examples are binary classifications, but classification problems are not always binary. They can also be multi-class, e.g., recognizing which character a handwritten symbol represents. Multi-label classification comes into play when more than one label can be predicted for each example, e.g., in photo classification, where the ML algorithm predicts the presence of specific objects like a tree, a dog, and a house. Classification problems can also be imbalanced, meaning that the examples are distributed unequally across the classes. Classic examples are fraud detection and medical diagnostic tests. [5]
[Figures: Binary Classification, Multi-Class Classification, Photo Classification [6]]
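To make the point about imbalanced classes more concrete, here is a minimal Python sketch. The fraud labels are made-up illustration data, and the helper functions `accuracy` and `recall` are hypothetical names written for this example, not part of any SAP library. The sketch shows why plain accuracy can be misleading when one class is rare: a model that never predicts fraud still looks accurate.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Share of truly positive examples that were predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    hits = sum(p == positive for p in preds_for_positives)
    return hits / len(preds_for_positives)


# Hypothetical imbalanced data set: 2 fraudulent out of 100 transactions.
y_true = ["legit"] * 98 + ["fraud"] * 2

# A trivial "model" that always predicts the majority class.
y_pred = ["legit"] * 100

acc = accuracy(y_true, y_pred)
fraud_recall = recall(y_true, y_pred, "fraud")
print(f"accuracy={acc:.2f}, fraud recall={fraud_recall:.2f}")
```

The always-"legit" model reaches high accuracy but finds none of the fraud cases, which is why imbalanced problems are usually judged on per-class measures as well.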
Regression describes the ability of an algorithm to predict a continuous target variable rather than a discrete one. An example is predicting a person’s salary based on factors like education, work experience, seniority, and geographical location. [6]
Regression thus establishes a relationship between variables by estimating how one affects the other. Linear regression, which means finding the best-fit line through the given data, is by far the most popular form, as it is easy to use for predictions and forecasting. [6]
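As a small illustration of what "finding the best-fit line" means, the following Python sketch fits a line by ordinary least squares using only the standard library. The experience and salary figures are invented for the example, and `fit_line` is a hypothetical helper, not an SAP HANA Cloud function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable.

    Returns (slope, intercept) of the line that minimizes the squared error.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: years of work experience vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [40.0, 45.0, 50.0, 55.0, 60.0]

slope, intercept = fit_line(years, salary)

# Use the fitted line to predict the salary for 6 years of experience.
predicted = slope * 6 + intercept
print(f"slope={slope}, intercept={intercept}, prediction={predicted}")
```

With several input factors (education, seniority, location) the same idea extends to multiple linear regression; the single-variable case is shown here only because it is the easiest to follow.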
The last scenario is time series forecasting. These algorithms predict future data points using data from a historical time series. This scenario comes with its very own challenges that are tightly connected to the time aspect: seasonality, changing trends, holiday effects, and data sparsity. A good example is demand forecasting and planning in retail. Here, time series forecasting can be used to plan inventory for periods of high demand while avoiding overstocking in phases of lower demand. [7]
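To show how seasonality can enter a forecast, here is a minimal Python sketch of a seasonal naive baseline: each future point simply repeats the value observed one season earlier. This is a deliberately simple stand-in chosen for illustration, not the method used by any SAP HANA Cloud library, and the daily demand numbers and the function name `seasonal_naive_forecast` are assumptions.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast by repeating the most recent full season of observations."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


# Hypothetical daily demand for two weeks, with a weekend peak.
daily_demand = [10, 12, 11, 13, 20, 30, 25,
                11, 13, 12, 14, 22, 33, 27]

# Forecast the next three days, assuming a weekly (7-day) season.
forecast = seasonal_naive_forecast(daily_demand, season_length=7, horizon=3)
print(forecast)
```

Baselines like this are mainly useful as a yardstick: a more sophisticated model that also handles trends and holiday effects should at least beat them.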
Machine Learning in SAP HANA Cloud
In the area of Machine Learning, SAP HANA Cloud offers two libraries with numerous algorithms. The Predictive Analysis Library (PAL) is aimed at experts and covers all classical ML scenarios: classification, regression, and time series forecasting. The Automated Predictive Library (APL) offers a simpler syntax suitable for non-experts. Both libraries are accessible via SQL or SQLScript. [8]
To leverage synergies, the SAP HANA Cloud Machine Learning client enables the data scientist to still work in the preferred environment (Python, R…) while allowing code sharing in a convenient way. The data scientist can build the ML scenario using their expertise and deliver the objects to the application developer. The developer can then access the ML object within the actual application code using SQL, and thus, infuse intelligence into the data applications.
Machine learning in SAP HANA Cloud can, for example, be used to predict employee churn. ML expert Christoph Morgen demonstrates how to build an ML model that predicts which employees are most likely to leave the company. In the demo, our SAP HANA expert uses a classification algorithm, trains the model, and evaluates the result using a test data set. In the last step, the model predicts employee churn. SAP HANA Cloud’s Machine Learning capabilities thus enable users to form a prediction without the need for expert data science knowledge.
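The train, evaluate, and predict steps described above can be sketched in plain Python. This is only an illustration of the workflow: the classifier is a one-feature threshold rule standing in for a real classification algorithm, it is not the algorithm used in the demo and does not call PAL or APL, and the overtime data and all function names are hypothetical.

```python
def train_threshold(rows):
    """Pick the threshold on one numeric feature that best separates the labels.

    rows: list of (feature_value, left_company) pairs.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({x for x, _ in rows}):
        correct = sum((x >= threshold) == label for x, label in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict(threshold, x):
    """Predict churn (True) when the feature reaches the learned threshold."""
    return x >= threshold


def evaluate(threshold, rows):
    """Accuracy of the learned rule on a labeled data set."""
    return sum(predict(threshold, x) == label for x, label in rows) / len(rows)


# Hypothetical data: (monthly overtime hours, employee left the company).
train_rows = [(2, False), (5, False), (8, False), (12, False),
              (20, True), (25, True), (30, True), (35, True)]
test_rows = [(4, False), (10, False), (22, True), (28, True)]

threshold = train_threshold(train_rows)        # step 1: train
test_accuracy = evaluate(threshold, test_rows)  # step 2: evaluate on held-out data
will_leave = predict(threshold, 27)             # step 3: predict for a new employee
print(threshold, test_accuracy, will_leave)
```

Keeping the test rows separate from the training rows is the important design choice here: the evaluation only says something about future predictions if the model has not seen that data during training.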
Demo Machine Learning in SAP HANA Cloud.
Example of an Intelligent Data Application using Machine Learning
A company runs a large production site that uses sensors to stream equipment health data into an SAP HANA Cloud database. The company recognizes that unforeseen downtime of its production equipment costs money and hurts customer satisfaction. An Intelligent Data Application is needed. Sensor data is fed into SAP HANA Cloud, where the built-in ML algorithms predict potential downtime using real-time sensor readings along with historical data. Now the company can predict malfunctions and act proactively instead of experiencing the unfortunate “fire drill.” This insight into potential downtime not only increases production output but also keeps “everyone” happy.
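One simple way to picture how a new sensor reading can be compared with historical data is a z-score check, sketched below in Python. This is a basic statistical stand-in for illustration, not the predictive model an actual application would use inside SAP HANA Cloud; the temperature values, the threshold of 3 standard deviations, and the function name `is_anomalous` are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, reading, z_threshold=3.0):
    """Flag a reading that lies far outside the historical distribution."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No historical variation: any deviation from the mean is unusual.
        return reading != mu
    return abs(reading - mu) / sigma > z_threshold


# Hypothetical historical temperature readings of a healthy machine.
historical_temps = [70.1, 69.8, 70.4, 70.0, 69.9, 70.2, 70.3, 69.7]

# Compare incoming "real-time" readings against the history.
normal_alert = is_anomalous(historical_temps, 70.2)
overheat_alert = is_anomalous(historical_temps, 75.0)
print(normal_alert, overheat_alert)
```

A flagged reading could then trigger a maintenance check before the equipment actually fails.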
Summary
The use of SAP HANA Cloud to provide Machine Learning for Intelligent Data Applications remains a prominent technology strategy in 2023. Data scientists and application builders benefit from reduced data movement, improved governance, and a common programming interface. SAP HANA Cloud's ML capabilities can play a vital role in use cases like predicting future demand and proactively maintaining production equipment. Finally, every business process benefits when applications are seamlessly infused with the right actionable intelligence.
Useful resources to start your journey today
[1] https://www.simplilearn.com/top-technology-trends-and-jobs-article#4_artificial_intelligence_ai_and_...
[2] The economic potential of generative AI: The next productivity frontier, pp. 6-7
[3] https://www.architectureandgovernance.com/artificial-intelligence/five-reasons-why-in-database-machi...
[4] https://blogs.oracle.com/machinelearning/post/top-10-reasons-to-use-machine-learning-in-oracle-datab...
[5] https://machinelearningmastery.com/types-of-classification-in-machine-learning/
[6] https://www.datacamp.com/blog/classification-machine-learning
[7] https://cloud.google.com/learn/what-is-time-series#:~:text=Time%2Dseries%20forecasting%20is%20a,pred...
[8] https://help.sap.com/doc/eef71122810d4aa18d8eb6c37031f98a/hanacloud/en-US/Feature_Scope_Description_...