Supply Chain Management Blogs by SAP
Expand your SAP SCM knowledge and stay informed about supply chain management technology and solutions with blog posts by SAP. Follow and stay connected.
cancel
Showing results for 
Search instead for 
Did you mean: 
Ana_dos_Santos
Advisor
Advisor
Hello, I am a dual student at SAP studying International Business Administration and Information Technology.
In this blog, I will happily share with you the data science project I undertook during my practical phase as part of my studies. The practical phase was supervised by the Industry Solution Management for Energy and Utilities Industries - Contact raik.kulinna.

In the world of data analysis, outliers are data points that deviate significantly from the values that are normal in the given data set. Detecting these outliers is crucial for many applications, for example quality control and anomaly identification.

In this blog, we will explore how to leverage SAP's Predictive Analysis Library (PAL) for outlier detection and implement it with SAP’s Core Data Services (CDS) Table functions in an SAP ABAP Managed Database Procedures (AMDP) class. In the end, we will show the outlier results in Jupyter Notebook with Python. The outcome of this blog will demonstrate how data from S/4HANA can be analyzed in real-time using Jupyter Notebook, while executing native algorithms directly in the SAP HANA database. These algorithms are designed to identify implausible values or perform other data science tasks directly on the in-memory database of the S/4HANA system. The following screenshot provides a glimpse of how data scientists can visualize the data.


I have previously described the process of connecting Python/Jupyter with the SAP S/4HANA system in the following blog post: Data Science with SAP S/4HANA – How to connect HANA-ML with Jupyter Notebooks (Python).

Building upon that, I will now delve into the specific aspects of this topic here.

SAP Predictive Analysis Library (PAL):


SAP's Predictive Analysis Library (PAL) is a tool in the SAP HANA ecosystem, offers a large set of algorithms for advanced analytics that is useful especially for data scientists. Within PAL, you'll find a range of algorithms for outlier detection, such as _SYS_AFL.PAL_OUTLIER_DETECTION_FOR_TIME_SERIES and _SYS_AFL.PAL_ANOMALY_DETECTION. These algorithms allow SAP users to identify outliers within their datasets efficiently.
To know more details about SAP´s Predictive Analysis Library (PAL) check out the blog SAP HANA PAL quick start.

ABAP Managed Database Procedures (AMDP) and CDS Table Functions:


In this blog, we will use the PAL algorithms with help of ABAP Managed Database Procedures (AMDP) and expose it to Python/Jupyter Notebook transparently through SAP’s CDS Table functions within SAP. AMDP classes provide an ideal environment for integrating SAP HANA capabilities with custom ABAP code while CDS table functions offer powerful data transformation and processing capabilities. This fusion of technologies enables us to perform advanced analytics and Data science easily in the SAP S/4HANA environment.
For a more detailed exploration of CDS Table functions and its implementation with AMDP feel free to check out the blogs / articles Umsetzung einer CDS-Tabellenfunktion mit AMDP (German) or How to use AMDP Function implementation for a CDS Table Function as a data source in CDS views (English).

In our scenario, we want to detect implausible manually-entered weighing notes from our end-users where for example the weighing unit was mixed up. These anomalies can have significant consequences in the operations of a company such as logistic operations.

Implementing PAL Outlier Detection in AMDP and CDS Table Functions:


Now, I will show and explain the implementation in a AMDP class.
We'll guide you through the process of creating a CDS Table function which includes creating an AMDP class where we can use the PAL algorithms dedicated to outlier detection. Through code snippets and some screenshots, you will see how to apply PAL algorithms within this AMDP class, and you will learn how to leverage the power of CDS Table Functions. You can change the parameters and customize the algorithms to your specific needs.

  • First, create a CDS Table Function using the statement “DEFINE TABLE FUNCTIONS” in the ABAP CDS Data Definition language (DDL):
    	@AccessControl.authorizationCheck: #NOT_REQUIRED
    @EndUserText.label: 'CDS Table function'
    @ClientDependent: false

    define table function zads_tf
    returns{
    // keys
    ordernr : eordernr;
    //atributes
    net_weight : eweight_net;
    weight_unit : weight_unit;
    is_outlier : integer;
    }
    implemented by method zads_cl_data_via_cds=>get_outliers;

    If you require a CDS Table Function with input parameters, you need to include them by using the statement “DEFINE TABLE FUNCTION function_name WITH PARAMETRS() RETURNS()IMPLEMENTED BY METHOD method_name

  • Next, create an AMDP class with the same name and the method you mentioned in the “IMPLEMENTED BY METHOD” part of the CDS Table Function above. The interface “IF_AMDP_MARKER_HDB” marks/flags the class as an AMDP class.
    You will need to declare the method as the implementation of a CDS Table Function with the statement “FOR TABLE FUNCTION” and the CDS Table Function name. This can only be done in the public section of an AMDP class.
    	CLASS zads_cl_data_via_cds DEFINITION
    PUBLIC
    FINAL
    CREATE PUBLIC .

    PUBLIC SECTION.
    interfaces if_amdp_marker_hdb.

    class-methods get_outliers for table function zads_tf.

    PROTECTED SECTION.
    PRIVATE SECTION.
    ENDCLASS.

    CLASS zads_cl_data_via_cds IMPLEMENTATION.

    METHOD get_outliers BY DATABASE FUNCTION FOR HDB LANGUAGE SQLSCRIPT OPTIONS READ-ONLY USING EWA_ORDER_WEIGH.

    *declaring tables required to call the outlier detection method
    declare pal_parameter_table table (param_name varchar(256), int_value integer, double_value double, string_value varchar(100) );
    declare pal_outliers_table table (ordernr integer, net_weight double );
    declare pal_statistic_table table (ordernr integer, cluster_id integer, score double );
    declare pal_centers_table table ( cluster_id integer, net_weight integer);
    declare pal_data_table table ( ordernr integer, net_weight double);

    *getting necessary data to perform outlier detection table EWA_ORDER_WEIGH
    pal_data_table1 = SELECT EWA_ORDER_WEIGH.ORDERNR,
    EWA_ORDER_WEIGH.NET_WEIGHT,
    EWA_ORDER_WEIGH.weight_unit
    FROM EWA_ORDER_WEIGH
    WHERE EWA_ORDER_WEIGH.WEIGHT_UNIT = 'TO' and EWA_ORDER_WEIGH.NET_WEIGHT <> 0;

    pal_data_table = SELECT distinct ordernr, net_weight FROM :pal_data_table1;

    *calling outlier detection method
    call _sys_afl.pal_anomaly_detection(:pal_data_table, :pal_parameter_table,:pal_outliers_table,:pal_statistic_table, :pal_centers_table );

    pal_outliers_result = SELECT distinct p.ordernr, p.net_weight, e.weight_unit, CASE WHEN p.ordernr IS NOT NULL THEN 1 ELSE 0 END AS is_anomaly from :pal_outliers_table as p, ewa_order_weigh as e where e.weight_unit = 'TO';

    *returning outlier result and adding a column named “is_outlier” where the number 1 means the data point is an outlier and the numbe 0 means the data point is normal.
    RETURN SELECT distinct e.ordernr, e.net_weight, e.weight_unit,
    CASE WHEN outliers.is_anomaly = 1 THEN 1 ELSE 0 END AS is_outlier
    from :pal_data_table1 as e left join :pal_outliers_result as outliers on e.ordernr = outliers.ordernr and e.net_weight = outliers.net_weight
    order by is_outlier desc;
    ENDMETHOD.
    ENDCLASS

    I decided to also perform outlier detection in the implementation of the method “get_outliers” by calling the method “_SYS_AFL.PAL_ANOMALY_DETECTION”.
    The method requires input and output tables as parameters which I declared using the statement “DECLARE table_name TABLE(<column_name1> DATATYPE, <column_name2> DATATYPE, …)”. The table “pal_data_table” is the input table with the data selected from a table of an SAP System. For more knowledge about the method “_SYS_AFL.PAL_ANOMALY_DETECTION” please check out the documentation Anomaly Detection with PAL.

    At the end of the method “get_outliers” I use the “RETURN” statement to return exporting parameters. The parameters in this method are determined by the parameters of the CDS Table function.

  • After creating the AMDP class, you need to create a CDS View in the ABAP CDS Data Definition language (DDL) that uses the CDS Table function we created as a data source.
    	@AbapCatalog.viewEnhancementCategory: [#NONE]
    @AccessControl.authorizationCheck: #NOT_REQUIRED
    @EndUserText.label: 'CDS view for table function'
    @Metadata.ignorePropagatedAnnotations: true
    @ObjectModel.usageType:{
    serviceQuality: #X,
    sizeCategory: #S,
    dataClass: #MIXED
    }
    define view entity zads_dataCDSview as select from zads_tf
    {
    ordernr as ordernr,

    @Semantics.quantity.unitOfMeasure: 'weight_unit'
    net_weight as net_weight,
    weight_unit as weight_unit,
    is_outlier as is_outlier
    }



At this point we have created all necessary repository objects to use CDS Table Functions as a data source, and we can display the outliers from the output of the method get_outliers in Python through the CDS View zads_dataCDSview.

As mentioned in the previous Blog about the connection between Python and S/4HANA (Data Science with SAP S/4HANA – How to connect HANA-ML with Jupyter Notebooks (Python)), we need to create an SAP Core Data Service (CDS) View Entity, a service definition, and a service binding. Because we have already created a CDS View Entity, we only need to create a service definition and a service binding, defining that the CD View Entity we created is exposed in a SQL Service.

  1. To create a service binding right click the CDS View we just created and choose New Service Definition:

  2.  Enter the following parameters and click Next:

  3. Choose a transport request, click Finished and activate the Service Definition.

  4. To create a Service Binding for the Service Definition we just created, right click the Service Definition, and choose New Service Binding:

  5. Enter the following parameters:

  6. Choose a transport request, click Finished and activate the Service Binding.


Now that you have created the necessary objects to help you retrieve the data through the connection, we can use the Service Binding and CDS View Entity in the Structured Query Language (SQL) Statement where we will show the output from the method “get_outliers” as a result.

Before we get to the result, I would like to give you a better understanding if the data distribution of my data set with some Python data visualization tools.
I this case, I used boxplot as a way of showing you the data.
#BOXPLOT 
boxplot = dffr.boxplot(column='net_weight',figsize = (7,5))
boxplot.set_ylabel('Weight in TO')
boxplot.set_xlabel('')

We get the following boxplot as a result:

In this boxplot we can see 3 data points that might be considered outliers: the net weight 2000 TO and two other data points near 100 TO. In the table with the outlier results from the method “get_outliers”, we will be able to see if the algorithm/method predictions match the boxplot.


Now, let´s look at the outliers with Python by selecting the data from the CDS Table Functions and CDS View Entity we created.
#Selecting data from CDS-View related to the table function we created in the ABAP environment
dfr = cc.sql('SELECT * FROM ZADS_SB_TF.zads_dataCDSview' )
dfr.collect()

We get the following table as a result:


The Number 1 in the column “is_outlier” means the data point is considered an outlier.

Here is a list of all the outliers detected in the method “get_outliers”:
dfr1 = cc.sql('SELECT * FROM ZADS_SB_TF.zads_dataCDSview where is_outlier = \'1\'' )
dfr1.collect()


We can see that the possible outliers shown in the boxplot are detected by the method / algorithm "_SYS_AFL.PAL_ANOMALY_DETECTION" in the method “get_outliers”.

In a previous Blog titled Detecting Implausible User-Entered Weighing Notes using Data Science with an SAP S/4HANA (On-Premise..., I performed the outlier detection in the same data set by using Python tools instead of doing the outlier detection with SAP's Predictive Analysis Library (PAL). The boxplot looks the same in both scenarios but the outlier result with SAP´s PAL differs from the result in Python (with both outlier detection methods) because the algorithms work differently.

Summary


In this blog, I have demonstrated that SAP PAL, SAP CDS Table Functions, and SAP AMDP provide a great toolset for detecting outliers and doing data science directly with your SAP S/4HANA data. It helps you discover valuable insights with in-memory speed that can provide informed decision-making and other valuable data science. By following the steps outlined in this blog post, you can harness the full potential of these tools in your SAP environment, and you can use external tools such as Python libraries to work with the data, for example for data visualization.
2 Comments