In my last part I described how I started collecting data from FRUN. Now the next question is:
Which metrics will I use for evaluating the performance of a SAP system?
I don't have much time to check out the FRUN metrics individually, and the productive FRUN system is already under heavy load, so I cannot easily activate new metrics for monitoring. I have to stick with what is already available.
In table MAI_UDM_STORE there is a column CATEGORY, which I filter on CATEGORY='PERFORM'. I further limit my selection to productive systems and ABAP stacks. This leaves me with 26 so-called TYPES, which are the performance metrics I will use. The machine learning model may later decide which of these are relevant for predicting the overall performance and which are not:
- ABAP_INST_ASR_DIA_TOTAL
- ABAP_INST_ASR_HTTPS_TOTAL
- ABAP_INST_ASR_RFC_TOTAL
- ABAP_INST_ASR_UPD_TOTAL
- ABAP_INST_BTC_QUEUE_UTILIZATION
- ABAP_INST_DIALOG_LONGRUNNING
- ABAP_INST_DIALOG_RESPONSE_TIME_EVENT
- ABAP_INST_ICM_CONN_USAGE
- ABAP_INST_ICM_THREAD_USAGE
- ABAP_INST_MEMORY_EM_USED
- ABAP_INST_MEMORY_PAGING_AREA_USED
- ABAP_INST_UPDATE_LONGRUNNING
- ABAP_SYS_TRANSACTION_RESPONSETIME
- ABAP_UPDATE1_RESPONSE_TIME
- ABAP_UPDATE2_RESPONSE_TIME
- BATCH_RESOURCES
- DIALOG_FRONTEND_RESPONSE_TIME
- DIALOG_QUEUE_TIME
- DIALOG_RESOURCES
- DIALOG_RESPONSE_TIME
- ICM_RESOURCES
- NUMBER_OF_FREE_DIALOG_WORK_PRO
- SYSTEM_PERFORMANCE
- UPDATE2_QUEUE_UTILIZATION
- UPDATE_RESPONSE_TIME
- UPDATE_RESSOURCES
This brings me to another important point: how do I determine whether the performance is actually good or bad? This matters because I do not have the time to manually evaluate all available data to identify and label the cases of actual bad performance. I want to automate this and generate lots of labeled data for machine learning.
In this case, I simply use the standard evaluation from SAP FRUN for these metrics. Each of these 26 metrics gets a simple rating of OK, WARNING or CRITICAL, depending on preconfigured thresholds. While this standard rating from SAP FRUN might not be ideal, it is a valid starting point and a huge time saver.
I simply define the "performance health" at each point in time by how the 26 metrics were rated. A rating of OK is awarded 2 points, WARNING 1 point and CRITICAL 0 points. The sum is then normalized to a value between 0 and 100% by dividing it by the maximum of 26 × 2 = 52 points.
If all performance evaluations are OK, then the performance health rating is 100% (= best case). If all performance evaluations are CRITICAL, then the health rating is 0% (= worst case). This can be easily calculated with a simple SQL statement from the database. In the next step, I can identify the periods in which SAP systems encountered a performance incident, and even get some idea of how long the problem persisted and how severe the impact was.
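The scoring and the incident search can be sketched in a few lines of Python. This is only an illustration under assumptions, not the SQL statement actually used: the function names `performance_health` and `find_incidents`, the 80% incident threshold, the 5-minute sampling interval and all sample values are hypothetical.

```python
from datetime import datetime, timedelta

# Points per rating as described in the text: OK = 2, WARNING = 1, CRITICAL = 0.
RATING_POINTS = {"OK": 2, "WARNING": 1, "CRITICAL": 0}


def performance_health(ratings):
    """Return the health score in percent for one snapshot of metric ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    points = sum(RATING_POINTS[r] for r in ratings)
    max_points = 2 * len(ratings)  # every metric rated OK
    return 100.0 * points / max_points


def find_incidents(samples, threshold=80.0):
    """Group consecutive samples below `threshold` into incidents.

    `samples` is a list of (timestamp, health_percent) tuples sorted by time.
    The threshold of 80% is an assumption chosen for this example.
    """
    incidents = []
    current = None
    for ts, health in samples:
        if health < threshold:
            if current is None:
                # A new incident starts with the first sample below the threshold.
                current = {"start": ts, "end": ts, "min_health": health}
            else:
                current["end"] = ts
                current["min_health"] = min(current["min_health"], health)
        elif current is not None:
            # Health recovered, so the running incident is closed.
            incidents.append(current)
            current = None
    if current is not None:
        incidents.append(current)
    for inc in incidents:
        inc["duration"] = inc["end"] - inc["start"]
    return incidents


# Hypothetical snapshot of the 26 metrics: 23 OK, 2 WARNING, 1 CRITICAL.
snapshot = ["OK"] * 23 + ["WARNING"] * 2 + ["CRITICAL"]
print(f"{performance_health(snapshot):.1f}%")  # prints 92.3%

# Hypothetical health scores for one system, sampled every 5 minutes.
t0 = datetime(2024, 1, 1, 8, 0)
scores = [100.0, 96.2, 75.0, 61.5, 78.8, 98.1, 100.0]
samples = [(t0 + timedelta(minutes=5 * i), s) for i, s in enumerate(scores)]

for inc in find_incidents(samples):
    print(inc["start"], inc["duration"], inc["min_health"])
```

The duration gives a rough idea of how long a problem persisted, and the lowest health value within the incident serves as a simple measure of its impact. Both are only as precise as the sampling interval allows.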
In a way, I now have something I could call "Anomaly Detection". In my big database, I can now identify incidents showing abnormally bad SAP performance. This will be the basis for the next steps in the series, where I tackle "Anomaly Prediction".