Segmenting Human Resources Absence Hours by Disease and Season with SAP HANA Machine Learning
How many absence hours per case occur most frequently, and which diseases cause them? This article describes an approach to reducing the large number of short absences by preventing the diseases that cause them.
Introduction
This article is based on the SAP PRESS book “Data Segmentation Using K-Means and SAP HANA PAL”. Link to the book: https://www.sap-press.com/data-segmentation-using-k-means-and-sap-hana-pal_5153/#utm_source=2003&utm...
Traditional analysis of absence hours may look only at the top five or ten disease causes. But the data can contain several segments of absence hours. In one scenario, many cases of just one or two hours each add up to a large total for specific diseases. In another, there are few cases, each with a large number of absence hours. Our challenge is to examine how the causes of absence are spread so that we can focus on specific diseases in each case. For example, we can apply preventive medicine and schedule medical consultations outside working hours.
For this analysis, we can use the SAP HANA Predictive Analysis Library (PAL), which applies machine learning to statistical grouping. With this SAP HANA function, we can find the different segments of absence hours per case and the months in which seasonal diseases occur.
Statistical grouping takes a large number of data points and finds the segments that share similar values. It also reports the ideal number of groups. Figure 1 shows the absence hours in a form where the segments in the data are difficult to see.
After we run the function, we obtain the group segments for the absence hours, the range of hours for each segment, and the number of cases per segment. See Figures 2, 3, and 4.
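To make the grouping idea concrete, here is a minimal sketch of one-dimensional K-means (the technique named in the title of the book this article is based on) in plain Python. It is not a call to the PAL function and does not claim to reproduce how PAL chooses the ideal number of groups. The function names, the starting rule for the centroids, and the sample hours are all assumptions made for illustration.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on one-dimensional data with a deterministic start."""
    ordered = sorted(values)
    # Assumption: start the centroids at evenly spaced positions in the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # Converged: no centroid moved.
        centroids = updated
    return labels, centroids


def within_group_spread(values, labels, centroids):
    """Sum of squared distances from each value to the centroid of its group."""
    return sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))


# Hypothetical absence hours per case (not the article's 718 records).
hours = [1, 2, 2, 3, 4, 8, 8, 9, 10, 16, 104, 112, 120]

for k in (2, 3, 4):
    labels, centroids = kmeans_1d(hours, k)
    spread = within_group_spread(hours, labels, centroids)
    print(k, [round(c, 1) for c in centroids], round(spread, 1))
```

On this toy data the spread drops sharply when going from two to three groups and only slightly from three to four. A flattening of this kind is one common signal for picking a group count. PAL may use a different criterion.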
We want to identify the segments of absence hours per case. For example, there could be a segment of cases with two hours of absence each.
Together, these absences could add up to 50 hours per month. The cause for this group could be mental or behavioral health. We can then apply a preventive mental health strategy and lower the number of these two-hour absences. This approach could be faster and cheaper than addressing longer-term diseases such as circulatory diseases.
A conventional analysis chart shows the number of absence hours per month for each disease, as in Figure 1.
We can note the major diseases and months in this chart:
1) Infectious and parasitic for months three, four, and six.
2) Parasitic in blood for month three.
3) Circulatory for month twelve.
4) Nervous system for month seven.
Beyond these, several diseases each account for fewer than fifty hours of absence per month.

Figure 1: Absence hours per disease and month
Applied to the set of 718 data records, the statistical grouping function of the SAP HANA Predictive Analysis Library returns:
1) The optimal number of groups for the absence hours, which is five. See Figure 2.
2) The total number of absence hours per group. See Figure 2.
3) The number of cases per group. See Figure 3.
4) The minimum and maximum number of absence hours per group. See Figure 4.
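Once every case carries a group label, the per-group figures listed above (total hours, number of cases, minimum and maximum hours) come from a simple aggregation. The following sketch shows one way to compute them in plain Python. The record layout, the function name, and the sample values are assumptions for illustration, not the PAL output format or the article's data.

```python
from collections import defaultdict


def summarize_groups(records):
    """Aggregate (group_id, absent_hours) pairs, one pair per absence case."""
    hours_by_group = defaultdict(list)
    for group_id, absent_hours in records:
        hours_by_group[group_id].append(absent_hours)

    summary = {}
    for group_id, hours in sorted(hours_by_group.items()):
        summary[group_id] = {
            "total_hours": sum(hours),  # as plotted in a chart like Figure 2
            "cases": len(hours),        # as plotted in a chart like Figure 3
            "min_hours": min(hours),    # as plotted in a chart like Figure 4
            "max_hours": max(hours),
        }
    return summary


# Hypothetical labelled cases: (group, hours absent in that case).
records = [(1, 2), (1, 3), (1, 0), (5, 8), (5, 16), (4, 104), (4, 120)]

for group_id, stats in summarize_groups(records).items():
    print(group_id, stats)
```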

Figure 2: Number of absence hours per group.

Figure 3: Number of cases for each group.
From the statistical grouping function of the SAP HANA Predictive Analysis Library, we can draw the following conclusions, which are not clear from conventional charts like Figure 1, where we see only the number of absence hours per disease.
1) The groups with the most accumulated absence hours and the most cases are one and five. See Figures 2 and 3.
2) However, the range of absence hours per case in group one is only zero to five, and in group five it is seven to sixteen. See Figure 4. That means there are many individual cases in group one (450 cases) and group five (210 cases), each with between zero and sixteen absence hours. This is a low range compared with group four, where each case has between 104 and 120 absence hours. We look at those cases below.

Figure 4: Minimum and maximum absence hours for each group.
Figure 5 shows the top 10 diseases in groups one and five, the groups with the most cases and absence hours. Even though each case involves only 0 to 16 absence hours, these causes lead many employees to request time off work. Examples of these causes are:
1) Blood donation
2) Ears
3) Infections
4) Medical consultation
Better scheduling of blood donations and medical consultations could reduce the number of absence hours and increase productivity.
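A ranking like the one in Figure 5 can be produced by counting cases per cause within the selected groups. The sketch below is one possible way to do that in plain Python. The tuple layout, the function name, and the sample cases are hypothetical and are not taken from the article's data set.

```python
from collections import Counter


def top_causes(cases, groups, n=10):
    """Rank causes by number of cases within the chosen groups.

    cases: iterable of (group_id, cause, absent_hours) tuples.
    Returns a list of (cause, case_count, total_hours), most frequent first.
    """
    case_counts = Counter()
    total_hours = Counter()
    for group_id, cause, absent_hours in cases:
        if group_id in groups:
            case_counts[cause] += 1
            total_hours[cause] += absent_hours
    return [(cause, count, total_hours[cause])
            for cause, count in case_counts.most_common(n)]


# Hypothetical cases: (group, cause, hours absent).
cases = [
    (1, "medical consultation", 2),
    (1, "medical consultation", 3),
    (1, "medical consultation", 1),
    (5, "blood donation", 8),
    (5, "blood donation", 8),
    (1, "infections", 4),
    (4, "circulatory system", 112),
]

# Short-absence groups (one and five) versus a long-absence group (four).
print(top_causes(cases, groups={1, 5}, n=2))
print(top_causes(cases, groups={4}))
```

Calling the same function with the long-absence groups gives the counterpart ranking for the few cases with many hours each.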

Figure 5: Top 10 diseases that cause the largest number of cases.
On the other hand, the highest ranges of absence hours are in groups three and four. See Figure 4. These groups also have the fewest cases. See Figure 3. That means these absences are concentrated in a few people.
The top 10 diseases for these cases, shown in Figure 6, include these causes of absence hours:
1) Circulatory system
2) Blood and circulatory system infections

Figure 6: Top 10 diseases for the groups with the fewest cases but the highest range of absence hours.
This information shows that a few employees have long-term diseases that require many absence hours.
SAP HANA Predictive Analysis Library
As described in the introduction, this analysis uses the SAP HANA Predictive Analysis Library for statistical grouping with machine learning. With this SAP HANA function, we can find the different segments of absence hours per case and the months in which seasonal diseases occur.
Statistical grouping takes a large number of data points and finds the segments that share similar values. It also reports the ideal number of groups.
After we run the function, the group segments for the data are shown in Figures 2, 3, and 4.
The structure of the SAP HANA Predictive Analysis Library grouping function is shown in Figure 7. The prerequisite knowledge to run it is basic statistics (median, standard deviation), a query language such as SQL, and data cleaning for big data.
The functional requirements are:
1) Tables for the input parameters. Example: a table with the data source and the number of groups we want for the segmentation.
2) Tables to store the results from the function. The results are:
-The segment assigned to every record of the input data.
-The average point, or centroid, of every segment.
-The distance from every record to the centroid of its segment. Ideally, every point is closer to the centroid of its own segment than to any other centroid.
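The three results listed above can be illustrated with a few lines of plain Python. This sketch assumes one-dimensional data (absence hours), so the distance is simply the absolute difference, and it takes the segment labels as given. The field names `id`, `segment`, and `distance` are hypothetical and do not reflect the actual layout of the PAL result tables.

```python
def result_tables(values, segments):
    """Build a centroid table and a per-record assignment table.

    values:   absence hours, one entry per input record.
    segments: the segment label already assigned to each record.
    """
    # Centroid table: the mean of the values in each segment.
    sums, counts = {}, {}
    for value, segment in zip(values, segments):
        sums[segment] = sums.get(segment, 0.0) + value
        counts[segment] = counts.get(segment, 0) + 1
    centroids = {segment: sums[segment] / counts[segment] for segment in sums}

    # Assignment table: segment and distance to its centroid for every record.
    assignments = [
        {"id": record_id, "segment": segment,
         "distance": abs(value - centroids[segment])}
        for record_id, (value, segment) in enumerate(zip(values, segments))
    ]
    return centroids, assignments


# Hypothetical input: four cases already split into two segments.
centroids, assignments = result_tables([1, 3, 10, 14], [0, 0, 1, 1])
print(centroids)
for row in assignments:
    print(row)
```

Small distances across all records indicate tight segments. A record with a large distance is a candidate outlier or a hint that another group count might fit better.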

Figure 7: SAP HANA grouping function structure
Conclusion
The clusters with the highest number of absence hours have a low range of hours per case (between 0 and 16) but a large number of cases. The causes of these absence hours are:
-Infectious diseases
-Nervous system diseases
-Eye infections
along with many medical consultations and blood donations.
So we can lower the absence hours by scheduling medical consultations and blood donations better, and by applying preventive medicine for infections.