Technology Blogs by SAP
How can SAP BTP (Business Technology Platform) together with modern sensors derive value through people behavioral analytics? 

In this post, the first of a series on the topic, we will see how Data Science delivered through SAP BTP, together with the latest in logging and localization technology, can enable the “Connected Workforce” of the future. We use a real-world example from a large industrial company: a months-long dataset of 200 workers, analyzed with SAP Analytics Cloud and SAP HANA Machine Learning, using Python and the Predictive Analysis Library (PAL). The following fundamental questions will be addressed:






As we shall argue, the "connected workforce" concept provides business value in multiple ways: not only through worker-level behavioral analytics that augment the existing human-resources view, but also by enhancing resilience and security (against unauthorized access, overstay, abnormal behaviors, and virus transmission) and by enabling further optimization of roles, routes, spatial arrangements, and sequences. This series is based on a real-world example delivered to a world-class manufacturing company. We will start with the overall business view and then proceed with articles covering the more technical sides, describing algorithm design and giving code examples, along with valuable tips that make scalable implementation possible, all the way up to large plants with thousands of workers. Let us start addressing the fundamental questions in this first post of the series!



In many industries, physical plants of medium-to-large dimensions, with interior as well as exterior spaces, are the norm: refineries, steel plants, ports and airports, even large shopping malls, among others. Such plants usually have hundreds to thousands of people entering and exiting them, as well as moving through them in complex patterns, alone or in groups; many on a daily basis, some more irregularly, some only once. These people could be employees/workers, or clients and visitors of many different sorts; and each of them could be further classified into various roles and categories.

Modern Physical Plants can have thousands of workers moving through them

(Source: File:Tata Steel Jamshedpur plant.jpg - Wikimedia Commons)

Now imagine that we start equipping these physical plants with various types of localization devices: from simple check-in/check-out systems for entering and exiting the plant, to location trackers based on RFID (Radio Frequency Identification) or computer vision. Many such technologies are now becoming cheap and widely available, often while preserving appropriate levels of privacy and anonymity, if required. The question thus arises: how can one utilize such technologies, together with data science and the best that SAP P&T (Platform & Technologies) has to offer, not merely for the sake of innovation, but to derive true value?

MiNew i10 BLE (Bluetooth Low-Energy) Beacon

(source: File:I10 Indoor Location Beacon.png - Wikimedia Commons)

Sewn-in RFID in Garment by Decathlon

(source: File:RFID tag textile front-through-back.png - Wikimedia Commons)

Beyond single-person face tracking, multi-person tracking of full-body "skeletal points" is easy nowadays with open source products like OpenPose

(source: File:Visage Technologies Face Tracking and Analysis.png - Wikimedia Commons)



But what makes this setting interesting, and what makes it difficult? Let us start with the interesting points we identified, the ones that could translate into value:

- Personalized analytics of worker entry/exit times and places, shift types (day/night etc), and stay durations, as well as periodic and long-term trends, for individual workers or specific groups

- Overstay detection (also increasing safety)

- Detection of entry into forbidden or role-irrelevant zones, for every specific worker

- Detection of overly long transit times across zones, or of stays in non-job-related zones

- Detection of “abnormal” patterns that might require investigation: gates or zones that a specific worker has never before used or very rarely does, sequences of movements from zone-to-zone that are novel or rare, time durations of stay in zones that are outside the statistically expected ranges.

- Classification of different types of activities of workers on the basis of their movement patterns

- Assessment of the similarity of worker behavior across various dimensions, allowing also the clustering of workers into different groups on the basis of behavioral patterns

- Logging patterns of close contact between workers that might be useful for example for COVID-19 contact tracing; or for detecting co-working and communication subgroups
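
To make the "abnormal pattern" idea from the list above more concrete, here is a minimal Python sketch of one possible approach: counting how often a worker has made each zone-to-zone move in the past, and flagging moves in a new trace that were seen rarely or never. This is only an illustration under our own assumptions, not the algorithm used in the project; the zone names, the sample traces, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter


def transition_counts(traces):
    """Count consecutive zone-to-zone moves over a list of daily traces."""
    counts = Counter()
    for trace in traces:
        # zip(trace, trace[1:]) yields each pair of consecutive zones.
        counts.update(zip(trace, trace[1:]))
    return counts


def rare_transitions(history, new_trace, min_count=2):
    """Return the moves in new_trace seen fewer than min_count times in history."""
    seen = transition_counts(history)
    return [move for move in zip(new_trace, new_trace[1:]) if seen[move] < min_count]


# Hypothetical history of one worker: two ordinary days (synthetic data).
history = [
    ["GATE_A", "WAREHOUSE", "FURNACE", "WAREHOUSE", "GATE_A"],
    ["GATE_A", "WAREHOUSE", "FURNACE", "WAREHOUSE", "GATE_A"],
]
# A new day that includes a zone this worker has never visited before.
today = ["GATE_A", "WAREHOUSE", "LAB", "GATE_A"]

flagged = rare_transitions(history, today)
print(flagged)
```

In a real deployment the counts would presumably be computed in the database rather than in Python lists, and a move being rare would only be a trigger for investigation, not a verdict.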

The above data, when combined and juxtaposed with HR (Human Resources) data such as job role descriptions, seniority, demographics, and more, can provide even richer insights into what is happening (descriptive), why it might be happening (explanatory), what might happen (predictive), and how desirable states of affairs can be reached through actions taken by management (prescriptive), for example policy changes, changes of operating parameters, or re-allocation of resources.

Quite importantly, beyond supporting short- to mid-term decisions (and, in some cases, even real-time ones), data science can also illuminate operational optimizations, all the way down to the spatial re-distribution of functions across the plant buildings and outdoor areas, as well as the re-arrangement of role types and role responsibilities/activities at the worker or worker-group level.



Beyond this non-exhaustive list of opportunities for deriving value, what makes the above not only interesting but also challenging? Some factors include:

- Large percentages of missing and/or erroneous entry-exit data. In our specific example, it was not unusual for more than 10-20% of all entries/exits to the plant to go unlogged. Thus, if one relies on the data alone, many workers appear to be staying for shifts of more than 24 or even 48 hours, simply because some entries or exits were not logged.

Example of a missing “GATE OUT” event

(Source of figure: personal)


- Large percentages of irregularly time-sampled, missing and/or erroneous localization data inside the plant, indoors or outdoors. In our specific example, it was not unusual to have cases where localization data existed every minute, others where the sampling interval rose above 10 minutes, and others with gaps of more than one hour.

Example of irregular sampling with large time gaps

(Source of figure: personal)


- Small number of areas covered with localization; the total effective area of the plant covered might be small too. In our case, beacons with a radius of operation of about 10 meters were installed, but the total area of the plant covered was less than 5%. Furthermore, in many cases the coverage radii of consecutive beacons did not overlap, so when a worker’s trace “disappeared” there were many possible beacons where it might “re-appear”. That is, there were multiple blind spots and many disconnected regions of visibility, with a complex graph of region-to-region-through-blindspot connectivity.

Example of inadequate coverage of plant by beacons

(Source of figure: personal)


- Although areas with localization might exist and might even increase in number over time (for example, through the installation of new beacons), the exact coordinates of the beacons might not be available. In our case, while hundreds of beacons existed, only 30 specific longitude/latitude points were assigned to them: a very coarse set of centers, which shrinks the effective coverage even further if one wants to rely on metric Cartesian geometry and not just on sequences of beacon IDs (identities).

- Also, maps of the physical layout of the plant (outlines and features of buildings, sometimes including multiple levels and/or floors and thus extending into the third dimension) might not be available, or might not be sufficiently extensive or accurate. Most importantly, they might need extensive processing in order to be transformed into a usable and digestible format.
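
The first two data-quality problems in the list above (unlogged gate events and irregular sampling) can be screened for with very little code. The Python sketch below is a minimal illustration under our own assumptions, not the project's implementation: it pairs gate events per worker and labels stays that are implausibly long, and it lists gaps between consecutive localization pings. The event format, the 16-hour `max_stay` cut-off, the 10-minute gap threshold, and all sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def label_stays(events, max_stay=timedelta(hours=16)):
    """Pair IN/OUT gate events of one worker and label each stay.

    events: list of (timestamp, "IN" or "OUT"), sorted by timestamp.
    max_stay is an assumed plausibility cut-off, not a value from the project.
    """
    stays, open_in = [], None
    for ts, kind in events:
        if kind == "IN":
            if open_in is not None:
                # Two INs in a row: the OUT in between was probably not logged.
                stays.append((open_in, None, "missing_out"))
            open_in = ts
        else:  # "OUT"
            if open_in is None:
                # An OUT with no preceding IN: the entry was probably not logged.
                stays.append((None, ts, "missing_in"))
            else:
                label = "ok" if ts - open_in <= max_stay else "implausible"
                stays.append((open_in, ts, label))
                open_in = None
    if open_in is not None:
        stays.append((open_in, None, "open"))
    return stays


def sampling_gaps(timestamps, threshold=timedelta(minutes=10)):
    """Return (start, end) pairs of consecutive pings further apart than threshold."""
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > threshold]


# Synthetic gate log: the OUT on 2 March and the IN on 4 March are missing.
events = [
    (datetime(2021, 3, 1, 6, 0), "IN"),
    (datetime(2021, 3, 1, 14, 5), "OUT"),
    (datetime(2021, 3, 2, 6, 2), "IN"),
    (datetime(2021, 3, 4, 14, 0), "OUT"),
]
labels = [label for _, _, label in label_stays(events)]
print(labels)

# Synthetic beacon pings: every minute at first, then increasingly sparse.
pings = [datetime(2021, 3, 1, 8, m) for m in (0, 1, 2, 15)]
pings.append(datetime(2021, 3, 1, 9, 30))
gaps = sampling_gaps(pings)
print(len(gaps))
```

Note that a simple threshold like this only detects suspicious records; deciding what actually happened during an implausible stay or a long gap is the harder problem.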



Or, in simple words: given the above non-exhaustive list of potential problems, how can “connected workforce” systems be possible at all, so that we can derive the value that makes them desirable? The answer lies in the power of SAP’s P&T: SAP HANA can not only handle the potentially very large amounts of data required, but through SAP HANA ML, and for example PAL (Predictive Analysis Library), state-of-the-art AI (Artificial Intelligence) algorithms can help us deal with the above problems, even in quite severe cases, as we shall see in our real-world implemented example. Thus, the ever-increasing availability of worker localization solutions, ranging from beacons to RFID tags to logging cards, Bluetooth, computer vision and beyond, can be harnessed to provide an ever-widening range of benefits, and ultimately substantial value.

Screenshot from one of the pages of our dashboard with removed names

(Source of figure: personal)



In this introductory blog post, we started by introducing the typical setting of the “Connected Workforce” of the future, as well as the underlying enabling technologies. Then we enumerated a number of concrete avenues towards deriving business value, along with some of the things that make the problem especially challenging, and sketched the main elements of our solution, which has already been implemented for a customer.

In the next posts in this series we will look at our case study in more detail, examining the data set and data quality, and then analyzing the implementation of a continuously extensible solution using SAP HANA ML and SAP Analytics Cloud. In particular, in the second post of this series, anonymized sample datasets will be presented, followed by a description of algorithms (starting with the overstay detection algorithm). In the third post, we ask the question: “Can data modeling be enhanced by incorporating business knowledge?” (link to the post is here) and provide details of the underlying SAP HANA model, as well as a very important implementation trick that makes scaling possible and also provides further benefits. In the fourth post, multiple levels and types of behavioral similarity assessments across workers will be provided, together with the underlying code, and ways to cluster workers according to similarity will be presented. In subsequent posts, the design of an interactive dashboard with many views and zoomable detail levels will be presented, as well as more information on the abnormality detection capabilities, together with more details on the extensions roadmap towards the “Connected Workforce” of the future, powered by SAP BTP.