Part 1: Introduction, data model & structures
Dear analysts working hands-on with SAP Signavio Process Intelligence: This is for you!
Whilst quite some blogposts on approaches and data integration already exist thanks to my fellow colleagues - I thought it would be time to share hands-on experience & good practices from projects on a topic that is often underrated but so important to learn when trying to truly leverage value through Process Intelligence:
SiGNAL.
This hands-on guide is dedicated to you specifically, summarizing and sharing key learnings & good practices over the past years working as a Process Intelligence Consultant with SiGNAL – supporting you to embark on your analysis journey as smooth and fast as possible.
SiGNAL abbreviates to
"Signavio Analytical Language" and is crucial to answer many of the questions that may come up when working in a process analysis. It is a language needed to create meaningful advanced widgets.
In order to leverage the best out of our analytical components, it is highly recommended to understand and learn this language which is meant to be similar to SQL but offers functions explicitly dedicated to process data queries.
It is being used in various sections and of key importance in SAP Signavio Process Intelligence. One of the core modules from SAP Signavio.
Picture 1: SAP Signavio Process Intelligence
I am not going to cover everything but will try to explain basic topics and good practices that will ease up your entry to working with SiGNAL.
Why not everything?
- My fellow and hard working colleagues have created a very big compendium of functionalities (and are most probably creating some further ones in exactly this moment) which makes it hard for a full guide to stay current. This guide is meant to share hands-on experience and good practices, not to introduce into all functionalities.
- There already exists a profound technical documentation for all functionalities and principles, this guide is meant as a summary and to communicate best practices, not to replace technical documentation.
You can find the complete technical documentation here:
https://documentation.signavio.com/suite/en-us/Content/process-intelligence/signal.htm
But now, let’s start to embark into our journey of understanding SiGNAL.
Why should you use SiGNAL?
Complex questions require (not always, but often) complex queries and query flexibility. I personally mostly use SiGNAL code in my analyses as it embraces great flexibility and allows me to precise my queries.
How to use it?
SiGNAL is quite similar to SQL (SELECT <fields> FROM <table> WHERE <conditions>). Yet, there are quite some differences. In this guide, we will have a look into these differences and you will see why this language is a perfect fit for process data querying!
What are the differences?
- Nested data structure: The data structure in SAP Signavio Process Intelligence is a little different than in classical database tables. The data is nested. This has implications on how to query process data (we‘ll look into this).
- No Joins: You currently query on all the data you have available.
- Additional functions that are designed specifically for process data querying (e.g. MATCHES & BEHAVIOUR).
Nested data structure:
Let’s have a look into the nested data structure, what does it mean and what are the implications?
You can think of it as a sort of a “table-in-a-table” concept. All event-related information is stored in the same table as the case-related information, just on another “level”.
Let’s imagine, all events within a case are being compressed into one database entry:
We are looking at 1 case with 3 events, creating 3 entries in the nested structure and 1 entry in the overall case table. If you query for the number of cases in the overall case table, you will receive the answer “1”. If a second case with 2 events is being added, you will receive the answer “2”.
This then looks like the following (event info nested into each case row):
Picture 2: Nested data model
The query to receive the above data model as an output would look like the following:
SELECT
case_id, “Company Code”, Country, “Supplier No.”,
event_name, end_time, “User Type”
FROM THIS_PROCESS
If you want to count the number of cases, you would do it this way:
SELECT count(case_id) FROM THIS_PROCESS
Bear in mind that, exactly as in SQL, when querying for table names with spaces in their names you will need to use double quotes – when querying for values, you will need to use single quotes (not for numbers).
Important: It is really important to know about and understand this nested vs. flattened structure. You will 100% need this later.
You can also switch the data structure to a so-called “flattened” structure, this would then flatten out all nested structures and create 1 line for every nested row that you’ve queried.
Instead of 2 rows in our nested data structure from above, this would then lead to 5 rows:
Picture 3: Flattened data model
This can be done by using the
FLATTEN()-Function:
SELECT
case_id, “Company Code”, Country, “Supplier No.”,
event_name, end_time, “User Type”
FROM FLATTEN(THIS_PROCESS)
Important to know: If you perform a count(case_id) on a flattened process, e.g. like this:
SELECT count(case_id) FROM FLATTEN(THIS_PROCESS)
This would now lead to: 5 instead of 2!
This is due to the
FLATTEN()-Function. In order to count the number of different Case ID’s you will now need to use the distinct function:
SELECT count(distinct case_id) FROM FLATTEN(THIS_PROCESS)
This will now lead to the correct number of distinct Case ID's: 2.
Remember: This structure has an implication on how to calculate values within your analysis, it makes sense to always ask yourself: Am I looking for a case or event attribute?
If you are looking for a case attribute (or information that can be derived from the case level) only, it might make sense to stay on case level.
If you are looking for a specific event attribute, it might make sense to consider using the FLATTEN()-function.
(It gets a little more complicated when you want to look to combine information from both levels, we will look into this later.)
Also important to remember:
Case attributes only exist one time per case (e.g. City = Berlin) within the nested structure, while multiple characteristics / values within event attributes may exist (e.g. event name, payment method,..).
Why is this now a perfect fit for e.g. process data querying?
With SiGNAL, functions like
MATCHES and
BEHAVIOUR have been introduced.
I would like to introduce you into these two basic functions now as they deliver huge analysis value and are important for Process Mining.
MATCHES:
The
MATCHES-Function has been engineered specifically for process data querying. It allows you to create queries that compare process behaviour.
E.g. if we are interested in analyzing all cases where an invoice has been posted and cleared:
SELECT count(case_id) FROM THIS_PROCESS
WHERE event_name MATCHES (‘Post Invoice’ ~> ‘Clear Invoice’)
This query would now return the number of all cases where an Invoice was posted and cleared. In our example from above, this would be 2.
The “~>” stands for "eventually follows". If we would be interested in all cases where Clear Invoice was directly occurring after Post Invoice, the query would look like the following:
SELECT count(case_id) FROM THIS_PROCESS
WHERE event_name MATCHES (‘Post Invoice’ -> ‘Clear Invoice’)
In our example from above, this would be 1.
(It makes sense to understand the other different patterns available for this sort of query, you can find them in our
documentation.)
Important: The MATCHES-Function can only be applied on case level (without FLATTEN). There is further adoption possibilities of the function, we will look into this in one of the following posts.
BEHAVIOUR:
The
BEHAVIOUR-Function looks a little more complicated to use, but believe me, it is a function that
blew my mind after already 1 year of working with SiGNAL. This is, because you can use this function to compare event-level information with case-level queries without using complicated sub-queries (we will look into this in the second part of the guide). Before I knew about this function, I was regularly creating complicated sub-queries. Often, this might not be required to be done at all.
Let’s look into an example using the data from above:
In Process Mining, one of the typical questions our customers ask to be analyzed is:
What is the perfect invoice rate? How many no-touches do we have in our process?
This is an identifier for the automation rate within the process and an indicator for process efficiency.
Without the
BEHAVIOUR-Function, you would have to create a complicated sub-query. By using the function instead, we will save a lot of efforts on doing so (believe me, I've done this so often).
Remember: The function allows us querying for event-based data and putting it into relation on case-level afterwards:
Picture 4: BEHAVIOUR MATCHES
By using the MATCHES-Function in the end we are able to relate both set up
BEHAVIOURs with each other. E.g. behaviour 1 has to be eventually followed by behaviour 2, resulting in
“Invoice Posted” ~> “Invoice Cleared”
but both pre-filtered with the event attribute
“User Type” = ‘System’.
Looking on to our data from above (Picture 2), this query will result into 1. Basically answering the question:
How many no-touch invoices did we have so far? 1.
Here you can find the link to the function descriptions of MATCHES and BEHAVIOUR MATCHES:
https://documentation.signavio.com/suite/en-us/Content/process-intelligence/signal-matching-operator...
Hope you liked the first blog post on working with SiGNAL, understanding the data model, as well as the first good practice (BEHAVIOUR MATCHES).
Enjoy applying on your own!