How to implement custom predictive scores for SAP Marketing Cloud
The approach described in this blog, creating a predictive data source within SAP Marketing Cloud to run the predictive model training locally in SAP Marketing Cloud, is no longer recommended.
The alternative approach we recommend is to extract the required data from SAP Marketing Cloud to an external environment, where the predictive data preparation, modeling, and training, as well as the score calculation, are done.
The calculated score values can then be pushed to SAP Marketing Cloud using the Score API, after a Custom Predictive Scenario of type ‘Calculated externally’ has been created.
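For illustration, a minimal Python sketch of such a push is shown below. The host, OData service path, entity set, payload fields, and credentials are all placeholders and assumptions; the exact service name and entity structure are defined by the Score API documentation of your release and by your communication arrangement.

```python
# Sketch only: pushing externally calculated score values into SAP
# Marketing Cloud. Service path, entity set, and payload fields below
# are illustrative placeholders, not the documented API structure.
import requests

TENANT = "https://my-tenant.example.hana.ondemand.com"   # hypothetical host
SERVICE = "/sap/opu/odata/SAP/API_MKT_SCORE_IMPORT_SRV"  # assumed service path

payload = {
    "ScoreName": "INSURANCE_CHURN",   # matches the Custom Predictive Scenario
    "ScoreValues": [                  # illustrative entity structure
        {"ContactId": "0000012345", "ScoreValue": 0.87},
        {"ContactId": "0000067890", "ScoreValue": 0.12},
    ],
}

resp = requests.post(
    TENANT + SERVICE + "/ScoreImport",     # hypothetical entity set
    json=payload,
    auth=("COMM_USER", "secret"),          # communication user credentials
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
```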
Background
With SAP Marketing Cloud, predictive models can be used for the calculation of predictive scores. Such predictive scores can be used, for example, in Segmentation to identify audiences for marketing campaigns.
With SAP Marketing Cloud 1708, you can bring in custom views and use these custom views to implement custom predictive scores in your quality tenant. In a nutshell, custom predictive scores can help you answer business questions such as "Which customers tend to return the product they bought?" or "Which customers tend to churn?" by leveraging predictive analytics or machine learning capabilities.
Both business questions refer to a certain interaction or event (here: a return or churn interaction), just like the standard-delivered Buying Propensity Score refers to the sales order interaction. The target (dependent) variable, e.g. churn, can be predicted by predictive algorithms (see Appendix), which usually take many predictors (independent variables, e.g. demographic, behavioral, and product attributes) into account to build a predictive model that explains the relationships between the predictors and the target variable.
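To make this concrete, here is a minimal sketch of the underlying idea in Python with scikit-learn. Note that SAP Marketing Cloud itself trains models via the Automated Predictive Library (see Appendix); this snippet, with made-up column names and data, only illustrates the predictor/target relationship:

```python
# A binary classifier learns the relationship between predictors and a
# 0/1 target (e.g. churn). Columns and values below are invented.
import pandas as pd
from sklearn.linear_model import LogisticRegression

training = pd.DataFrame({
    "age":                         [27, 45, 35, 52],
    "days_since_last_policy_sign": [300, 50, 600, 120],
    "no_of_claims":                [2, 0, 1, 3],
    "churn":                       [1, 0, 0, 1],  # target variable (0/1)
})

X = training.drop(columns="churn")   # predictors / independent variables
y = training["churn"]                # target / dependent variable

model = LogisticRegression().fit(X, y)
print(model.predict_proba(X)[:, 1])  # churn propensity per customer
```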
User story and business question
Our exemplary user story is about predicting churn for an insurance company. However, as the general concept of churn applies to other industries as well (e.g. telecommunications or banking), the approach can be adapted to other industries quite easily.
The user story is the following: "As a marketing expert, I want to identify all my existing car insurance policy customers who have a high probability of canceling their policy. I want to transfer them into a target group and run a campaign with an attractive offer, with the intention of keeping them as customers." The first part of this user story can be answered by implementing a custom predictive score, while the second part refers to standard SAP Marketing Cloud functionality.
Defining the business question (here: "Which car insurance policy customers tend to churn?") together with the marketing experts or business users should always be the first step, before the actual implementation of the predictive score starts.
Implementation
After the business question has been defined, the training data set columns (the predictors and the target variable) for the predictive model can be built. The training data set is basically a custom-modeled HANA calculation view on top of several other views and tables in the HANA database.
Before the HANA view modeling can start, the data sources that will form the basis for both the target variable and the predictors need to be identified. One obvious prerequisite is that the data is already loaded into, and available in, the HANA database on which SAP Marketing Cloud runs. Apart from that, the data quality (completeness, accuracy, etc.) has to be checked to get good results. Ensuring good data quality is always important, but in the area of predictive analytics and machine learning it becomes crucial: even the best algorithms can't return good prediction results on bad data or compensate for poor data quality!
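As a minimal illustration of such a quality check (assuming the training data has been extracted to a CSV file; file name and column names below are invented), a few pandas checks go a long way for a first screening:

```python
# First-pass data quality screening: completeness and basic plausibility.
import pandas as pd

df = pd.read_csv("training_data.csv")  # assumed extract of the data source view

print(df.isna().mean().sort_values(ascending=False))  # share of missing values per column
print((df["age"] < 0).sum(), "implausible ages")      # simple plausibility check
print(df.duplicated(subset="contact_id").sum(), "duplicated contacts")
```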
In more detail, the data model has been modeled as follows:
The actual insurance policy holders are modeled as interaction contacts (SAP Marketing Cloud standard, not extended).
All transactional data (e.g. new policy signings, claims, calls, policy terminations) are modeled as interactions (SAP Marketing Cloud standard, not extended).
The insurance policies are modeled as products (SAP Marketing Cloud standard, extended by four fields).
The insured objects are modeled as Custom Business Objects.
Creating a Data Source View
The data source view, a HANA calculation view, is responsible for building the actual flat data structure based on the defined business question and the underlying data model. Almost all predictive algorithms only work on flat data, meaning that each observation, both historical (classified) and not yet classified, is contained in a single row. The original relational data structure therefore has to be denormalized, or flattened.
In our example, a simplified output of the insurance data source view looks like this (the full list of predictors can be found in the Appendix):
[…] | Country | Age | Days since last policy sign | No. of claims | No. of insured vehicles | Avg. catalog price of insured vehicles | […] | Churn (target variable)
…   | DE      | 27  | 300                         | 2             | 1                       | 10,000                                 | …   | 1 (yes)
…   | US      | ?   | 50                          | 0             | 3                       | 50,000                                 | …   | 0 (no)
…   | CA      | 35  | 600                         | 1             | 1                       | 30,000                                 | …   | 0 (no)
Some of the attributes are simple demographic attributes such as country and age, which can be selected or easily calculated from the interaction contact data. However, some of the fields have to be flattened through aggregation and calculated through more complex logic (e.g. the current number of insured vehicles, or the average catalog price of all vehicles insured by a given policy holder), as sketched below.
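Purely for illustration, here is how such an aggregation could look in pandas; in SAP Marketing Cloud this logic is modeled inside the HANA calculation view, and the table and column names below are invented:

```python
# Flattening through aggregation: one row per policy holder, with
# interactions and insured objects collapsed into scalar predictors.
import pandas as pd

insured_objects = pd.DataFrame({
    "contact_id":    ["C1", "C1", "C2"],
    "catalog_price": [12000, 8000, 50000],
})

flat = insured_objects.groupby("contact_id").agg(
    no_of_insured_vehicles=("catalog_price", "size"),
    avg_catalog_price=("catalog_price", "mean"),
).reset_index()
# contact_id  no_of_insured_vehicles  avg_catalog_price
# C1          2                       10000.0
# C2          1                       50000.0
```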
The same is true for the target variable itself. In this exemplary case, the HANA view has been modeled in such a way that it includes an input parameter for a specific car insurance policy type (e.g. Comprehensive Cover Policy or Full Glass Policy). The churn event is then calculated for this specific policy type. This has the advantage that we can calculate the churn score not only across all policies, but also for specific policy types separately.
The target variable has to be an integer that describes the 0/1-classification of the row (here: per customer). Multiple target variables are also supported by the framework, in case you plan to create one predictive model for each of the target variables.
As outlined above, in our example, the calculation of the target variable depends on an input parameter. More specifically, IP_TARGETPRODUCT (the car insurance policy) defines for which policy type the churn shall be calculated. The IP_TARGETPRODUCT parameter is one of four mandatory input parameters for a data source view; the others are IP_DOC_DATE_START, IP_DOC_DATE_END, and IP_JOIN_SET_ID. The first two restrict the date range taken into account by the view. These input parameters should be pushed down so that, for example, only interactions such as churns from the specified date range are taken into account. The concept behind IP_JOIN_SET_ID is described in the Appendix. A sketch of the target variable derivation is shown below.
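Again purely as an illustration (the real implementation is a calculation view; interaction types, column names, and values below are invented), the derivation of the 0/1 target for a given policy type and date range could look like this:

```python
# Derive the churn target: 1 if the contact has at least one policy
# termination for the selected policy type within the analysis period.
import pandas as pd

IP_TARGETPRODUCT  = "COMPREHENSIVE_COVER"
IP_DOC_DATE_START = "2007-01-01"
IP_DOC_DATE_END   = "2017-12-31"

interactions = pd.DataFrame({
    "contact_id":       ["C1", "C2"],
    "interaction_type": ["POLICY_TERMINATION", "CLAIM"],
    "product_id":       ["COMPREHENSIVE_COVER", "COMPREHENSIVE_COVER"],
    "timestamp":        pd.to_datetime(["2016-05-01", "2015-03-02"]),
})

churn_events = interactions[
    (interactions["interaction_type"] == "POLICY_TERMINATION")
    & (interactions["product_id"] == IP_TARGETPRODUCT)
    & interactions["timestamp"].between(IP_DOC_DATE_START, IP_DOC_DATE_END)
]

# 0/1 target per contact
target = interactions[["contact_id"]].drop_duplicates().assign(
    churn=lambda d: d["contact_id"].isin(churn_events["contact_id"]).astype(int)
)
```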
In some cases, one calculation view might not be sufficient, or existing reference views should be reused within the data source view. In that case, these reference views are included in the data source view. In the context of this example, multiple reference views and one value help view (for the policy type) were included in the data source view.
Importing the Data Source View
After the data source view (and possibly further reference views) have been designed, they need to be imported into your SAP Marketing Cloud system.
For our scenario, only the very last step (using the Add Custom View app) differs from the standard procedure for importing custom views. You should start by importing your reference views first, using the "Unassigned" View Usage Type. After uploading, the reference views are stored under sap\hana-app\cuan\custom_seg\ucv (the view path is also contained in the success message after importing the custom view). The same is true for any value help views, which should be uploaded using the "Value Help" View Usage Type and are stored under sap\hana-app\cuan\custom_seg\uvi.
Make sure to update the references in your main data source view to the view paths above if you are using reference views and/or value help views. After all reference and value help views have been uploaded and are correctly referenced, the data source view itself can be uploaded using the "Predictive Data Source" View Usage Type. Data source views are stored under sap\hana-app\cuan\cpred\cus\prds.
You can now use your uploaded view in Predictive Scenarios.
Creating Predictive Scenarios
After the data source view has been successfully loaded into the system, the predictive scenario can be created on top of the view in your quality tenant. The predictive scenario defines which data source view is used, which attributes are the target variables, and which attributes should be used as predictors (the independent, explanatory variables).
Additionally, the segmentation object for which the predictive scenario should be used can be selected together with the attribute group in which you want to see the calculated score on the UI later.
Consuming Predictive Scores
After the custom predictive scenario is configured and activated in your quality tenant, it can be used within the Predictive Studio for creating new predictive models and publishing them as predictive scores. The screenshot below shows the configured predictive model, using the predictive scenario "Insurance Churn Propensity" (1). For this exemplary score, the target object "Comprehensive Cover" policy has been chosen (2), and the analysis period has been set to 2007 - 2017 (3). These settings populate the input parameters IP_TARGETPRODUCT, IP_DOC_DATE_START, and IP_DOC_DATE_END.
For training the model and creating a model fit, a training set has to be chosen (4). A training set is simply an SAP Marketing Cloud target group that must contain observations of both 0/1-classifications (in our example, churners and non-churners). After the target variable (5) is set, you can create a new model fit and start the model training.
The results of such a model fit are shown below.
In addition to the most contributing predictors, which allow you to examine the relative significance of the variables within the model, the framework displays both the calculated predictive power and the prediction confidence of the created model fit. An explanation of both KPIs can be found in the Appendix.
You can now select the highest-scoring members using our score within Segmentation. In our example (based on artificial data), selecting the top-ranked 9% of customers for this specific score captures more than 90% of the churners. Returning to the user story from above, marketing experts can now identify all existing car insurance policy customers who have a high probability of canceling their policy.
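For reference, the "top 9% captures more than 90% of churners" statement corresponds to a simple cumulative gains calculation, sketched here with assumed score and label arrays:

```python
# Share of all churners found within the top `top_share` of customers,
# ranked by predicted score (cumulative gains at a given cutoff).
import numpy as np

def churners_captured(scores: np.ndarray, churned: np.ndarray,
                      top_share: float) -> float:
    order = np.argsort(-scores)               # highest scores first
    top_n = int(len(scores) * top_share)
    return churned[order[:top_n]].sum() / churned.sum()

# e.g. churners_captured(scores, churned, 0.09) would return > 0.9
# for a model as strong as the one described above
```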
Appendix: Full list of (exemplary) predictors
All of the following predictors are calculated individually per policy holder (customer).
Predictor | Source(s)
Age | Customer master data
Country | Customer master data
Date of birth | Customer master data
Marital status | Customer master data
Region | Customer master data
Gender | Customer master data
Remaining days of selected policy (input parameter) | Product/policy data and input parameter
Total policy volume | Product/policy data
Average selected payment frequency of policies | Product/policy data
Average remaining days of all current policies | Product/policy data
Days since last call | Interaction data
Days since last claim | Interaction data
Days since last policy change | Interaction data
Days since last policy sign | Interaction data
Days since last policy termination | Interaction data
Total number of calls | Interaction data
Total number of claims | Interaction data
Total number of policy changes | Interaction data
Total number of policy signs | Interaction data
Total number of terminations | Interaction data
Total number of insured objects | Product/policy data and insured object data
Total number of insured sports cars | Product/policy data and insured object data
Total number of insured family cars | Product/policy data and insured object data
Total number of insured SUVs | Product/policy data and insured object data
Average build year of all insured objects | Product/policy data and insured object data
Average catalog price of all insured objects | Product/policy data and insured object data
Total catalog price of all insured objects | Product/policy data and insured object data
Appendix: Algorithms
For calculating predictive scores, SAP Marketing Cloud uses algorithms from the Automated Predictive Library (APL). The APL is an implementation of automated predictive capabilities running directly in SAP HANA. In contrast to the SAP HANA Predictive Analysis Library (PAL), which is also a HANA-native library of predictive algorithms, the APL is geared toward on-the-fly predictive scoring without the need to explicitly model the analytical workflow.
In addition, there are some tutorial videos on the APL on the SAP HANA Academy YouTube channel that describe how the APL works. The video below describes how claim fraud can be predicted using the APL. The full playlist of all APL-related videos can be found here: Playlist: Automated Predictive Library
Appendix: Predictive measures
The predictive power and the prediction confidence are calculated as follows:
For the estimation dataset (default plot), the predictive power corresponds to the area found between the curve of the generated model and that of the random model, divided by the area found between the curve of the perfect model and that of the random model. As the curve of the generated model approaches the curve of the perfect model, the value of the predictive power approaches 1.
For the estimation, validation, and test datasets (select the corresponding option from the Dataset list, located below the plot), the prediction confidence corresponds to one minus the area found between the curve of the estimation dataset and that of the validation dataset, divided by the area found between the curve of the perfect model and that of the random model.
The following graph displays the predictive power and the prediction confidence:
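Expressed as code, the two KPIs follow directly from the definitions above; the inputs are the areas between the respective curves:

```python
# The two model-fit KPIs, computed from the areas between the curves
# described above.
def predictive_power(area_model_vs_random: float,
                     area_perfect_vs_random: float) -> float:
    # approaches 1 as the generated model approaches the perfect model
    return area_model_vs_random / area_perfect_vs_random

def prediction_confidence(area_estimation_vs_validation: float,
                          area_perfect_vs_random: float) -> float:
    # approaches 1 as the estimation and validation curves coincide,
    # i.e. as the model generalizes robustly
    return 1.0 - area_estimation_vs_validation / area_perfect_vs_random
```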
Appendix: Join Set ID
The concept of the Join Set ID is described quite nicely in the product documentation, from which the excerpt below is taken.
Predictive analytics is typically based on high amounts of data. In contrast to classic relational data access, large parts of an even larger complete data set are processed. To minimize runtime and/or memory overhead, the data should be strictly limited to the amount that is needed for the current predictive analysis.
For this purpose, the concept of the Join Set ID has been introduced. The Join Set ID represents only the contact keys that are relevant for a predictive model. With this ID, instead of creating database joins with the complete base of contact keys, joins are created using only the relevant contact keys. Usually, these relevant contact keys make up a smaller data set. Consequently, the amount of data processed is reduced.
We strongly recommend that you push the join set down to the lowest level in order to reduce the processed data as early as possible.
This means that you join the join set table with the base table, as shown in the screenshot.
In this case, you need to join with the table CUAND_JS_CONTACT while setting a filter on JOIN_SET_ID using the input parameter IP_JOIN_SET_ID. The ID column of CUAND_JS_CONTACT can then be joined with the contact key in your data set.
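To illustrate the effect of this pushdown outside of the graphical view editor, here is the same logic in pandas terms; only the table CUAND_JS_CONTACT and its columns JOIN_SET_ID and ID come from the documentation, everything else is invented:

```python
# Join set pushdown: filter the join set table by IP_JOIN_SET_ID first,
# then inner-join, so only relevant contacts flow into the expensive
# aggregations further up.
import pandas as pd

IP_JOIN_SET_ID = "42"  # illustrative value

cuand_js_contact = pd.DataFrame({      # mirrors table CUAND_JS_CONTACT
    "JOIN_SET_ID": ["42", "42", "99"],
    "ID":          ["C1", "C2", "C3"], # contact keys
})
interactions = pd.DataFrame({"contact_id": ["C1", "C3"],
                             "no_of_claims": [2, 5]})

relevant = cuand_js_contact[cuand_js_contact["JOIN_SET_ID"] == IP_JOIN_SET_ID]
reduced = interactions.merge(relevant, left_on="contact_id", right_on="ID")
# only C1 survives; C3 is filtered out before any further processing
```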
Summary
With this blog, you have seen what is behind custom predictive scores and how they can be implemented in SAP Marketing Cloud. You should now be able to implement your own score, helping you to answer some of your business questions with the help of predictive analytics or machine learning.