When setting up the forecast process, business needs are in most cases the first and only thing taken into account. However, the underlying statistical properties of the data you use for forecasting strongly affect the quality of your forecast and should be considered as well.
The historical data you use as a basis for your forecast can have patterns. Recognizing these patterns (such as trends and seasonality) helps in deciding which forecasting algorithms are more suitable and ultimately helps increase the forecast accuracy.
In SAP Integrated Business Planning, there are two processes aimed at understanding the statistical nature of the data used for forecasting: time series analysis and change point detection.
The purpose of this blog is to give an overview of what time series analysis and change point detection do, how best to set up the related parameters, and how to leverage the results for forecasting. It is written for experts, specifically those tasked with configuring the SAP IBP system to use these features, and therefore contains in-depth technical details. If you only want a basic understanding of time series analysis and change point detection, you may find it sufficient to read the opening section of each chapter.
Time series analysis can be used to identify four basic patterns in the historical data:
The following graphic shows how these properties relate to each other:
With the exception of irregularity and lumpiness, the above patterns may additionally have the following features and characteristics:
By “time series property” we mean the combination of these patterns and characteristics for a given time series. For example, a time series property can be “Intermittent with additive seasonality”.
If you are not sure about the nature of the time series data you want to forecast, you can have the system analyze the values in the background and save the identified properties. This process is called time series analysis and consists of four steps:
A time series is seasonal if it displays a repeating pattern or cycle over a fixed period, typically months or weeks. To detect whether a time series is seasonal, a statistical method called a seasonality test is used.
The result of the seasonality test affects which predictive model is most appropriate for the data, so understanding the seasonality in your data set is crucial for accurate forecasting.
When setting up a profile in the Manage Forecast Automation Profiles app, there are a few settings that are relevant for seasonality.
Define how sensitively the system should look for seasonality in the time series. The lower the value of this autocorrelation coefficient, the more easily seasonality is identified.
For example, the default 0.3 coefficient means that seasonality is identified in the time series when the highest autocorrelation coefficient is above 0.3. If you choose a lower value, the system will identify seasonality in more cases.
The default value of 0.3 was chosen based on internal tests that showed good results in most cases.
You can decide whether the system should automatically detect the number of periods in a season or whether you want to define it manually. If you enable this option and set the cycle length manually, your data is only considered seasonal when the autocorrelation coefficient of the specified cycle is higher than the sensitivity coefficient. This way, you can eliminate false seasonality detection, which is typically caused by heavy noise in the time series. For example, you might plan in calendar weeks but expect yearly seasonality. In this case, automatic seasonality detection might recognize quarterly or half-year cycles that are not relevant for your business. Additionally, with high noise, automatic detection might find a false seasonality with a cycle of two periods.
You should keep in mind that the system will only look for cycles with the exact length you specify, which may lead to distorted results in some cases. For example, if you specify a seasonal cycle of 24 months, with 24 being an integral multiple of 12, it may happen that time series analysis identifies a seasonality cycle of 24 months and doesn’t detect that the actual length is 12 months.
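To make the sensitivity coefficient and the fixed cycle length more concrete, here is a minimal Python sketch of an autocorrelation-based seasonality check. It is not SAP IBP's implementation: the function names `autocorrelation` and `detect_season_length`, the search range of lags 2 to n/2, and the rule of picking the single highest coefficient are assumptions made for illustration. It does reproduce the two behaviors described above: a series is only seasonal if the relevant coefficient exceeds the sensitivity (0.3 by default), and with a fixed cycle only that exact lag is checked, so a 24-period cycle is accepted for data that actually repeats every 12 periods.

```python
from typing import Optional, Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation coefficient of `series` at `lag`."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a constant series carries no autocorrelation signal
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def detect_season_length(
    series: Sequence[float],
    sensitivity: float = 0.3,
    fixed_cycle: Optional[int] = None,
) -> Optional[int]:
    """Return the detected cycle length in periods, or None if not seasonal."""
    if fixed_cycle is not None:
        # Manual cycle length: only this exact lag is compared with the threshold.
        if autocorrelation(series, fixed_cycle) > sensitivity:
            return fixed_cycle
        return None
    # Automatic detection: try lags 2 .. n/2 and keep the highest coefficient.
    max_lag = len(series) // 2
    if max_lag < 2:
        return None
    best_lag = max(range(2, max_lag + 1), key=lambda k: autocorrelation(series, k))
    return best_lag if autocorrelation(series, best_lag) > sensitivity else None


# Four years of a hypothetical monthly pattern that peaks in month 6.
monthly = [10, 12, 15, 20, 25, 30, 28, 24, 18, 14, 11, 9] * 4
print(detect_season_length(monthly))                  # -> 12
print(detect_season_length(monthly, fixed_cycle=24))  # -> 24, a multiple of the true cycle
```

Lag 1 is skipped because a smooth series is always strongly correlated with its immediate predecessor; a production-grade test would typically also remove any trend before computing the coefficients.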
Set the preference that the system should apply when identifying the type of the seasonality pattern. You have the following options:
Setting your preference for the type of seasonality is useful, for example, if you use the seasonality indices as independent variables in forecast models with algorithms that are capable of handling such variables (for example, gradient boosting of decision trees).
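As an illustration of what additive and multiplicative seasonality indices express, the following sketch computes one index per position in the cycle from a series without trend. This is a simplified, classical-decomposition-style calculation chosen for illustration; how IBP derives its indices is not described here, and the function name `seasonal_indices` is hypothetical.

```python
from typing import List, Sequence


def seasonal_indices(series: Sequence[float], cycle: int, kind: str = "additive") -> List[float]:
    """Average seasonal effect per position in the cycle (assumes no trend).

    additive:       position mean minus overall mean (same units as the data)
    multiplicative: position mean divided by overall mean (a ratio)
    """
    overall = sum(series) / len(series)
    position_means = []
    for p in range(cycle):
        values = series[p::cycle]  # every value that falls on position p
        position_means.append(sum(values) / len(values))
    if kind == "additive":
        return [m - overall for m in position_means]
    if kind == "multiplicative":
        return [m / overall for m in position_means]
    raise ValueError("kind must be 'additive' or 'multiplicative'")


sales = [10, 12, 15, 20, 25, 30, 28, 24, 18, 14, 11, 9] * 4
print(seasonal_indices(sales, 12)[5])                    # 12.0 units above average
print(seasonal_indices(sales, 12, "multiplicative")[5])  # about 1.67 times the average
```

The additive index says month 6 sells 12 units above the average month; the multiplicative index says it sells about 1.67 times the average. Which one fits depends on whether the seasonal swing stays constant in absolute terms or grows with the level of the series.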
Trends are patterns or consistent behaviors occurring over time. In statistical terms, a trend is often represented as a linear relationship between a variable (in the case of demand forecasting, the historical sales) and time.
To evaluate the presence of a trend, a so-called trend test is performed. The trend test identifies whether a time series has an upward trend, a downward trend, or no trend, and calculates the de-trended time series.
When setting up a profile in the Manage Forecast Automation Profiles app, there are two settings that are relevant for trend.
The significance value in a trend test provides a measure of the statistical evidence that the observed trend is actually present in the data rather than merely the result of random variation.
The smaller the significance value you set, the stronger the statistical evidence the system requires before it reports a trend. Conversely, with a high significance value, even weak evidence is enough for a trend to be identified.
For example, with the default value of 0.05, a trend is only identified if there is less than a 5% chance that the observed data would occur through random variation alone if there were really no trend.
The significance does not directly provide information about the magnitude or direction of the trend, but rather about the confidence with which we can assert that a trend exists at all.
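The post does not name the statistical test behind the trend test, so the following sketch swaps in the Mann-Kendall test, a common non-parametric trend test, to show how a significance value is used: a trend is only reported if the p-value falls below it. The least-squares line used for de-trending, the function names, and the missing tie correction are assumptions of this sketch, not a description of IBP.

```python
import math
from typing import List, Sequence, Tuple


def mann_kendall(series: Sequence[float]) -> Tuple[int, float]:
    """Mann-Kendall S statistic and two-sided p-value (normal approximation, no tie correction)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # +1 for an increase, -1 for a decrease
    if s == 0:
        return s, 1.0
    var_s = n * (n - 1) * (2 * n + 5) / 18
    z = (s - math.copysign(1, s)) / math.sqrt(var_s)  # continuity correction
    return s, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def trend_test(series: Sequence[float], significance: float = 0.05) -> Tuple[str, List[float]]:
    """Classify the trend and return the de-trended series (residuals of a least-squares line)."""
    s, p_value = mann_kendall(series)
    if p_value >= significance:
        return "none", list(series)  # not enough evidence for a trend
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series)) / sum(
        (x - x_mean) ** 2 for x in range(n)
    )
    intercept = y_mean - slope * x_mean
    detrended = [y - (intercept + slope * x) for x, y in enumerate(series)]
    return ("upward" if s > 0 else "downward"), detrended


kind, _ = trend_test([3, 5, 4, 6, 8, 7, 9, 11, 10, 12])
print(kind)  # -> upward
```

Lowering `significance` to, say, 0.01 makes the sketch demand stronger evidence before it reports a trend, mirroring the behavior of the setting described above.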
Specify if you want the system to consider change points when performing time series analysis. In case of a trend change, this means that the system only looks for trends in the segment after the last change point.
If Consider Change Points is selected, the Minimum Interval setting for change point detection affects trend detection: if change points can be detected close to each other, the segment after the last change point, on which the trend slope is calculated, can be short. The trend slope might therefore be imprecise.
Additionally, it makes sense to use this setting if you have a relatively long history, where splitting the data depending on trend changes will still ensure that the system is able to identify a trend in the last segment. If the history is too short, this might lead to not having enough data points to identify a trend.
A time series is intermittent if it contains a lot of zero values. “A lot” is determined by the parameters of the test.
To differentiate between continuous and intermittent time series you can use different methods. You have the following options:
Note that if missing values should not be treated as zeros, using Zeros or Missing Values as the method might lead to false detection of intermittency.
Using the method of Zeros or Missing Values will classify a time series as intermittent even when long periods without any sales are followed by periods of continuous sales, as only the total number of zero values versus the number of non-zero values is relevant in this case. If you want to consider a time series as intermittent only when it consistently shows some zero values between non-zero values, Average Demand Interval would be a better choice.
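The difference between the two methods can be sketched in a few lines of Python. This assumes that the Zeros or Missing Values method looks at the share of zero periods and that the Average Demand Interval (ADI) is the mean number of periods between consecutive non-zero demands; IBP's exact formulas and thresholds are not given in this post, and the function names are hypothetical. A cut-off often quoted in the literature for ADI is 1.32, but whether IBP uses it is not stated here.

```python
import math
from typing import Sequence


def zero_share(series: Sequence[float]) -> float:
    """Share of periods with zero demand (assumed basis of 'Zeros or Missing Values')."""
    return sum(1 for x in series if x == 0) / len(series)


def average_demand_interval(series: Sequence[float]) -> float:
    """Mean number of periods between consecutive non-zero demands (assumed ADI definition)."""
    demand_periods = [t for t, x in enumerate(series) if x != 0]
    if len(demand_periods) < 2:
        return math.inf  # too few demands to measure an interval
    gaps = [b - a for a, b in zip(demand_periods, demand_periods[1:])]
    return sum(gaps) / len(gaps)


late_start = [0] * 12 + [5] * 12  # no sales for a year, then continuous sales
sporadic = [0, 0, 5, 0, 0, 3, 0, 0, 4, 0, 0, 6]
for name, s in [("late_start", late_start), ("sporadic", sporadic)]:
    print(name, zero_share(s), average_demand_interval(s))
```

Both series have at least half of their periods at zero, but only `sporadic` shows gaps between demands (ADI of 3.0); `late_start` has an ADI of 1.0 because once sales begin they are continuous.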
A time series is volatile if the volume of the data changes a lot. This is measured by checking how random the data points are and how much the data varies in relation to the mean of the dataset.
To find out if a time series is volatile, one needs to see whether the observed time series values are just random (i.e., white noise) or if they follow some underlying behavior.
If white noise is detected in the data series, it implies that the fluctuations in the data are completely random and do not follow a specific pattern or trend. If there is a lot of white noise and no trend or seasonality is identified in the data, the time series is considered irregular.
It’s crucial to understand the presence of white noise in your data, because if it's present, traditional time series forecasting methods are likely to be ineffective. Very irregular time series are virtually impossible to forecast, as there's no apparent relationship between past and future values. Excluding the time series from forecasting or using a simple Copy Past Periods method might be a better approach.
You might also consider using Confidence Prediction Intervals (available for all exponential smoothing algorithms) to cover such cases: point forecasting on irregular data is very inaccurate, but a range can be very informative.
This setting is needed to specify the level of confidence with which the results of the white noise tests should be taken into account.
The higher you set the probability, the higher the confidence level the system requires before it considers the time series irregular.
For example, the default probability of 0.9 means that the time series is considered irregular only if white noise is identified in it with a confidence level of 90% or more. A low threshold such as 0.2 leads to false positives, because the system then considers the time series irregular whenever white noise is identified in the data with a confidence level of 20% or more.
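White noise tests check whether the autocorrelations of a series are jointly indistinguishable from zero. As an example of such a test, the sketch below implements the Ljung-Box test in plain Python. Whether IBP uses Ljung-Box, and how it turns a test result into the probability compared with this setting, is not described in this post, so the sketch stops at the Q statistic and its p-value. To avoid a statistics library, it only supports an even number of lags, for which the chi-square tail probability has a closed form; the function name is hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def ljung_box(series: Sequence[float], lags: int = 10) -> Tuple[float, float]:
    """Ljung-Box Q statistic and p-value for the null hypothesis 'the series is white noise'."""
    if lags % 2:
        raise ValueError("this sketch needs an even number of lags")
    n = len(series)
    if n <= lags:
        raise ValueError("series must be longer than the number of lags")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("constant series")
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / denom
        q += r_k ** 2 / (n - k)
    q *= n * (n + 2)
    # Chi-square tail probability for an even number of degrees of freedom (df = lags):
    # P(X > q) = exp(-q/2) * sum_{i < df/2} (q/2)^i / i!
    half, term, total = q / 2, 1.0, 0.0
    for i in range(lags // 2):
        total += term
        term *= half / (i + 1)
    return q, min(1.0, math.exp(-half) * total)


rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(60)]
seasonal = [math.sin(2 * math.pi * t / 12) for t in range(60)]
print(ljung_box(noise))     # usually a large p-value: no evidence against white noise
print(ljung_box(seasonal))  # p-value close to 0: clearly not white noise
```

A small p-value is evidence against white noise; a large p-value only means the data gives no reason to reject it.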
If a time series is both intermittent and volatile, then it is called “lumpy”.
To evaluate how lumpy a time series is (that is, how much the volume of the data changes), the system divides the squared standard deviation by the squared mean, which gives the squared coefficient of variation (CV²). Here you can specify the threshold for this value; the default is 0.5.
A low coefficient of variation (for example, 0.1) indicates that the data points are close to the mean, meaning low variability, while a high coefficient of variation indicates high variability around the mean.
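A minimal sketch of the CV² check and the resulting lumpiness classification follows. Computing CV² over the non-zero periods only and using the population variance are assumptions (they follow the common Syntetos-Boylan style of demand classification), as are the function names; the 0.5 threshold is the default mentioned above.

```python
from typing import Sequence


def cv_squared(series: Sequence[float], nonzero_only: bool = True) -> float:
    """Squared coefficient of variation: population variance divided by the squared mean."""
    values = [x for x in series if x != 0] if nonzero_only else list(series)
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return variance / mean ** 2


def is_lumpy(series: Sequence[float], intermittent: bool, threshold: float = 0.5) -> bool:
    """Lumpy = intermittent and volatile (CV squared above the threshold)."""
    return intermittent and cv_squared(series) > threshold


print(cv_squared([0, 0, 1, 0, 9, 0]))      # 0.64: demand sizes vary a lot
print(is_lumpy([0, 0, 1, 0, 9, 0], True))  # True
print(is_lumpy([0, 4, 0, 6], True))        # False: CV squared is only 0.04
```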
You can make use of the results of time series analysis in different applications.
In the Manage Forecast Models app, you can choose the Consider Time Series Properties option after setting the system to utilize multiple forecasts using the Choose Best Forecast method. If you do so, the system checks the time series properties that were identified by the most recent forecast automation job and uses them to filter out the algorithms that are not expected to calculate an appropriate forecast.
If you select the Automatically Generated Seasonality Dummy option in the advanced algorithms Multiple Linear Regression or Gradient Boosting of Decision Trees, the system creates an independent variable based on the season cycle found during time series analysis. If no time series analysis results are available, the system creates the seasonality dummy by running an ad hoc seasonality test.
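One common way to turn a detected cycle length into regression inputs is one-hot encoding, sketched below. How the Automatically Generated Seasonality Dummy is actually encoded in IBP is not described in this post, so treat the function `seasonality_dummies` and its layout as a hypothetical illustration.

```python
from typing import List


def seasonality_dummies(num_periods: int, cycle: int, drop_first: bool = True) -> List[List[int]]:
    """One 0/1 column per position in the seasonal cycle, one row per period.

    With drop_first=True the first position is the baseline (all zeros), which
    avoids perfect collinearity with the intercept in a linear regression.
    """
    first = 1 if drop_first else 0
    return [
        [1 if t % cycle == p else 0 for p in range(first, cycle)]
        for t in range(num_periods)
    ]


# 8 periods (for example 6 of history plus 2 to forecast) with a detected cycle of 4
for row in seasonality_dummies(8, cycle=4):
    print(row)
```

Because the columns depend only on the period's position in the cycle, the same function can produce the values for future periods, which a regression model needs in order to forecast.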
Additionally, you can set the outlier correction preprocessing algorithm to consider time series properties. If you do so, the algorithm can detect outliers that don’t vary significantly from the mean or median but do vary from the seasonality or trend pattern in the data.
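To illustrate why a seasonal reference helps, the sketch below compares a simple mean-based rule with a rule based on residuals from a seasonal profile. The median profile, the MAD-based scale, the factor k = 3, and the function names are illustrative choices, not IBP's outlier correction algorithm.

```python
import statistics
from typing import List, Sequence


def mean_based_outliers(series: Sequence[float], k: float = 3.0) -> List[int]:
    """Indices further than k standard deviations from the overall mean."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [t for t, x in enumerate(series) if sd > 0 and abs(x - mean) > k * sd]


def seasonal_outliers(series: Sequence[float], cycle: int, k: float = 3.0) -> List[int]:
    """Indices whose residual from the seasonal profile is unusually large."""
    # Robust seasonal profile: median value of each position in the cycle.
    profile = [statistics.median(series[p::cycle]) for p in range(cycle)]
    residuals = [x - profile[t % cycle] for t, x in enumerate(series)]
    # Robust residual scale: median absolute residual, scaled to approximate a standard deviation.
    scale = 1.4826 * statistics.median(abs(r) for r in residuals)
    if scale == 0:
        # Noise-free data: any deviation from the profile stands out.
        return [t for t, r in enumerate(residuals) if r != 0]
    return [t for t, r in enumerate(residuals) if abs(r) > k * scale]


sales = [10, 12, 15, 20, 25, 30, 28, 24, 18, 14, 11, 9] * 4
sales[17] = 18  # a peak month that suddenly sells only the overall average
print(mean_based_outliers(sales))          # -> []
print(seasonal_outliers(sales, cycle=12))  # -> [17]
```

Period 17 should be a peak month but sells roughly the overall average, so the mean-based rule does not flag it, while its residual of -12 from the seasonal profile stands out.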
In the Manage ABC/XYZ Segmentation Rules app, the system considers the results of Time Series Analysis automatically during XYZ segmentation.
And finally, you can use the results of time series analysis in the SAP IBP, add-in for Microsoft Excel, where you can use them to limit and filter your planning views if you have saved the properties in an attribute and other results in key figures.
To find out more on how to save time series properties in an attribute, have a look at the documentation.
Change point detection is a machine-learning-based algorithm that detects major changes that occurred in the time series and had long-term effects on the data. The following changes may occur:
This is shown in the following graphic, where the red circle marks a level shift and the green circle marks a trend change:
A level shift may happen, for example, when a new sales channel such as online shopping is opened for a product, a product is introduced for a new market, or the legal environment changes (for example, a medicine becomes subsidized). Such changes often result in a higher mean of time series values. It is also possible that the mean of actual sales decreases; this happens, for example, when a new competitor enters the market or a subsidy is discontinued.
A trend change may happen in the following cases:
If both the level shift and the trend change are significant in a time series, the analysis will only identify the trend change.
Change point detection consists of the following steps:
Let’s take a closer look at the parameters related to change point detection that you can set in the Manage Forecast Automation Profiles app.
When change point detection is performed, the system divides the historical horizon into time intervals that are bordered either by two change points or by the first or last change point and the respective end of the historical horizon. With this setting, you specify the minimum length of these intervals to help the system calculate statistically meaningful results.
For example, you can specify a minimum interval of 6 months between two successive change points. This also means that there must be at least 6 months between the start of the historical horizon and the first change point, as well as between the last change point and the end of the historical horizon.
The interval is defined in terms of the periodicity you have chosen for the target calculation level.
Note that you can only enter an integer equal to or higher than 6 for this setting.
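The post does not describe IBP's change point algorithm beyond calling it machine-learning-based, so the following sketch swaps in binary segmentation, a simple classical method that looks for level shifts only. It shows how a minimum interval constrains where change points can fall, including the distance to both ends of the history. The penalty that decides whether a split is worth making is a hypothetical choice, as are the function names; trend changes are not detected.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty segment)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def _best_split(series: Sequence[float], start: int, end: int, min_interval: int) -> Tuple[Optional[int], float]:
    """Best single level-shift position in series[start:end], honoring min_interval."""
    total = _sse(series[start:end])
    best_cp, best_gain = None, 0.0
    # Both resulting segments must be at least min_interval periods long.
    for cp in range(start + min_interval, end - min_interval + 1):
        gain = total - _sse(series[start:cp]) - _sse(series[cp:end])
        if gain > best_gain:
            best_cp, best_gain = cp, gain
    return best_cp, best_gain


def detect_level_shifts(series: Sequence[float], min_interval: int = 6) -> List[int]:
    """Binary segmentation for level shifts; returns change point indices."""
    if min_interval < 6:
        raise ValueError("minimum interval must be an integer of at least 6")
    series = list(series)
    n = len(series)
    # Hypothetical penalty: noise variance estimated from first differences, scaled by log(n).
    diffs = [b - a for a, b in zip(series, series[1:])]
    noise_var = sum(d * d for d in diffs) / (2 * max(len(diffs), 1))
    penalty = 3 * math.log(max(n, 2)) * noise_var
    change_points: List[int] = []
    pending = [(0, n)]
    while pending:
        start, end = pending.pop()
        cp, gain = _best_split(series, start, end, min_interval)
        if cp is not None and gain > penalty:
            change_points.append(cp)
            pending += [(start, cp), (cp, end)]
    return sorted(change_points)


history = [10.0] * 8 + [20.0] * 8 + [30.0] * 8
print(detect_level_shifts(history))  # -> [8, 16]
```

With a minimum interval of 6, a shift in the first or last 6 periods of the history cannot be reported at its true position, which is exactly the trade-off the setting controls.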
The system uses these settings to perform the following steps:
If at least one of these conditions is met, the change is identified as a trend change. Otherwise, no change point is detected.
You can set the Multiple Linear Regression, Auto-ARIMAX/SARIMAX, Gradient Boosting of Decision Trees and Extreme Gradient Boosting algorithms to consider change points for a more accurate forecast. If you do not consider the change points in any of these algorithms, then you can deactivate the Change Point detection during the Time Series Analysis, as they will not be used anywhere else.
Whenever change points are detected in a time series, the series is split so that the algorithm treats each segment separately. For very short time series, this might result in a worse forecast.
If the Consider Change Points option is selected for these algorithms in the Manage Forecast Models app and change points were previously found in the time series, the system divides the time series into segments between the adjacent change points.
Example
Let’s say that there are two change points in the time series, both of which were caused by external factors and may not occur again in the future. Change point detection will result in the following micro chart in this case:
When the change points are not considered by the Multiple Linear Regression algorithm, the time series is interpreted as one having an upward trend and the forecast is calculated as if this trend was expected to continue, which is probably not accurate. This is shown in the following chart:
However, when the change points are considered, the ex-post forecast fits the historical data perfectly and no further changes are predicted for the future horizon. This is illustrated by the chart below:
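The effect in this example can be reproduced with a small sketch: fit one straight line to the whole history, or fit one line per segment between change points and extrapolate only the last segment. Per-segment least-squares fitting is an assumption about what considering change points means in a regression, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope (slope 0 for a single point)."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = 0.0 if sxx == 0 else sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return y_mean - slope * x_mean, slope


def segmented_trend_forecast(
    series: Sequence[float], change_points: Sequence[int], horizon: int
) -> Tuple[List[float], List[float]]:
    """Fit one line per segment (ex-post fit); extrapolate only the last segment."""
    series = list(series)
    bounds = [0, *change_points, len(series)]
    ex_post: List[float] = []
    for start, end in zip(bounds, bounds[1:]):
        xs = list(range(start, end))
        intercept, slope = fit_line(xs, series[start:end])
        ex_post += [intercept + slope * x for x in xs]
    # The future is based on the segment after the last change point only.
    last_start = bounds[-2]
    intercept, slope = fit_line(list(range(last_start, len(series))), series[last_start:])
    future = [intercept + slope * x for x in range(len(series), len(series) + horizon)]
    return ex_post, future


history = [10.0] * 8 + [20.0] * 8 + [30.0] * 8
_, naive = segmented_trend_forecast(history, [], horizon=3)       # one line over everything
fit, flat = segmented_trend_forecast(history, [8, 16], horizon=3)  # change points considered
print([round(v, 1) for v in naive])  # keeps climbing
print(flat)                          # -> [30.0, 30.0, 30.0]
```

Without change points, the single line keeps climbing into the future; with change points at periods 8 and 16, the ex-post fit matches the history exactly and the forecast stays at the last level.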
In addition to all the settings related to the algorithmic details of time series analysis and change point detection, you can specify how you want the results to be saved. These outputs are optional; you can always review the results directly in the Manage Forecast Automation app.
Time series analysis and change point detection help you understand the patterns in your data and ultimately improve the quality of your forecast. I hope this deep dive into forecast automation in SAP IBP has helped you understand the tools we offer to make this process automated and tailored to your needs.
For more information see the detailed documentation here.