Introduction

Predictive Planning relieves you of the hassle of building a time series forecasting model. What remains to be done is evaluating the model performance and checking if it meets your requirements. Predictive Planning provides everything you need for that purpose. In this blog post I guide you through the process of evaluating the performance of time series forecasting models created in Predictive Planning.

Global Performance Indicators

The global performance indicators are metrics that estimate the accuracy of a time series forecasting model. These performance indicators are available in the first section of the Forecast modeling report. By default, Predictive Planning shows the Expected MAPE, which is considered a good all-purpose performance indicator.

david_serre_0-1718006311145.png

Besides the Expected MAPE, Predictive Planning proposes four other performance indicators: Expected MAE, Expected MASE, Expected RMSE, and Expected R².

You can configure which performance indicators are displayed, and how, by clicking the Settings button (gear-shaped button) of the Global Performance Indicators visualization. In the Visualization Settings dialog, you can select the performance indicators you want to display, as shown below:

david_serre_1-1718006331922.png

When you click OK, the Global Performance Indicators visualization is updated with the selected indicators, as shown below:

david_serre_2-1718006376959.png

How to Choose a Performance Indicator?

You may wonder why you would need different performance indicators and which one you should use to evaluate the performance of your time series forecasting model.

Which performance indicator you should use depends on your business requirements. One of the first steps of a forecasting project is defining the requirements in terms of forecasting performance. What is the acceptable amount of forecasting error? Is it expressed as a percentage of error or as an absolute amount of error? Are large errors on a single date a specific concern? Different requirements naturally map to different performance indicators.

In the next sections we will explain each performance indicator in detail, but if you don’t have time for a long read, the following summary provides guidance you can start with:

  • Use MAPE if you want to be sure the performance of several time series is comparable (for instance when comparing entities of the model). If you really don’t know which performance indicator to use, MAPE is probably a good choice. Don’t use MAPE if the time series to be forecasted contains a lot of zero values (see the One Common Issue with MAPE section of this post).
  • Use MAE when you want the error to be expressed in the same unit as the time series and you don’t need to compare the performance across several time series.
  • Use RMSE instead of MAE if large errors are a specific concern.
  • MASE and R² are usually more useful for deeper analysis by seasoned data scientists.

I encourage you to become familiar with the different performance indicators proposed in Predictive Planning. Let’s dive into the details that underpin the above guidance.

MAPE

MAPE stands for Mean Absolute Percentage Error and is defined by the formula:

\[ \text{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{A_i - F_i}{A_i}\right| \times 100\% \]

where Ai is the actual value and Fi is the predicted value.

MAPE represents the average percentage of error, ignoring the sign of the error (so positive errors don’t cancel out negative errors). Therefore, the lower the MAPE, the better. It’s the default performance metric in Predictive Planning because of its main advantage: the MAPE is expressed as a percentage, which makes it possible to compare the performance of different time series.
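
To make the formula concrete, here is a minimal Python sketch (not SAP code, purely an illustration with hypothetical values) of how a MAPE could be computed from a list of actuals and forecasts:

```python
def mape(actuals, forecasts):
    """Mean Absolute Percentage Error, expressed as a percentage."""
    n = len(actuals)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actuals, forecasts)) / n

# Hypothetical values: each forecast is off by exactly 10% of the actual
actuals   = [100.0, 200.0, 300.0]
forecasts = [110.0, 180.0, 330.0]
print(mape(actuals, forecasts))  # 10.0 -> on average, forecasts are off by 10%
```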

MAE

MAE stands for Mean Absolute Error and is defined by the formula:

\[ \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|A_i - F_i\right| \]

where Ai is the actual value and Fi is the predicted value.

The MAE represents the average absolute amount of error, ignoring the sign of the error (so positive errors don’t cancel out negative errors). The main advantage of the MAE is that the error is expressed directly in the same unit as the forecasted time series. If you are forecasting a quantity in USD, then an MAE of 56 can be directly interpreted as: “on average, the error was 56 dollars”. The lower the MAE, the better (as it’s an amount of error). The main drawback of the MAE is that it depends on the scale of the time series (is the forecasted amount large or small?) and therefore doesn’t allow comparing the performance of two time series properly. For instance, let’s assume you are forecasting the number of units sold for product A and product B. On average, each month you sell 1,000 units of product A and 100,000 units of product B. The same MAE of 500 units would probably be considered rather high for product A and rather low for product B.
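
The scale issue is easy to illustrate with a small Python sketch using hypothetical figures for products A and B: the absolute error is identical, but it has to be judged against very different volumes:

```python
def mae(actuals, forecasts):
    """Mean Absolute Error, expressed in the same unit as the time series."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

# Hypothetical monthly unit sales: same MAE of 500 units, very different scales
product_a_actuals, product_a_forecasts = [1_000, 1_000], [1_400, 400]
product_b_actuals, product_b_forecasts = [100_000, 100_000], [100_400, 99_400]
print(mae(product_a_actuals, product_a_forecasts))  # 500.0 -> large relative to ~1,000 units
print(mae(product_b_actuals, product_b_forecasts))  # 500.0 -> small relative to ~100,000 units
```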

MASE

MASE stands for Mean Absolute Scaled Error and compares the performance (based on MAE) of the evaluated time series model to a “naïve” model, rather than providing an amount of error.

MASE is defined by the formula:

\[ \text{MASE} = \frac{\text{MAE}}{\text{MAE}_{\text{naive}}} \]

where MAE is the MAE of the considered time series model and MAEnaive is the MAE of a naïve lag-1 model, that is, a model that predicts for date d the value of date d-1.

Values lower than 1 indicate that the evaluated model performs better than the naïve model, while values higher than 1 indicate that the naïve model performs better than the evaluated model.
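
The following Python sketch (a simplified illustration with hypothetical values, not the exact computation performed by Predictive Planning) shows the idea of scaling the model’s MAE by the MAE of a naïve lag-1 model:

```python
def mae(actuals, forecasts):
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

def mase(actuals, forecasts):
    """MAE of the model divided by the MAE of a naive lag-1 model on the same points."""
    # The naive lag-1 model forecasts for date d the actual value of date d-1,
    # so it can only be evaluated from the second point onwards.
    mae_naive = mae(actuals[1:], actuals[:-1])
    mae_model = mae(actuals[1:], forecasts[1:])
    return mae_model / mae_naive

actuals   = [100, 120, 140, 160, 180]
forecasts = [102, 118, 143, 158, 182]
print(mase(actuals, forecasts))  # < 1 -> the model beats the naive lag-1 baseline
```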

RMSE

RMSE stands for Root Mean Squared Error and is defined by the formula:

\[ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(A_i - F_i\right)^2} \]

where Ai is the actual value and Fi is the predicted value.

RMSE is somewhat similar to MAE: it expresses an amount of error in the same unit as the forecasted time series and is therefore easy to interpret, but it doesn’t allow comparing the performance of several time series. The difference between RMSE and MAE resides in the “square” term: it makes RMSE more sensitive to outliers (and, generally speaking, to large errors). MAE is usually a better choice if large errors are not a specific concern.
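
The sketch below (hypothetical numbers) shows how the squaring makes RMSE react to one large error, while MAE cannot distinguish it from several moderate errors:

```python
import math

def mae(actuals, forecasts):
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

def rmse(actuals, forecasts):
    """Root Mean Squared Error: squaring amplifies the weight of large individual errors."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals))

actuals          = [100, 100, 100, 100]
steady_forecasts = [110, 110, 110, 110]  # four moderate errors of 10
spiky_forecasts  = [100, 100, 100, 140]  # a single large error of 40
print(mae(actuals, steady_forecasts), rmse(actuals, steady_forecasts))  # 10.0 10.0
print(mae(actuals, spiky_forecasts), rmse(actuals, spiky_forecasts))    # 10.0 20.0
```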

R²

R² (R squared) represents the proportion of the variance of the time series that is explained by the model and is defined by the formula:

\[ R^2 = 1 - \frac{\sum_{i=1}^{n}\left(A_i - F_i\right)^2}{\sum_{i=1}^{n}\left(A_i - \mu\right)^2} \]

where Ai is the actual value, μ is the mean of the actual values and Fi is the predicted value.

R² can also be interpreted as a comparison between the evaluated model and a naïve model that would always predict the mean of the time series. R² must be used carefully, as it requires the time series to have specific properties (stationarity) to be a reliable performance estimator.
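
Here is a minimal Python sketch of the formula above, using hypothetical values:

```python
def r_squared(actuals, forecasts):
    """Proportion of the variance of the actual values explained by the model."""
    mean = sum(actuals) / len(actuals)
    ss_res = sum((a - f) ** 2 for a, f in zip(actuals, forecasts))  # residual sum of squares
    ss_tot = sum((a - mean) ** 2 for a in actuals)                  # total sum of squares
    return 1 - ss_res / ss_tot

actuals   = [10, 20, 30, 40]
forecasts = [12, 18, 31, 39]
print(r_squared(actuals, forecasts))  # 0.98 -> most of the variance is explained
```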

One Common Issue with MAPE

MAPE is proposed in Predictive Planning as the default performance metric to evaluate time series forecasting models because it has several interesting qualities: it’s easy to interpret (it’s simply an amount of error expressed as a percentage) and it can be compared across different models (or entities). In most cases the MAPE is a very appropriate metric to evaluate the performance of your time series forecasting models.

However, when your time series has a lot of values close to zero, you should avoid using the MAPE to evaluate the performance of your time series forecasting model. When the forecasted time series has many actual values that are close to zero, the MAPE tends to become deceptively large. In such cases your time series model seems to have a very bad performance (a very high percentage of error) while it may actually be a fairly good model.

For instance, in the figure below the predictive model does a very good job at capturing the variations of the time series and is arguably quite accurate. Yet the Expected MAPE value is 277.23%. This is because the time series has many actual values equal to or close to zero.

david_serre_0-1718007644015.png

This is easily explained by looking at the MAPE formula. For each point of the time series, the amount of error is divided by the actual value. If the actual value is close to zero, the division inflates the percentage of error and the result is skewed.
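
The effect is easy to reproduce with a few hypothetical numbers: the absolute errors below are tiny, yet the MAPE explodes because several actual values are close to zero:

```python
def mape(actuals, forecasts):
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actuals, forecasts)) / len(actuals)

def mae(actuals, forecasts):
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

# The series hovers around zero; every forecast is off by at most 0.2 units
actuals   = [0.1, 0.2, 0.1, 5.0]
forecasts = [0.3, 0.1, 0.2, 5.1]
print(mae(actuals, forecasts))   # ~0.125 -> tiny absolute error
print(mape(actuals, forecasts))  # ~88%   -> deceptively large percentage
```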

As a rule of thumb, if a time series forecasting model has an unexpectedly high MAPE, you should check whether the time series to be predicted has many close-to-zero values. If it does, it is advisable to use performance metrics other than the MAPE (MAE, MASE, RMSE).

Why are the performance indicators prefixed with “Expected”?

You have probably noticed that the performance metrics proposed in Predictive Planning are all prefixed with the term “Expected”. What does it mean?

The standard performance metrics (MAPE, MAE…) are calculated by comparing, for each point of the test set, the predicted value to the actual value. These metrics are an “acknowledgement” of how well the model performs on past data, but they tell you nothing about the performance you can expect when using the model to estimate future values. What Predictive Planning tries to approximate with the “expected” metrics is the performance the time series forecasting model may achieve when estimating future values.

What a MAPE of 12% tells you is that the time series model made 12% of error on the test data. What an Expected MAPE of 12% tells you is that you can expect future predictions to be off by 12%.

How is the “expected” performance indicator calculated?

Internally, the time series model generates as many predictions as requested horizon points. Each prediction corresponds to a different horizon. For each horizon, a per-horizon performance indicator is calculated. The expected value of the performance indicator is the mean of all the per-horizon performance indicators that have been calculated.
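
As a simplified illustration (hypothetical per-horizon values, not the exact internal computation), the expected indicator is simply the mean of the per-horizon indicators:

```python
# Hypothetical MAPE measured separately for each of the 4 requested horizon points
per_horizon_mape = {1: 8.0, 2: 10.0, 3: 13.0, 4: 17.0}  # horizon -> MAPE in %

# The expected indicator is the mean of the per-horizon values
expected_mape = sum(per_horizon_mape.values()) / len(per_horizon_mape)
print(expected_mape)  # 12.0
```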

Forecast vs. Actual

On the Forecast vs. Actual visualization, you can see the predicted values along with a prediction interval. The predicted value for a given date is the best guess of the time series forecasting model, but it comes with a degree of uncertainty. The actual value will almost certainly not be strictly equal to the predicted value. The prediction interval tells you where the actual value is likely to lie, with 95% confidence.

While the prediction interval is mainly meant to give you more details about the actual values you can expect in the future, it can also be read as a visual hint about the model performance: a narrower prediction interval denotes less uncertainty about future values and suggests the time series model is able to provide more precise predictions.

Evaluating a Multi Entity Time Series Forecasting Model

What we mean by a “multi entity” time series forecasting model is a time series forecasting model where multiple sub-models have been created to handle the specificities of different slices of the planning model, called entities. Such models are created by providing a set of dimensions in the Entity setting.

A time series forecasting model in Predictive Planning can be composed of up to 1,000 entities (sub-models): you cannot (and should not) assess the performance of each entity individually. To tackle that problem, the Overview report provides aggregated performance metrics and hints about entities that should be specifically assessed.

david_serre_0-1718007808562.png

Top and Bottom Entities

The first section of the Overview report provides the Top Entities and Bottom Entities lists. Their purpose is to bring entities of specific interest to your attention:

  • Bottom Entities: displays the 10 entities with the worst accuracy (sorted by decreasing Expected MAPE). It’s usually worth looking specifically at these entities to understand why their performance is not as good as that of the other entities.
  • Top Entities: displays the 10 entities with the best accuracy (sorted by increasing Expected MAPE).

The reason for checking the entities with a low accuracy is obvious, but why should you care about the entities with the best accuracy? A good accuracy is good news after all. Yes…most of the time. But it can also reveal hidden issues.

For instance, a 0% MAPE as shown on the screenshot below must be considered suspiciously good. Sure, sometimes the KPI you want to forecast has a very regular behavior and it's possible to predict it perfectly. A perfect model doesn’t necessarily mean something is wrong. But it’s clearly worth checking why these entities are perfectly predicted.

david_serre_1-1718007858742.png

More often than not, such a small Expected MAPE indicates entities with a very short historical period (between 2 and 4 historical data points), as illustrated below:

david_serre_0-1718007899376.png

In this specific example, we can see that the time series forecasting model is trained using only two data points. The Expected MAPE is calculated using only the very last point and is therefore way too optimistic. When the number of historical data points is very small (fewer than 5 points), it’s not possible to extrapolate realistic future values. A small data history is often the sign of emerging entities (for instance, new products recently introduced to the market) or discontinued entities (for instance, products that have been removed from the market). Usually, it’s a good idea to exclude such entities from the model, as emerging entities are too recent to extrapolate realistic trends and discontinued entities don’t even need to be forecasted. If you decide to keep these entities in the model because you absolutely need a forecast, keep in mind that the estimated accuracy is largely overestimated.

Global Performance Indicators (Multi Entity Models)

The Overview report also provides statistics about the performance aggregated across all the entities.

david_serre_1-1718007958206.png

For each performance indicator you can get the median, average, 3rd quartile, and sum across all the entities. By default, only the median and 3rd quartile are displayed. You can display the average and the sum using the settings button.

The sum is mainly provided to be used with the MAE. As explained earlier, this performance indicator is scale sensitive, and therefore the amount of error will tend to be larger for larger entities. The median, average, and 3rd quartile are not very relevant for the MAE, as it is not comparable between entities. When working with the MAE as a performance indicator, it’s more relevant to consider the total amount of error across all the entities (the sum).

The average is provided in case it’s relevant for you, but keep in mind that it can easily be skewed by extreme values (entities with a very high Expected MAPE, for instance). Usually, the median is more representative of the global model performance.
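
As a simple illustration (hypothetical per-entity values, not taken from an actual model), the aggregated statistics can be reproduced as follows; note how the average MAPE is pulled up by one extreme entity, while the sum is mostly meaningful for the MAE:

```python
import statistics

# Hypothetical Expected MAPE per entity (in %): comparable across entities
expected_mape = {"Product A": 8.5, "Product B": 11.0, "Product C": 95.0}
mape_values = list(expected_mape.values())
print(statistics.median(mape_values))             # 11.0 -> robust view of overall accuracy
print(statistics.mean(mape_values))               # ~38.2 -> skewed by the extreme entity
print(statistics.quantiles(mape_values, n=4)[2])  # 3rd quartile

# Hypothetical Expected MAE per entity (same unit as the measure): only the sum adds up meaningfully
expected_mae = {"Product A": 500.0, "Product B": 480.0, "Product C": 12_000.0}
print(sum(expected_mae.values()))                 # 12980.0 -> total error across all entities
```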

Predictive Model List

When forecasting a time series, you will usually test different settings leading to different models. You need to compare these models and choose the one that brings the best performance. This is what the Predictive Models list allows you to do. The Predictive Model list is located at the bottom of the Predictive Planning user interface.

david_serre_0-1718008085768.png

This list is customizable, so you can compare the different models using the performance indicators that are the most relevant to you. To display the column selection dialog (screenshot below), click the “Select Column” button (gear-shaped button in the top right corner of the list).

david_serre_1-1718008127765.png

For the model comparison to be meaningful, you must compare what’s comparable. But what does that mean exactly? First and foremost, it means comparing the performance of models that deal with the same business case. It doesn’t make sense to compare the performance of a model forecasting travel expenses to the performance of a model forecasting the sales of a product.
From a practical standpoint, this means that the predictive models within the same predictive scenario should all correspond to different experiments answering the same business question. You should always have one distinct predictive scenario per predictive question and refrain from mixing unrelated models in the same predictive scenario.

Conclusion

In this blog post you learned how to use the Predictive Planning user interface to assess the accuracy of your time series forecasting models.

Do you want to learn more about Predictive Planning?