A complex real-life time series usually contains many change points. For such data, simply applying a traditional seasonality test may not give a convincing decomposition result. In this blog post, we will show how to use Bayesian Change Point Detection in the Python machine learning client for SAP HANA (hana-ml) to detect those change points and decompose the target time series.
In this blog post, you will learn:
- Decomposition for complex time series
- Change point detection with hana-ml
Introduction
After decomposition, a time series does not always yield a monotonic trend and a regular seasonal wave. On the contrary, these components may contain many internal change points.
Fig1
As illustrated above, the time series shows an obviously changing trend and a changing seasonal wave. Many decomposition algorithms cannot extract these components correctly because they do not analyze change points. In SAP HANA PAL and hana-ml, we provide BCPD to tackle that.
In this blog post, we will focus on the task of detecting the change points within the varying trend and seasonal components of complex time series.
Solutions
Bayesian Change Point Detection (BCPD) can, to some extent, be seen as an enhanced version of the seasonality test in additive mode. Similarly, it decomposes a time series into three components (trend, seasonal and random), with the remarkable difference that it can detect change points within both the trend and seasonal parts, using a quasi RJ-MCMC (reversible-jump Markov chain Monte Carlo) method.
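To make the idea of change points inside the trend and seasonal parts concrete, here is a minimal NumPy sketch that builds a toy additive series with one trend change point and one seasonal change point. It is not BCPD and does not use hana-ml. The function name, the slopes, the periods and the change point positions are all made up for illustration.

```python
import numpy as np

def make_series(n=120, trend_cp=60, season_cp=80, seed=0):
    """Build a toy additive series with one trend and one seasonal change point."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    # Piecewise-linear trend: slope +0.5 before trend_cp, slope -0.3 after it,
    # continuous at the change point.
    trend = np.where(t < trend_cp, 0.5 * t, 0.5 * trend_cp - 0.3 * (t - trend_cp))
    # Seasonal part whose period switches from 10 to 20 at season_cp.
    seasonal = np.where(t < season_cp,
                        3.0 * np.sin(2 * np.pi * t / 10),
                        3.0 * np.sin(2 * np.pi * t / 20))
    noise = rng.normal(scale=0.5, size=n)
    # Additive model: series = trend + seasonal + random noise.
    return t, trend, seasonal, noise, trend + seasonal + noise

t, trend, seasonal, noise, y = make_series()
```

A plain seasonality test assumes a single trend shape and a single period over the whole series, which is exactly what this toy series violates.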
Like the additive decomposition in the seasonality test, we treat the time series Y(t) as the sum of a trend part T and a seasonal part S along with random noise:
$$Y(t) = T(t) + S(t) + \varepsilon(t)$$
The parameters of the trend part consist of the positions of the trend change points and the coefficients for each separated segment, so each trend segment is described by its own coefficients. Likewise, the parameters of the seasonal part consist of the positions of the seasonal change points and the coefficients for each seasonal segment.
One notable thing is that periods in different seasonal segments can vary from one to another, which expands our algorithm to much wider scenarios.
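The segment-wise structure described above can be sketched in LaTeX as follows. The concrete forms are an assumption: polynomial trend segments and harmonic seasonal segments, as suggested by the trend_order, max_harmonic_order and max_period parameters used later in this post. All symbols are illustrative and are not notation taken from the original.

```latex
% Trend segment i, between consecutive trend change points \tau_i and \tau_{i+1}
% (assumed polynomial of order q; q = 1 gives a straight line):
T(t) = \sum_{k=0}^{q} a_{i,k}\, t^{k}, \qquad \tau_i \le t < \tau_{i+1}

% Seasonal segment j, between consecutive seasonal change points \xi_j and \xi_{j+1}
% (assumed harmonic model with its own period P_j and harmonic order H_j):
S(t) = \sum_{h=1}^{H_j} \left[ \alpha_{j,h} \sin\!\left(\frac{2\pi h t}{P_j}\right)
     + \beta_{j,h} \cos\!\left(\frac{2\pi h t}{P_j}\right) \right],
     \qquad \xi_j \le t < \xi_{j+1}
```

Under this reading, the period P_j carries a segment index, which is what allows periods to differ from one seasonal segment to the next.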
All source code in the following examples uses the Python machine learning client for SAP HANA Predictive Analysis Library (PAL).
Connect to SAP HANA
import hana_ml
from hana_ml import dataframe
cc = dataframe.ConnectionContext(address='xxx.xxx.xxx.xxx', port=30x15, user='XXXXXX', password='XXXXXX')  # account details omitted; replace the placeholders with your own values
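To keep account details out of the script, one option is to read them from environment variables and pass them to ConnectionContext. The sketch below only builds the keyword arguments. The helper name and the HANA_* variable names are hypothetical and are not part of hana-ml.

```python
import os

def connection_kwargs_from_env(env=os.environ):
    """Collect ConnectionContext arguments from (hypothetical) environment variables."""
    return {
        "address": env["HANA_ADDRESS"],
        "port": int(env["HANA_PORT"]),  # the port must be an integer
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }

# Usage (requires hana-ml and a reachable SAP HANA instance):
# from hana_ml import dataframe
# cc = dataframe.ConnectionContext(**connection_kwargs_from_env())
```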
Use Case I : Detecting Changing Trend
In this use case, we focus on detecting the change points in the trend part only. The mock data is stored in the database in a table named ‘PAL_MOCKING_BCPD_DATA_1_TBL’. We can use the table() function of ConnectionContext to create a corresponding hana_ml.DataFrame object for it.
mocking_df = cc.table('PAL_MOCKING_BCPD_DATA_1_TBL')
The collect() function of hana_ml.DataFrame fetches the data from the database to the Python client. The data is illustrated as follows:
import matplotlib.pyplot as plt
plt.plot(mocking_df.collect()["SERIES"])
Fig2
The series has length 40. Next, we import the BCPD algorithm from hana-ml and apply it to the mock dataset:
from hana_ml.algorithms.pal.tsa.changepoint import BCPD
bcpd = BCPD(max_tcp=5, max_scp=0, random_seed=1)
#tcp: location of trend change points
#scp: location of seasonal change points
#period: period of each seasonal segment
#components: decomposition values of the time series
tcp, scp, period, components = bcpd.fit_predict(data=mocking_df)
Again, we can use collect() to get the final results from the database. Since we are only interested in the trend part, we can visualize it using the following code:
print(tcp.collect())
plt.plot(mocking_df.collect()["SERIES"], label='data')
plt.plot(components.collect()["TREND"], label='trend')
# mark each detected trend change point with a dashed vertical line
for cp in list(tcp.collect()["TREND_CP"]):
    plt.axvline(x=cp, color="red", linestyle='dashed')
plt.legend(['original series', 'trend component'])
plt.title("Trend component")
plt.show()
Fig3
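Once the trend change points are known, a simple way to interpret them is to fit a straight line to each segment and compare the slopes. The following is a minimal sketch with stand-in data. segment_slopes is a hypothetical helper and not part of hana-ml, and it assumes that change points are given as 0-based row positions.

```python
import numpy as np

def segment_slopes(values, change_points):
    """Fit a straight line to each segment between consecutive change points.

    Returns a list of (start, end, slope) tuples, with end exclusive.
    """
    values = np.asarray(values, dtype=float)
    # Segment boundaries: series start, each change point, series end.
    bounds = [0] + sorted(int(cp) for cp in change_points) + [len(values)]
    result = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end - start < 2:
            continue  # a line needs at least two points
        x = np.arange(start, end)
        slope, _intercept = np.polyfit(x, values[start:end], 1)
        result.append((start, end, float(slope)))
    return result

# Stand-in data: slope +1 for the first 10 points, then slope -2.
toy = [float(i) for i in range(10)] + [10.0 - 2.0 * i for i in range(10)]
slopes = segment_slopes(toy, change_points=[10])
```

With the results of this use case, the call could look like segment_slopes(mocking_df.collect()["SERIES"], tcp.collect()["TREND_CP"]), under the same assumption about how positions are encoded.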
Use Case II : Detecting Changing Trend and Season
In this use case, we apply BCPD to the data shown in Fig1, in which both the trend and the season are changing. Similarly, the data is stored in the database in a table named ‘PAL_MOCKING_BCPD_DATA_2_TBL’, and we need to adjust our parameters for this use case:
# detailed introduction of parameters can be found on our user manual page
bcpd = BCPD(max_tcp=5, max_scp=5, max_harmonic_order=1, max_period=10, max_iter=10000, interval_ratio=0.2, random_seed=1)
mocking_df = cc.table('PAL_MOCKING_BCPD_DATA_2_TBL') # data shown in Fig1
tcp, scp, period, components = bcpd.fit_predict(data=mocking_df)
The trend visualization code is the same as in Use Case I, and the resulting trend plot is:
Fig4
Further, we use the following code to visualize the seasonal part:
print(scp.collect())
print(period.collect())
plt.plot(components.collect()["SEASONAL"], label='seasonal')
# mark each detected seasonal change point with a dashed vertical line
for cp in scp.collect()["SEASON_CP"]:
    plt.axvline(x=cp, color="orange", linestyle='dashed')
plt.title("Seasonal component")
plt.show()
Fig5
The above plots reveal that BCPD is able to give decent decomposition results on both trend and seasonal parts from the time series.
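As a rough sanity check on the period reported for a seasonal segment, one can compare it with a simple periodogram estimate computed on the same slice of the seasonal component. This is not how BCPD estimates periods. It is an independent NumPy check, and dominant_period is a hypothetical helper.

```python
import numpy as np

def dominant_period(segment):
    """Return the period (in samples) of the strongest non-constant frequency."""
    x = np.asarray(segment, dtype=float)
    x = x - x.mean()                      # remove the constant offset
    power = np.abs(np.fft.rfft(x)) ** 2   # periodogram
    k = int(np.argmax(power[1:])) + 1     # skip the zero-frequency bin
    return len(x) / k

# Stand-in segment: a pure sine wave with period 8 over 64 samples.
n = np.arange(64)
wave = np.sin(2 * np.pi * n / 8)
estimated = dominant_period(wave)
```

The estimate is only as fine as the frequency grid of the slice, so short segments give coarse periods. The helper expects at least two samples.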
Use Case III : Sensor Data Abrupt Change Detection and Denoising
In this use case, we apply BCPD to real-life sensor data to detect potential abrupt change points, and we denoise the series by dropping the random term after decomposition.
Assume the data is stored in a dataframe named sensor_df. We first visualize the data using the following code:
plt.figure(num=None, figsize=(10, 3))
plt.plot(sensor_df.collect()["SERIES"])
print(sensor_df.collect())
Fig6
To obtain a better fit, we use a second-order trend in BCPD for this use case:
# detailed introduction of parameters can be found on our user manual page
bcpd = BCPD(trend_order=2, max_tcp=10, max_scp=10, max_harmonic_order=10, min_period=50, max_period=50, max_iter=15000, interval_ratio=0.01, random_seed=1)
tcp, scp, period, components = bcpd.fit_predict(data=sensor_df)
After the algorithm finishes, we use the following code to show potential abrupt change points:
plt.figure(num=None, figsize=(16, 4), dpi=80, facecolor='w', edgecolor='k')
plt.plot(sensor_df.collect()["SERIES"])
# trend and seasonal change points are both treated as potential abrupt changes
for cp in list(tcp.collect()["TREND_CP"]):
    plt.axvline(x=cp, color="orange", linestyle='dashed')
for cp in scp.collect()["SEASON_CP"]:
    plt.axvline(x=cp, color="orange", linestyle='dashed')
plt.legend(['sensor data', 'potential abrupt change'])
plt.show()
Fig7
Denoised time series can be restored by simply adding the trend and season parts after decomposition:
Fig8
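The denoising step can be sketched as below: the trend and seasonal columns of the decomposition are added and the random term is left out. The TREND and SEASONAL column names are the ones used in the plotting code earlier. The denoise helper and the toy values are illustrative stand-ins for components.collect().

```python
import pandas as pd

def denoise(components):
    """Return trend + seasonal, i.e. the series without the random term."""
    return components["TREND"] + components["SEASONAL"]

# Stand-in for components.collect(); real values come from bcpd.fit_predict().
toy_components = pd.DataFrame({
    "TREND": [1.0, 2.0, 3.0, 4.0],
    "SEASONAL": [0.5, -0.5, 0.5, -0.5],
})
denoised = denoise(toy_components)
```

With the results of this use case, denoise(components.collect()) gives the series to plot against the raw sensor data.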
In this blog post, we introduced a new SAP HANA ML algorithm for detecting change points in time series, with several use cases based on the Python machine learning client for SAP HANA (hana-ml).
BCPD can be applied to different scenarios: trend test, seasonality test, change point detection, signal noise cancellation, etc.
If you want to learn more about hana-ml and SAP HANA Predictive Analysis Library (PAL), please refer to the following links:
Weibull Analysis using Python machine learning client for SAP HANA
Outlier Detection using Statistical Tests in Python Machine Learning Client for SAP HANA
Outlier Detection by Clustering using Python Machine Learning Client for SAP HANA
Anomaly Detection in Time-Series using Seasonal Decomposition in Python Machine Learning Client for ...
Outlier Detection with One-class Classification using Python Machine Learning Client for SAP HANA
Learning from Labeled Anomalies for Efficient Anomaly Detection using Python Machine Learning Client...
Additive Model Time-series Analysis using Python Machine Learning Client for SAP HANA
Time-Series Modeling and Analysis using SAP HANA Predictive Analysis Library(PAL) through Python Mac...
Import multiple excel files into a single SAP HANA table
COPD study, explanation and interpretability with Python machine learning client for SAP HANA