Seasonality is a crucial characteristic of a time series. In the SAP HANA Predictive Analysis Library (PAL), we provide a method for seasonal decomposition. This method is also wrapped in the Python Machine Learning Client for SAP HANA (hana-ml), which offers a seasonality test and decomposes the time series into three components: trend, seasonal, and random.
In this blog post, you will learn:
Seasonality is a characteristic of a time series in which the data experience regular and predictable changes that recur over a fixed interval, such as every week or every month. Seasonal behavior differs from cyclic behavior: seasonality always has a fixed and known period, while cyclic behavior, e.g., a business cycle, does not. Seasonality can help analyze stocks and economic trends. For instance, companies can use it to inform business decisions such as inventory and staffing levels.
In time series analysis and forecasting, we usually consider the data as a combination of trend, seasonality, and noise, and we can build a forecasting model by capturing each of these components. Typically, there are two decomposition models for time series: additive and multiplicative. The additive model is useful when the seasonal variation is relatively constant over time, whereas the multiplicative model is useful when the seasonal variation grows (or shrinks) in proportion to the level of the series.
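To make the distinction concrete, here is a minimal NumPy sketch (synthetic, made-up data, not one of the datasets below) that builds two monthly series from the same rising trend and the same 12-month pattern. In the additive series the seasonal swing around the trend keeps a fixed size; in the multiplicative series it is a fixed percentage of the level, so it widens as the trend rises. The helper `seasonal_range` is a hypothetical name used only for this illustration.

```python
import numpy as np

n, period = 120, 12                      # ten years of synthetic monthly data
t = np.arange(n)
trend = 100 + 2.0 * t                    # linearly rising level
season = np.sin(2 * np.pi * t / period)  # unit-amplitude yearly pattern

# Additive: y = trend + seasonal; the swing is a fixed +/-10 units.
additive = trend + 10 * season

# Multiplicative: y = trend * seasonal factor; the swing is +/-10% of the level.
multiplicative = trend * (1 + 0.1 * season)

def seasonal_range(y, year):
    """Peak-to-trough range of the deviation from trend within one year."""
    window = slice(year * period, (year + 1) * period)
    deviation = y[window] - trend[window]
    return deviation.max() - deviation.min()

for name, y in [("additive", additive), ("multiplicative", multiplicative)]:
    first, last = seasonal_range(y, 0), seasonal_range(y, n // period - 1)
    print(f"{name:>14}: first-year range {first:.1f}, last-year range {last:.1f}")
```

The printed ranges stay the same for the additive series, while for the multiplicative series the last-year range is roughly three times the first-year range because the trend level has roughly tripled.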
Real-world problems are messy and noisy: the trend may not be monotonic, and the true model may have both additive and multiplicative components. Nevertheless, these decomposition models provide a structured and simple way to analyze and forecast the data. Hence, identifying the seasonality in a time series can help you build a better model. It can help in the following ways:
The seasonal_decompose() function of hana_ml works in two phases:
1. Seasonality Test: The seasonal_decompose() function tests whether a time series exhibits seasonality by removing the trend and identifying the seasonality through the autocorrelation function (acf). The output includes the period, the type of model (additive/multiplicative), and the acf value at that period.
2. Seasonal Decomposition: Based on the model identified in the seasonality test phase, the time series is decomposed into trend, seasonal, and random components.
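The general idea of the seasonality test phase can be illustrated with a simplified NumPy sketch. This is not PAL's actual algorithm; it only shows the principle described above: for each candidate period, remove the trend with a centered moving average, compute the acf of the detrended series at that lag, and report the candidate with the largest acf. The function names, the candidate list, and the 0.3 threshold are hypothetical choices for this example, and the data are synthetic.

```python
import numpy as np

def centered_moving_average(y, window):
    """Centered moving average (a 2 x window average when window is even)."""
    if window % 2 == 0:
        weights = np.r_[0.5, np.ones(window - 1), 0.5] / window
    else:
        weights = np.ones(window) / window
    smooth = np.convolve(y, weights, mode="valid")  # drops edges where the window does not fit
    pad = (len(y) - len(smooth)) // 2
    return np.r_[np.full(pad, np.nan), smooth, np.full(pad, np.nan)]

def acf_at_lag(x, lag):
    """Sample autocorrelation of x at one lag, ignoring the NaN edges."""
    x = x[~np.isnan(x)]
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

def detect_period(y, candidates, threshold=0.3):
    """Return (period, acf); period is 0 when no candidate reaches the (hypothetical) threshold."""
    y = np.asarray(y, dtype=float)
    best_period, best_acf = 0, -np.inf
    for p in candidates:
        detrended = y - centered_moving_average(y, p)  # remove the trend
        r = acf_at_lag(detrended, p)                   # seasonal signal at lag p
        if r > best_acf:
            best_period, best_acf = p, r
    return (best_period if best_acf >= threshold else 0), best_acf

# Usage on a synthetic monthly series: linear trend + yearly cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(120)
y = 0.5 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)
period, r = detect_period(y, candidates=range(2, 13))
print("detected period:", period, "acf:", round(float(r), 3))
```

The moving-average window is set to the candidate period because an average over exactly one full period cancels that seasonal pattern, leaving an estimate of the trend alone.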
Overall, the seasonal_decompose() function of hana_ml provides an easy and quick method to identify seasonality and decompose the time series. In the following sections, we will demonstrate how to use this function to analyze two real-world datasets.
In this section, we analyze two cases: U.S. gasoline retail sales and New York City taxi passengers.
All source code uses the Python Machine Learning Client for SAP HANA (hana-ml), which wraps the SAP HANA Predictive Analysis Library (PAL).
First, we need to establish a connection to SAP HANA. After that, we can utilize various functions of hana_ml to perform data analysis. Here is an example:
>>> import hana_ml
>>> from hana_ml import dataframe
>>> conn = dataframe.ConnectionContext('host', 'port', 'username', 'password')
Please replace ‘host’, ‘port’, ‘username’, and ‘password’ with the details of your SAP HANA instance.
Dataset link: https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&s=A103600001&f=M
This dataset includes the monthly data of U.S. Total Gasoline Retail Sales by Refiners (in Thousand Gallons per Day) from January 1983 to July 2020. The dataset has two columns: Date and Sales, and contains 451 data points.
The figure below illustrates the variation in the dataset, and we can observe a potential yearly pattern. From 2008 to 2015, there is a significant decrease in sales. Given the timing, we speculate that the drop could be attributed to the 2008 economic crash, which had a pronounced negative impact on the oil and gas industry. There is also a steep decline in early 2020, which may be due to the lockdowns imposed during the COVID-19 pandemic in the US.
The dataset has been imported into SAP HANA under the table name “GASOLINE_TBL”, so we can access it with the dataframe.ConnectionContext.table() function. Next, we add a column named ‘ID’ to the original DataFrame, gasoline_df, because the seasonal_decompose() function requires an integer key column.
>>> gasoline_df = conn.table("GASOLINE_TBL") # Access to the data table
>>> gasoline_df = gasoline_df.add_id('ID', ref_col='Date') # Add an ID column that follows the Date order
>>> print(gasoline_df.head(5).collect()) # Show the first 5 rows of gasoline_df
First, because seasonality shows up as high acf values at lags equal to the period, we invoke the plot_acf() function to display the autocorrelation function (acf); the result is shown in Fig. 2.
>>> from hana_ml.visualizers.eda import plot_acf
>>> plot_acf(data=gasoline_df, key='ID', col='Sales', method='fft', thread_ratio=0.4, enable_plotly=False)
Initially, we assumed that the data follow a yearly pattern, so we expected a high acf value at lag 12. However, the time series is not stationary, and the significant decline from 2008 to 2015 greatly affects the acf values. As a result, the acf decreases steadily as the lag grows instead of peaking at lag 12, so the expected yearly pattern cannot be confirmed from this plot. To identify the seasonality, it is therefore necessary to eliminate the trend from the data.
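The effect of a trend on the acf is easy to reproduce on synthetic data. The sketch below (NumPy only, a made-up series rather than the gasoline data) compares the acf of a monthly series with a strong linear trend against the acf of its first difference. First differencing is just one simple detrending step chosen for this illustration; seasonal_decompose() performs its own trend removal internally.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

t = np.arange(240)                               # 20 years of monthly points
y = 0.8 * t + 5 * np.sin(2 * np.pi * t / 12)     # strong trend + yearly cycle

raw = acf(y, 24)
diffed = acf(np.diff(y), 24)                     # first difference removes the linear trend

print("raw acf, lags 1-13:", np.round(raw[1:14], 2))
print("raw acf peak (excluding lag 0) at lag", int(np.argmax(raw[1:])) + 1)
print("differenced acf peak at lag", int(np.argmax(diffed[1:])) + 1)
```

On the raw series the acf is dominated by the trend and is largest at lag 1, whereas after removing the trend the peak moves to lag 12. Removing the trend is what makes the seasonal peak visible.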
To address this, the hana_ml library offers the seasonal_decompose() function, which conducts a seasonality test while accounting for the impact of the trend. The function is invoked as shown in the code below and returns a list of two dataframes. The first dataframe provides the statistics, such as the type of decomposition and the acf value corresponding to the period. The second dataframe contains the three decomposed components: seasonality, trend, and random.
>>> from hana_ml.algorithms.pal.tsa.seasonal_decompose import seasonal_decompose
>>> stats, decompose = seasonal_decompose(data=gasoline_df, endog='Sales', key='ID')
>>> print(stats.collect())
>>> print(decompose.collect())
The stats output shows that the detected period is 12 and that the decomposition model is additive. hana-ml also provides a plot_seasonal_decompose() function to visualize the three decomposed components, shown in Fig. 3.
>>> from hana_ml.visualizers.eda import plot_seasonal_decompose
>>> plot_seasonal_decompose(data=gasoline_df, key='ID', col='Sales', enable_plotly=False)
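For readers who want to see what an additive decomposition computes, here is a textbook classical decomposition in NumPy: a centered moving average for the trend, per-position averages of the detrended values for the seasonal component, and the remainder as the random component. It is a simplified illustration on synthetic data that assumes the period (12) is already known; PAL's implementation may differ in details, for example at the edges of the series. The function name `classical_additive_decompose` is hypothetical.

```python
import numpy as np

def classical_additive_decompose(y, period):
    """Split y into trend, seasonal and random parts (classical additive method)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Trend: centered moving average (2 x period when period is even).
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    smooth = np.convolve(y, weights, mode="valid")
    pad = (n - len(smooth)) // 2
    trend = np.r_[np.full(pad, np.nan), smooth, np.full(pad, np.nan)]

    # Seasonal: average detrended value at each position within the period,
    # shifted so that the seasonal indices sum to zero over one period.
    detrended = y - trend
    indices = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    indices -= indices.mean()
    seasonal = np.resize(indices, n)  # repeat the indices; index 0 aligns with y[0]

    # Random: whatever trend and seasonal do not explain (NaN at the edges).
    random = y - trend - seasonal
    return trend, seasonal, random

# Usage on synthetic monthly data: level + linear trend + yearly pattern + noise.
rng = np.random.default_rng(42)
t = np.arange(144)
pattern = 3 * np.cos(2 * np.pi * t / 12)
y = 50 + 0.3 * t + pattern + rng.normal(0, 0.5, t.size)

trend, seasonal, random = classical_additive_decompose(y, period=12)
print("seasonal indices:", np.round(seasonal[:12], 2))
print("random std (ignoring edges):", round(float(np.nanstd(random)), 2))
```

In the additive model the three parts add back to the original series wherever the trend is defined, which is why the random component is simply the residual.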
Dataset Link: https://github.com/numenta/NAB/blob/master/data/realKnownCause/nyc_taxi.csv
This dataset describes the number of NYC taxi passengers over seven months, from July 2014 to January 2015, and contains five anomalies, which occur during the NYC marathon, Thanksgiving, Christmas, New Year's Day, and a snowstorm. The raw data is from the NYC Taxi and Limousine Commission; the data file aggregates the total number of taxi passengers into 30-minute buckets. The data has two columns, timestamp and value (the number of passengers), and 10320 instances.
The dataset has been imported into SAP HANA under the table name "TAXI_TBL". The first 5 rows of data and a plot of the first 1000 instances are shown below.
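Fig. 4 suggests that the number of taxi passengers follows daily and weekly patterns. After loading "TAXI_TBL" into taxi_df and adding an integer 'ID' column as in the gasoline case, we calculate the acf with the correlation() function as shown in the code below and obtain the acf plot in Fig. 5.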
>>> from hana_ml.algorithms.pal.tsa.correlation_function import correlation
>>> correlation(data=taxi_df, key='ID', x='value', max_lag=1500).collect()
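Because the data are aggregated into 30-minute buckets, the lags in Fig. 5 are counted in half-hours. The quick calculation below shows which lags correspond to a daily and a weekly cycle and how many weeks max_lag=1500 spans. It assumes the series runs from 1 July 2014 through 31 January 2015 inclusive, which is consistent with the 10320 instances mentioned above.

```python
from datetime import date

minutes_per_bucket = 30
buckets_per_day = 24 * 60 // minutes_per_bucket   # 48 half-hour buckets per day
buckets_per_week = 7 * buckets_per_day            # 336 buckets per week

# Assumed coverage: 1 July 2014 through 31 January 2015 (end date exclusive below).
days = (date(2015, 2, 1) - date(2014, 7, 1)).days
print("buckets per day:", buckets_per_day)
print("buckets per week:", buckets_per_week)
print("days:", days, "expected rows:", days * buckets_per_day)
print("weeks covered by max_lag=1500:", round(1500 / buckets_per_week, 2))
```

A daily pattern therefore appears as acf peaks at multiples of 48, and a weekly pattern as peaks at multiples of 336; a maximum lag of 1500 covers a little more than four weeks.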
Invoking seasonal_decompose() detects a period of 336, which corresponds to a weekly pattern (7 days × 48 half-hour buckets) and is the lag with the highest acf value.
>>> stats, decompose = seasonal_decompose(data=taxi_df, endog='value', key='ID')
>>> print(stats.collect())
>>> print(decompose.collect())
The decomposed components of the taxi dataset are shown in Fig. 6.
In this blog post, we described what seasonality is and how to analyze and decompose a time series with the seasonal_decompose() function of hana-ml. To learn more about the seasonal_decompose() function of hana-ml and the SAP HANA Predictive Analysis Library (PAL), please refer to the following links:
hana-ml seasonal_decompose documentation
SAP HANA Predictive Analysis Library (PAL) Seasonality Test manual
hana-ml on PyPI
We also provide an R API for SAP HANA PAL called hana.ml.r; please refer to its documentation for more information.
For other blog posts on hana-ml: