Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
nidhi_sawhney
Product and Topic Expert
Product and Topic Expert
5,638
For a long time now we have been waiting to get pivoting functionality with SAP HANA. This was not possible and there have been several attempts to get same functionality with custom sql codes. In this blog post I will describe how pivoting can be achieved via  SAP HANA Python Client API for Machine Learning Algorithms (hana_ml package) without the moving data out of SAP HANA.

With hana_ml version 1.0.7 we can now pivot the data easily and intuitively. This functionality is only supported via hana_ml and not in SAP HANA SQL directly, as the hana_ml wrapper generates the required sql code on the fly.

Lets say I have a SAP HANA table which has sample data as below and we access it via hana_ml dataframe df_data:



The number of different SKUs is 26



Now we want to run a correlation analysis between the different SKUs to check if their sales are potentially cannibalizing or positively impacting each other.

To do this we need to pivot the data so we have a column for each of the 26 SKUs containing the SALES_VALUE.
#Create the pivoted data as a hana_ml dataframe with pivot_table
dp = df_data.pivot_table(values='SALES_VALUE',index = 'DATE_MEASUREMENT', columns='SKU',aggfunc='sum')

#Format the data to fillin zero and have the right datatype for HANA PAL correlation function
dp = dp.fillna(0,SKUS).cast(SKUS, 'DOUBLE')

This generates a hana_ml dataframe which has SALES_VALUE for each of the 26 SKUs in a column format:



Now we can easily get a correlation matrix between all SKUs as follows, here I am choosing to show only the top 10 SKUs based on their SALES_VALUE but you can run the same analysis for all SKUs very efficiently:
#Focus on only top 10 SKUs 
SKUS = list(set(top_10_skus.select('SKU').collect()['SKU']))

from hana_ml.algorithms.pal import stats
corr = stats.pearsonr_matrix(conn_context=conn, data=dp, cols = SKUS)
corr.collect()

This provides the following hana_ml dataframe with correlation coefficient between every pair of SKUs.



We can also easily plot this as a matrix using  hana_ml.visualizers package

Many other plotting capabilities with hana_ml.visualizers is described very nicely in Arun Godwin Patel's blog post
from hana_ml import visualizers
from hana_ml.visualizers.eda import EDAVisualizer
import matplotlib.pyplot as plt
get_ipython().magic('matplotlib inline')
f = plt.figure(figsize=(10,10))
ax = f.add_subplot()
eda = EDAVisualizer(ax)

ax, corr_data = eda.correlation_plot(data=dp, corr_cols =SKUS, label=True, cmap='RdYlGn')
plt.show()

 

This produces the following correlation matrix.



 

Finally we now have a way of pushing down pivoting functionality to SAP HANA without moving data outside or writing customized SQL.
6 Comments