cancel
Showing results for 
Search instead for 
Did you mean: 

Performance issue when shipping data out of BW/HANA to Azure DataLake

former_member362452
Participant
0 Kudos

Hello Experts

Within our project we are shipping the data from SAP BW/HANA to Azure DataLake Gen 1 and witnessing severe performance issues with the overall runtimes for data transfer. Options we have tried so far include

Option 1 - Azure SSIS IR

SAP BW/HANA Hosted on Azure VM - North Europe > Calculation View (170 Columns - Attributes & Metrics, 5 Million Records) > SSIS Packages Hosted on Azure SSIS IR - West Europe > Azure DataLake Gen 1 - West Europe

Total Runtime - 88Mins

Option 2 - Azure Data Factory V2

SAP BW/HANA Hosted on Azure VM - North Europe > Calculation View (170 Columns - Attributes & Metrics, 5 Million Records) > Azure Data Factory V2 with Self Hosted IR - West Europe > Azure DataLake Gen 1 - West Europe

Total Runtime - 80Mins

BW/HANA System Details

SAP HANA - 1.00.122.23.1548298510 (fa/hana1sp12)

Platform - SUSE Linux Enterprise Server 12.4

ODBC Driver - HDB_CLIENT

HANA ODBC Version - 1.00.120.139.1589320942

We reached out to Microsoft support and they seem to suggest this is to do with ODBC driver! Has anyone come across similar situations and what was done to improve the performance?

PS: We tried to tweak the config./parameters at the SSIS Package level (Engine Threads, Batch Size, Buffer Size..)too but that had no noticeable impact on the performance

Thanks

Abhishek Shanbhogue

AtulKumarJain
Active Contributor
0 Kudos

Hi Abhishek,

Just out of curiosity ,why you are running older version of HANA here.

Please check supportibity and performance issue with release.

Best Regards,

Atul

former_member362452
Participant
0 Kudos

atul.jain3 - Thanks, I will take a look.From the same HANA system we access multiple other Calculation Views to Azure DataLake Gen 1 using Azure Data Factory V1 and they tend to work well. Runtimes are 15-20mins for 6Million records. There however we are using a much older version of HANA ODBC Driver. As the old driver was not supported with SSIS IR we are using the latest version which dont quite work well, even Microsoft support engineer recommended to use Theobald adapter for Azure. So, my question is why the new ODBC driver has performance lag which old one didnt ?

AtulKumarJain
Active Contributor
0 Kudos

Hi Abhishesk,

There will be numerus reason for that, may be other parameter need to adjust as well.

I will suggest if you can open a message to SAP support they will guide you well.

Best Regards,

Atul

former_member362452
Participant
0 Kudos

lars.breddemann - Apologies on tagging you here but I see you have responded to some of the queries related to HANA Client/ODBC performance, any help here would be much appreciated

Thanks

Abhishek Shanbhogue

Vitaliy-R
Developer Advocate
Developer Advocate
0 Kudos

Have you done any other investigation breaking that total time into parts it takes e.g. selecting data in HANA, transferring the volume of data, saving it int data lake? What is the size of data volume once it is persisted in Azure? Have you tried parallelization of the data movement to create multiple streams reading partitions of data in HANA?

former_member362452
Participant
0 Kudos

Thanks sygyzmundovych, Please see my response

We are reading data from a HANA calculation view (6 projections unioned together) for 175 columns with 3Million records

Q1: Have you done any other investigation breaking that total time into parts it takes e.g. selecting data in HANA, transferring the volume of data, saving it int data lake?

Yes, These runtimes are for loading data from HANA CV to Azure Data Lake, we tried to use default parameters where it took ~1Hr and after changing the SSIS parameters (Engine Threads, Batch Size, Buffer Size..) it didnt have any noticeable reduction in runtimes

Q2: What is the size of data volume once it is persisted in Azure?

We are reading 175 columns from HANA CV having 3 Million records

Q3:Have you tried parallelization of the data movement to create multiple streams reading partitions of data in HANA?

Have tried parameter changes for parallelisation on the Azure side by changing SSIS package and Azure Data Factory V2 but that didnt help. I could try to create a temp table with partitions and load data from CV to check if this improves but this will mean an additional step before shipping data out of HANA to Azure!

Thanks

Abhishek shanbhogue

Accepted Solutions (1)

Accepted Solutions (1)

former_member362452
Participant

Posing this as an answer for the benefit of others who have similar situation. We have done various tests and could conclude the following

- Leaner the query better the performance, In our case we were trying to read 177 columns from HANA view and that was degrading the performance. If I query the same view for 60 columns then its performance is much better

- Splitting the columns into smaller chunks and then merging is another option but this will still have some performance implications

- Changing the packet size from default value could have significant performance benefit but this again works with lean query and not for a wide column set. We tried with PACKETSIZE = 2; in the connection string and this went well

Answers (0)