cancel
Showing results for 
Search instead for 
Did you mean: 

What is the fastest method to export HANA data (table or query result) to CSV ?

rferraton
Explorer
0 Kudos
5,713

Hello

I use a HANA 2.0 database. I want to export a table or a sql query from the database to an external client as fast as possible and using a command line (i'm on windows)

I already tested differents cli methods with success but not as fast as a gui method (using HANA Studio Export)

The CSV generated is around 400MB, in all cases it is exactly the same data.

I successfully exported the result of a SQL query using hdbsql and -o option

<code>$periodstr = '202012'
$TargetFile = "OTIF_${periodstr}.csv"
$HanaUserKey = "MAESTRO_VAL"

$CSVDelimiter = '¤'  
$UserQuotes = "Never"
$encoding = "UTF8"

Measure-Command{hdbsql -U $HanaUserKey -o .\${TargetFile} -I D:\Applications\python\Hana2Hyper\extractors\MAESTRO\sql\OTIF4HDBSQL.SQL -V period=$periodstr -F $CSVDelimiter -x -resultencoding $encoding}


With hdbsql it took 65s to export Network BP used ~40Mbs:

I successfully exported using ODBC (HDBODBC) using powershell dataset

<code>$TargetDir = "D:\Applications\python\Hana2Hyper\workdir"
$periodstr = '202012'
$TargetFile = "OTIF_${periodstr}.csv"
$HanaUserKey = "@MAESTRO_VAL"

$SQLQuery = [IO.File]::ReadAllText("D:\Applications\python\Hana2Hyper\extractors\MAESTRO\sql\OTIF4HDBSQL.SQL")
$SQLQuery = $SQLquery.replace('.',$periodstr)

$CSVDelimiter = 31  ## US : User Separator ascii 31
$UserQuotes = "Never"
$encoding = "UTF8"

$conn = new-object System.Data.Odbc.OdbcConnection
$conn.connectionstring = "Driver={HDBODBC};SERVERNODE=${HanaUserKey};PACKETSIZE=2097152; PREFETCH=TRUE"
$conn.Open()
$cmd = New-object System.Data.Odbc.OdbcCommand($SQLQuery,$conn)
$dset = New-Object System.Data.DataSet
$adap = New-Object System.Data.Odbc.OdbcDataAdapter($cmd)
Measure-Command{$nrr = $adap.fill($dset)}
Write-Output "Dataset Records selected: ${nrr}"

#Powershell 7 for UseQuotes
Measure-Command{Export-Csv -InputObject $dset.tables[0] -Path ${TargetDir}\${TargetFile} -Delimiter $CSVDelimiter -noTypeInformation -UseQuotes $UserQuotes -encoding $encoding}

$conn.Close()



it took 57s to export data to a dataset with HDBODBC+powershell and 1.2s to write the dataset to csv : 58s (faster that hdbsql), Network BP used : ~50Mbs

I successfully exported using ADO.NET and powershell :

$hanaado = [Reflection.Assembly]::LoadFile("D:\Applications\sap\hdbclient\ado.net\v4.5\Sap.Data.Hana.v4.5.dll")
$TargetDir = "D:\Applications\python\Hana2Hyper\workdir"
$periodstr = '202012'
$TargetFile = "OTIF_${periodstr}.csv"
$HanaUserKey = "MAESTRO_VAL"
$SQLQuery = [IO.File]::ReadAllText("D:\Applications\python\Hana2Hyper\extractors\MAESTRO\sql\OTIF4HDBSQL.SQL")
$SQLQuery = $SQLquery.replace('.',$periodstr)
$CSVDelimiter = 31 ## US : User Separator ascii 31
$UserQuotes = "Never"
$encoding = "UTF8"
$conn = New-Object Sap.Data.Hana.HanaConnection
$conn.connectionstring = "KEY=${HanaUserKey}"
$conn.Open()
$cmd = New-object sap.data.Hana.HanaCommand($SQLQuery,$conn)
$dset = New-Object System.Data.DataSet
$adap = New-Object sap.Data.Hana.HanaDataAdapter($cmd)
Measure-Command{$nrr = $adap.fill($dset)}
Write-Output "Dataset Records selected: ${nrr}"
#Powershell 7 for UseQuotes
Measure-Command{Export-Csv -InputObject $dset.tables[0] -Path ${TargetDir}\${TargetFile} -Delimiter $CSVDelimiter -noTypeInformation -UseQuotes $UserQuotes -encoding $encoding}
$conn.Close()

it took 57s to export data to a dataset with ADO.NET+powershell and 1s to write the dataset to csv : 58s (faster that hdbsql), Network BP used : ~50Mbs (Same as ODBC)

We also successfully export using python (sqlalchemy + hdbcli) and pandas dataframes in about 35s (Network BP ~90 Mbs).

We tried to change some parameters for ODBC/hdbcli like PACKETSIZE=2MB or PREFETCH=TRUE, without improving the cli extraction

And finally I successfully exported the same data that i had previously writen to a table (the create as select took 6s) using HANA Studio

**it took 16s to received the data : more than 3x better !! Network BP used 500Mbs !**

To summary results we currently have to export the same data :

  • CLI hdbsql ==> 65s
  • CLI HBDODBC+powershell dataset ==> 58s
  • CLI ADO.NET+powershell dataset ==> 58s
  • CLI Python using pandas + sqlalchemy(hdbcli) ==> 35s
  • GUI SAP HANA Studio ==> 16s

The final question is : does someone know how to export data to an external client as fast as GUI SAP HANA Studio but in command line or in python or powershell ?

Accepted Solutions (0)

Answers (3)

Answers (3)

jeff_albion
Product and Topic Expert
Product and Topic Expert

Hello,

The reason that HANA Studio is faster is because it doesn't select the data for export directly to the client. Underneath the covers it is doing this:

CREATE LOCAL TEMPORARY TABLE #CLIENT_EXPORT_1650287698305 (FILENAME VARCHAR(64), PATH VARCHAR(255), CONTENT BLOB)
EXPORT "SYSTEM"."MYEXPORTTABLE" AS BINARY INTO #CLIENT_EXPORT_1650287698305
SELECT "FILENAME","PATH","CONTENT" FROM #CLIENT_EXPORT_1650287698305
SELECT * FROM #EXPORT_RESULT

So unfortunately your use-cases that you are comparing aren't the same and you are already implicitly using EXPORT statement inside HANA Studio. Perhaps this approach will help with the other interfaces you're trying?

In general, hdbsql is primarily intended as a script-runner or interactive shell for administration tasks, not as a main import/export tool. SDI is the official recommendation for importing / exporting data across systems.

rferraton
Explorer
0 Kudos

First I Big THANKS for you quick and interesting answer !
I would not take this way without help !

I will test quickly, I'll come back update the post

Other question jeff.albion : should I create a CLOB column for the CONTENT column if the EXPORT is done as CSV (instead as BINARY in your example) ?

SDI is official but not always as flexible as we could expert.
But I will include it in benchmark if I can.

Regards and thank you one more time

jeff_albion
Product and Topic Expert
Product and Topic Expert
0 Kudos

Hi Romain,

You're welcome. HANA Studio still uses BLOB for "EXPORT...AS CSV", so I would recommend comparing with that. You could try NCLOB and see if there are any performance benefits.

Best regards,

rferraton
Explorer
0 Kudos

I try the EXPORT method and I have a coherent result using HANA Studio

Problems occurs when I use CLI tools to retrieve the data :

- ODBC return Error because of the EXPORT (non standard for ODBC ==> syntax error)

- JDBC return Error because of the EXPORT (non standard for JDBC ==> syntax error)

- HDBSQL : partially work for the moment (I use a CLOB) for the content column to have readable data but the content is "cutted" and only 85 characters are exported (i use hdbsql -o output.csv and hdbsql ..> output.csv methods, both return only a few characters). May be i am missing something (or maybe a kind of network protection cut my extraction...) I will try NCLOB but i doubt it will change anything, and i will try with a local HXE

Nota: When a use a length of the CONTENT column for the filename=data.csv, it return ~360 000 000. Which is normal, this is the amount expected

Other small issue with the EXPORT command : it is not possible to decide with delimiter to use. EXPORT INTO command allow that change.

rferraton
Explorer
0 Kudos

I finally found a solution based on jeff.albion suggestion and using ADO.NET + PowerShell + Remote Export :

detail and code here for the fastest method to extract HANA DATA to CSV until another solution will be found 🙂

rferraton
Explorer
0 Kudos

I found a "bug" when using EXPORT to a temporary table with a NCLOB for the CONTENT column. It seam the first x bytes of the column are appended at the end of the CSV. I don't know if the issue is due to the EXPORT itself or the ADO driver i use to retrieve data back to the client.
When using a CLOB column I don't have the problem.

Cocquerel
Active Contributor
0 Kudos

did you try EXPORT statement with hdbsql?

rferraton
Explorer
Hello, this EXPORT INTO statement have 2 issues for my case :
  • I need special system privileges (EXPORT) that I have not for the moment
  • The EXPORT INTO (as far as I understand) , export the file on the HANA server machine not on the client machine (my requierement). So I will need a system access to retrieve it

When I will be granted to export, I will provide times in this post, to compare it with others