on 2022 Apr 14 11:34 PM
Hello
I use a HANA 2.0 database. I want to export a table or a sql query from the database to an external client as fast as possible and using a command line (i'm on windows)
I already tested differents cli methods with success but not as fast as a gui method (using HANA Studio Export)
The CSV generated is around 400MB, in all cases it is exactly the same data.
I successfully exported the result of a SQL query using hdbsql and -o option
<code>$periodstr = '202012'
$TargetFile = "OTIF_${periodstr}.csv"
$HanaUserKey = "MAESTRO_VAL"
$CSVDelimiter = '¤'
$UserQuotes = "Never"
$encoding = "UTF8"
Measure-Command{hdbsql -U $HanaUserKey -o .\${TargetFile} -I D:\Applications\python\Hana2Hyper\extractors\MAESTRO\sql\OTIF4HDBSQL.SQL -V period=$periodstr -F $CSVDelimiter -x -resultencoding $encoding}
With hdbsql it took 65s to export Network BP used ~40Mbs:
I successfully exported using ODBC (HDBODBC) using powershell dataset
<code>$TargetDir = "D:\Applications\python\Hana2Hyper\workdir"
$periodstr = '202012'
$TargetFile = "OTIF_${periodstr}.csv"
$HanaUserKey = "@MAESTRO_VAL"
$SQLQuery = [IO.File]::ReadAllText("D:\Applications\python\Hana2Hyper\extractors\MAESTRO\sql\OTIF4HDBSQL.SQL")
$SQLQuery = $SQLquery.replace('.',$periodstr)
$CSVDelimiter = 31 ## US : User Separator ascii 31
$UserQuotes = "Never"
$encoding = "UTF8"
$conn = new-object System.Data.Odbc.OdbcConnection
$conn.connectionstring = "Driver={HDBODBC};SERVERNODE=${HanaUserKey};PACKETSIZE=2097152; PREFETCH=TRUE"
$conn.Open()
$cmd = New-object System.Data.Odbc.OdbcCommand($SQLQuery,$conn)
$dset = New-Object System.Data.DataSet
$adap = New-Object System.Data.Odbc.OdbcDataAdapter($cmd)
Measure-Command{$nrr = $adap.fill($dset)}
Write-Output "Dataset Records selected: ${nrr}"
#Powershell 7 for UseQuotes
Measure-Command{Export-Csv -InputObject $dset.tables[0] -Path ${TargetDir}\${TargetFile} -Delimiter $CSVDelimiter -noTypeInformation -UseQuotes $UserQuotes -encoding $encoding}
$conn.Close()
it took 57s to export data to a dataset with HDBODBC+powershell and 1.2s to write the dataset to csv : 58s (faster that hdbsql), Network BP used : ~50Mbs
I successfully exported using ADO.NET and powershell :
$hanaado = [Reflection.Assembly]::LoadFile("D:\Applications\sap\hdbclient\ado.net\v4.5\Sap.Data.Hana.v4.5.dll")
$TargetDir = "D:\Applications\python\Hana2Hyper\workdir"
$periodstr = '202012'
$TargetFile = "OTIF_${periodstr}.csv"
$HanaUserKey = "MAESTRO_VAL"
$SQLQuery = [IO.File]::ReadAllText("D:\Applications\python\Hana2Hyper\extractors\MAESTRO\sql\OTIF4HDBSQL.SQL")
$SQLQuery = $SQLquery.replace('.',$periodstr)
$CSVDelimiter = 31 ## US : User Separator ascii 31
$UserQuotes = "Never"
$encoding = "UTF8"
$conn = New-Object Sap.Data.Hana.HanaConnection
$conn.connectionstring = "KEY=${HanaUserKey}"
$conn.Open()
$cmd = New-object sap.data.Hana.HanaCommand($SQLQuery,$conn)
$dset = New-Object System.Data.DataSet
$adap = New-Object sap.Data.Hana.HanaDataAdapter($cmd)
Measure-Command{$nrr = $adap.fill($dset)}
Write-Output "Dataset Records selected: ${nrr}"
#Powershell 7 for UseQuotes
Measure-Command{Export-Csv -InputObject $dset.tables[0] -Path ${TargetDir}\${TargetFile} -Delimiter $CSVDelimiter -noTypeInformation -UseQuotes $UserQuotes -encoding $encoding}
$conn.Close()
it took 57s to export data to a dataset with ADO.NET+powershell and 1s to write the dataset to csv : 58s (faster that hdbsql), Network BP used : ~50Mbs (Same as ODBC)
We also successfully export using python (sqlalchemy + hdbcli) and pandas dataframes in about 35s (Network BP ~90 Mbs).
We tried to change some parameters for ODBC/hdbcli like PACKETSIZE=2MB or PREFETCH=TRUE, without improving the cli extraction
And finally I successfully exported the same data that i had previously writen to a table (the create as select took 6s) using HANA Studio
**it took 16s to received the data : more than 3x better !! Network BP used 500Mbs !**
To summary results we currently have to export the same data :
The final question is : does someone know how to export data to an external client as fast as GUI SAP HANA Studio but in command line or in python or powershell ?
Request clarification before answering.
I finally found a solution based on jeff.albion suggestion and using ADO.NET + PowerShell + Remote Export :
detail and code here for the fastest method to extract HANA DATA to CSV until another solution will be found 🙂
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
I found a "bug" when using EXPORT to a temporary table with a NCLOB for the CONTENT column. It seam the first x bytes of the column are appended at the end of the CSV. I don't know if the issue is due to the EXPORT itself or the ADO driver i use to retrieve data back to the client.
When using a CLOB column I don't have the problem.
User | Count |
---|---|
70 | |
21 | |
9 | |
7 | |
6 | |
6 | |
5 | |
5 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.