on 2022 Apr 14 11:34 PM
Hello
I use a HANA 2.0 database. I want to export a table or the result of a SQL query from the database to an external client as fast as possible, using a command line (I'm on Windows).
I have already tested different CLI methods successfully, but none is as fast as the GUI method (HANA Studio Export).
The generated CSV is around 400 MB; in all cases it is exactly the same data.
I successfully exported the result of a SQL query using hdbsql and the -o option:
$periodstr = '202012'
$TargetFile = "OTIF_${periodstr}.csv"
$HanaUserKey = "MAESTRO_VAL"
$CSVDelimiter = '¤'
$UserQuotes = "Never"
$encoding = "UTF8"
Measure-Command{hdbsql -U $HanaUserKey -o .\${TargetFile} -I D:\Applications\python\Hana2Hyper\extractors\MAESTRO\sql\OTIF4HDBSQL.SQL -V period=$periodstr -F $CSVDelimiter -x -resultencoding $encoding}
With hdbsql it took 65 s to export; network bandwidth used was ~40 Mb/s.
I successfully exported using ODBC (HDBODBC) and a PowerShell DataSet:
$TargetDir = "D:\Applications\python\Hana2Hyper\workdir"
$periodstr = '202012'
$TargetFile = "OTIF_${periodstr}.csv"
$HanaUserKey = "@MAESTRO_VAL"
$SQLQuery = [IO.File]::ReadAllText("D:\Applications\python\Hana2Hyper\extractors\MAESTRO\sql\OTIF4HDBSQL.SQL")
$SQLQuery = $SQLquery.replace('.',$periodstr)
$CSVDelimiter = 31 ## US: Unit Separator, ASCII 31
$UserQuotes = "Never"
$encoding = "UTF8"
$conn = new-object System.Data.Odbc.OdbcConnection
$conn.connectionstring = "Driver={HDBODBC};SERVERNODE=${HanaUserKey};PACKETSIZE=2097152; PREFETCH=TRUE"
$conn.Open()
$cmd = New-object System.Data.Odbc.OdbcCommand($SQLQuery,$conn)
$dset = New-Object System.Data.DataSet
$adap = New-Object System.Data.Odbc.OdbcDataAdapter($cmd)
Measure-Command{$nrr = $adap.fill($dset)}
Write-Output "Dataset Records selected: ${nrr}"
# PowerShell 7 needed for -UseQuotes
Measure-Command{Export-Csv -InputObject $dset.tables[0] -Path ${TargetDir}\${TargetFile} -Delimiter $CSVDelimiter -noTypeInformation -UseQuotes $UserQuotes -encoding $encoding}
$conn.Close()
It took 57 s to fill the dataset with HDBODBC + PowerShell and 1.2 s to write the dataset to CSV: 58 s total (faster than hdbsql). Network bandwidth used: ~50 Mb/s.
I successfully exported using ADO.NET and PowerShell:
$hanaado = [Reflection.Assembly]::LoadFile("D:\Applications\sap\hdbclient\ado.net\v4.5\Sap.Data.Hana.v4.5.dll")
$TargetDir = "D:\Applications\python\Hana2Hyper\workdir"
$periodstr = '202012'
$TargetFile = "OTIF_${periodstr}.csv"
$HanaUserKey = "MAESTRO_VAL"
$SQLQuery = [IO.File]::ReadAllText("D:\Applications\python\Hana2Hyper\extractors\MAESTRO\sql\OTIF4HDBSQL.SQL")
$SQLQuery = $SQLquery.replace('.',$periodstr)
$CSVDelimiter = 31 ## US: Unit Separator, ASCII 31
$UserQuotes = "Never"
$encoding = "UTF8"
$conn = New-Object Sap.Data.Hana.HanaConnection
$conn.connectionstring = "KEY=${HanaUserKey}"
$conn.Open()
$cmd = New-object sap.data.Hana.HanaCommand($SQLQuery,$conn)
$dset = New-Object System.Data.DataSet
$adap = New-Object sap.Data.Hana.HanaDataAdapter($cmd)
Measure-Command{$nrr = $adap.fill($dset)}
Write-Output "Dataset Records selected: ${nrr}"
# PowerShell 7 needed for -UseQuotes
Measure-Command{Export-Csv -InputObject $dset.tables[0] -Path ${TargetDir}\${TargetFile} -Delimiter $CSVDelimiter -noTypeInformation -UseQuotes $UserQuotes -encoding $encoding}
$conn.Close()
It took 57 s to fill the dataset with ADO.NET + PowerShell and 1 s to write the dataset to CSV: 58 s total (faster than hdbsql). Network bandwidth used: ~50 Mb/s (same as ODBC).
We also successfully exported using Python (SQLAlchemy + hdbcli) and pandas DataFrames in about 35 s (network bandwidth ~90 Mb/s).
We tried changing some parameters for ODBC/hdbcli, like PACKETSIZE=2MB or PREFETCH=TRUE, without improving the CLI extraction.
And finally I successfully exported the same data, which I had previously written to a table (the CREATE AS SELECT took 6 s), using HANA Studio.
**It took 16 s to receive the data: more than 3x better!! Network bandwidth used: 500 Mb/s!**
To summarize, the results we currently have to export the same data:
Method | Time | Network bandwidth
---|---|---
hdbsql | 65 s | ~40 Mb/s
ODBC (HDBODBC) + PowerShell DataSet | 58 s | ~50 Mb/s
ADO.NET + PowerShell DataSet | 58 s | ~50 Mb/s
Python (SQLAlchemy + hdbcli) + pandas | 35 s | ~90 Mb/s
HANA Studio Export (GUI) | 16 s | 500 Mb/s
The final question is: does someone know how to export data to an external client as fast as the SAP HANA Studio GUI, but from the command line, or in Python or PowerShell?
Hello,
The reason HANA Studio is faster is that it doesn't select the data for export directly to the client. Under the covers it is doing this:
CREATE LOCAL TEMPORARY TABLE #CLIENT_EXPORT_1650287698305 (FILENAME VARCHAR(64), PATH VARCHAR(255), CONTENT BLOB)
EXPORT "SYSTEM"."MYEXPORTTABLE" AS BINARY INTO #CLIENT_EXPORT_1650287698305
SELECT "FILENAME","PATH","CONTENT" FROM #CLIENT_EXPORT_1650287698305
SELECT * FROM #EXPORT_RESULT
So unfortunately the use cases you are comparing aren't the same: you are already implicitly using the EXPORT statement inside HANA Studio. Perhaps this approach will help with the other interfaces you're trying?
In general, hdbsql is primarily intended as a script-runner or interactive shell for administration tasks, not as a main import/export tool. SDI is the official recommendation for importing / exporting data across systems.
First, a big THANKS for your quick and interesting answer!
I would not have taken this route without help!
I will test quickly and come back to update the post.
Another question, jeff.albion: should I create a CLOB column for the CONTENT column if the EXPORT is done as CSV (instead of as BINARY as in your example)?
SDI is official, but not always as flexible as we could expect.
But I will include it in the benchmark if I can.
Regards and thank you one more time
I tried the EXPORT method and I get a coherent result using HANA Studio.
Problems occur when I use CLI tools to retrieve the data:
- ODBC returns an error because of the EXPORT (non-standard for ODBC ==> syntax error)
- JDBC returns an error because of the EXPORT (non-standard for JDBC ==> syntax error)
- hdbsql: partially works for the moment (I use a CLOB for the CONTENT column to have readable data), but the content is cut off and only 85 characters are exported (I tried both the hdbsql -o output.csv and hdbsql ... > output.csv methods; both return only a few characters). Maybe I am missing something (or maybe some kind of network protection cuts my extraction). I will try NCLOB, but I doubt it will change anything, and I will also try with a local HXE.
Note: when I take the length of the CONTENT column for FILENAME='data.csv', it returns ~360,000,000, which is normal; this is the amount expected.
Another small issue with the EXPORT command: it is not possible to choose which delimiter to use. The EXPORT INTO command allows changing that.
I finally found a solution based on jeff.albion's suggestion, using ADO.NET + PowerShell + remote EXPORT:
Details and code here for the fastest method to extract HANA data to CSV, until another solution is found 🙂
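In outline, it looks like the sketch below (a simplified, untested sketch: the temporary table name is a placeholder, the key, paths and table come from the examples above, and it assumes EXPORT ... AS CSV INTO a local temporary table with a CLOB CONTENT column, as discussed with jeff.albion):
# Sketch of the remote EXPORT approach (placeholder names, simplified from the real script)
[Reflection.Assembly]::LoadFile("D:\Applications\sap\hdbclient\ado.net\v4.5\Sap.Data.Hana.v4.5.dll") | Out-Null
$HanaUserKey = "MAESTRO_VAL"
$TargetFile  = "D:\Applications\python\Hana2Hyper\workdir\OTIF_202012.csv"
$TempTable   = "#CLIENT_EXPORT_CSV"   # hypothetical temporary table name
$conn = New-Object Sap.Data.Hana.HanaConnection
$conn.ConnectionString = "KEY=${HanaUserKey}"
$conn.Open()
$cmd = $conn.CreateCommand()
# 1) Local temporary table that receives the exported file; CLOB keeps the CSV content readable
$cmd.CommandText = "CREATE LOCAL TEMPORARY TABLE ${TempTable} (FILENAME VARCHAR(64), PATH VARCHAR(255), CONTENT CLOB)"
$cmd.ExecuteNonQuery() | Out-Null
# 2) Server-side export into the temporary table, AS CSV instead of AS BINARY
$cmd.CommandText = "EXPORT ""SYSTEM"".""MYEXPORTTABLE"" AS CSV INTO ${TempTable}"
$cmd.ExecuteNonQuery() | Out-Null
# 3) Pull the CSV content back over the wire and write it to a local file
$cmd.CommandText = "SELECT CONTENT FROM ${TempTable} WHERE FILENAME = 'data.csv'"
$reader = $cmd.ExecuteReader()
$writer = [IO.StreamWriter]::new($TargetFile, $false, [Text.Encoding]::UTF8)
while ($reader.Read()) {
    $writer.Write($reader.GetString(0))   # the whole CLOB is read into memory here; acceptable for ~400 MB
}
$writer.Close()
$reader.Close()
$conn.Close()
The local temporary table is dropped automatically when the session is closed, so no explicit cleanup is needed.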
I found a "bug" when using EXPORT to a temporary table with a NCLOB for the CONTENT column. It seam the first x bytes of the column are appended at the end of the CSV. I don't know if the issue is due to the EXPORT itself or the ADO driver i use to retrieve data back to the client.
When using a CLOB column I don't have the problem.
Did you try the EXPORT statement with hdbsql?
When I am granted the right to export, I will provide timings in this post to compare with the others.