on 2013 Apr 18 6:04 PM
My company has developed software that needs to load data transferred from a server to clients over XMPP. The client uses SQL Anywhere for its database; the server uses a different database product entirely.
Our process works by keeping track of which rows of data have been transferred from one machine to the other for the various data types that a client can subscribe to. When a client is added to the system for the first time, all of the data on the server to date gets downloaded. From then on, only the rows that change get sent.
Our code is written in C# and we transmit batches of data over the wire from the server to the client. On the client, we iterate over the rows in the batch one at a time and run a stored procedure which performs an "upsert" operation on a table. This is generally done using an INSERT ... ON EXISTING UPDATE statement, but in a couple of cases it is a bit more complex.
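To give a concrete (and simplified) picture: each call boils down to a statement like the one below. The table and column names are hypothetical stand-ins for our real schema, and the real code goes through a stored procedure rather than inline SQL.

using iAnywhere.Data.SQLAnywhere;

static class UpsertSketch
{
    // Simplified sketch with hypothetical names. ON EXISTING UPDATE makes the
    // INSERT update the existing row when the primary key value is already present.
    public static void UpsertRow( SAConnection conn, int pk, string payload )
    {
        using ( SACommand cmd = new SACommand(
            "INSERT INTO client_data( pk, payload ) ON EXISTING UPDATE VALUES( ?, ? )",
            conn ) )
        {
            SAParameter pPk = new SAParameter();
            pPk.Value = pk;
            cmd.Parameters.Add( pPk );

            SAParameter pPayload = new SAParameter();
            pPayload.Value = payload;
            cmd.Parameters.Add( pPayload );

            cmd.ExecuteNonQuery();
        }
    }
}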
We've built the application and it's running, but we're not happy with the performance of loading data of one particular type. There are millions of rows of that type that have to be downloaded to the client from the server and inserted into the SA database on the first run. Our code is running about as fast as it can loading data one row at a time. We need to find a way to speed it up, preferably by sending all of the rows of this type in the batch to the database in one I/O.
I've been kicking around the idea of writing a DLL in C to speed this process up. The C# code would convert the objects in a batch into an array of structs and marshal that array to a function in the DLL. The C DLL would then create a temporary table and somehow bulk load all of that data into it. Finally, it would execute a stored procedure which would update the main table using all of the rows in the batch in a single operation, probably using a MERGE statement or a cursor.
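To make the intended flow concrete, here is a rough sketch of the SQL involved, with hypothetical table and column names. It is shown executed through SACommand purely for illustration; in practice the staging and merging would be driven from the C DLL and a stored procedure.

using iAnywhere.Data.SQLAnywhere;

static class BatchMergeSketch
{
    static void Exec( SAConnection conn, string sql )
    {
        using ( SACommand cmd = new SACommand( sql, conn ) )
        {
            cmd.ExecuteNonQuery();
        }
    }

    // Hypothetical sketch of the staging-table-plus-MERGE idea.
    public static void MergeBatch( SAConnection conn )
    {
        // Connection-scoped staging table.
        Exec( conn,
            "DECLARE LOCAL TEMPORARY TABLE batch_stage( pk INT PRIMARY KEY, payload LONG VARCHAR ) " +
            "ON COMMIT PRESERVE ROWS" );

        // ... bulk load batch_stage here -- this is the step in question ...

        // Apply the whole batch to the production table in one statement.
        Exec( conn,
            "MERGE INTO client_data t USING batch_stage s ON t.pk = s.pk " +
            "WHEN MATCHED THEN UPDATE SET t.payload = s.payload " +
            "WHEN NOT MATCHED THEN INSERT ( pk, payload ) VALUES ( s.pk, s.payload )" );
    }
}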
I can do all of this, except that I don't know how to do the BCP into the temporary table. I've read the documentation on Embedded SQL and nothing in there pops out at me as a way to do this. And when I search the documentation for "BCP" all I get back are references to the LOAD TABLE statement and the INPUT statement. The former requires the data to be in a file on the file system and the latter is for Interactive SQL only. Not what I'm looking for.
Can anyone point me in the right direction?
P.S. It's important that the actions of the C DLL participate in the same transaction used by the C# code. Can that be done?
LOAD TABLE does have a "USING CLIENT FILE" clause that will allow it to read from a file on the client machine: http://dcx.sybase.com/index.html#sa160/en/dbreference/load-table-statement.html*d5e60003
Take note of some of the requirements in the 'Privileges' section though:
When loading from a file on a client computer:
READ CLIENT FILE privilege is also required.
Read privileges are required on the directory being read from.
The allow_read_client_file database option must be enabled.
The read_client_file secure feature must be enabled.
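Going back to LOAD TABLE ... USING CLIENT FILE: a minimal sketch issued from the client connection might look like the following. The table name, client-side path, and load options here are hypothetical; adjust them to your schema and file format.

using iAnywhere.Data.SQLAnywhere;

static class ClientFileLoadSketch
{
    // Hypothetical sketch: the server reads the file from the client machine,
    // subject to the privileges listed above.
    public static void LoadFromClientFile( SAConnection conn )
    {
        using ( SACommand cmd = new SACommand(
            "LOAD TABLE client_data ( pk, payload ) " +
            "USING CLIENT FILE 'c:/temp/batch.txt' " +
            "DELIMITED BY ',' ESCAPES OFF",
            conn ) )
        {
            cmd.ExecuteNonQuery();
        }
    }
}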
From ESQL, you can also do wide "puts" to insert multiple rows in one request. I'm not sure if the .net driver has a way of exposing it natively -- it might. See: http://dcx.sybase.com/index.html#1201/en/dbreference/put-statement.html
John:
Thanks! We'll be looking at the PUT statement in ESQL. We intend to create a temporary table, PUT all of the data for all of the rows in the batch into the temporary table in one operation, and then MERGE the rows in the temporary table into the production table. Hopefully, this will be faster than inserting the data one row at a time using SACommand.
Have you tried using the SABulkCopy class in the SQL Anywhere .NET provider? The following sample code (lacking any error checking for simplicity) cranks in 1000 rows at once.
using System;
using System.Data;
using iAnywhere.Data.SQLAnywhere;

namespace bulkInsert
{
    class Program
    {
        static void Main(string[] args)
        {
            SAConnection conn = new SAConnection( "Data Source=dba_sql_12" );
            conn.Open();

            // Assume the following table definition:
            // CREATE TABLE t1 ( pk integer primary key, c1 integer )
            DataTable table = new DataTable("t1");
            table.Columns.Add(new DataColumn("pk", typeof(int)));
            table.Columns.Add(new DataColumn("c1", typeof(int)));
            for( int i=0; i<1000; i++ )
            {
                table.Rows.Add(i, i);
            }

            using(SABulkCopy bulkCopy = new SABulkCopy(conn))
            {
                bulkCopy.BulkCopyTimeout = 600; // in seconds
                bulkCopy.DestinationTableName = "t1";
                bulkCopy.WriteToServer(table);
            }
            conn.Close();
        }
    }
}
Here's another sample that uses a DataReader instead of a DataTable as the input to bulkCopy.WriteToServer(). In your case, the DataReader can be one that you opened on the connection to the source database.
using System;
using System.Data;
using iAnywhere.Data.SQLAnywhere;

namespace bulkInsert
{
    class Program
    {
        static void Main(string[] args)
        {
            SAConnection conn = new SAConnection( "Data Source=dba_sql_12" );
            conn.Open();

            SACommand cmd = new SACommand(
                "select row_num, row_num+1 from sa_rowgenerator(1,1000)", conn );
            SADataReader reader = cmd.ExecuteReader();

            using(SABulkCopy bulkCopy = new SABulkCopy(conn))
            {
                bulkCopy.BulkCopyTimeout = 600; // in seconds
                bulkCopy.DestinationTableName = "t1";
                bulkCopy.WriteToServer(reader);
            }
            conn.Close();
        }
    }
}
Reg:
Actually, we looked at that when we first started using SQL Anywhere last summer, and the performance was abysmal. We believe that SABulkCopy is actually sending the data one row at a time rather than all of the rows in the batch in one I/O. The performance we got from it was actually worse than when we inserted the data one row at a time.
There was a recent bug fix for SABulkCopy which describes an 'OutOfMemoryException', but the underlying problem was described as "SABulkCopy was loading the whole source table before copying to the destination table", and it was fixed by making all WriteToServer methods respect the batch size.
Versions fixed: 12.0.1.3790, 11.0.1.2871, 10.0.1.4346
It sounds plausible that "loading the whole source table" could be a performance issue but I'm not familiar enough with the code to comment with any certainty.
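If that fix is relevant here, it may also be worth setting the batch size explicitly when you retest. I believe SABulkCopy exposes a BatchSize property for this (please verify against your driver version); reusing the conn and table objects from the first SABulkCopy sample above, that would look roughly like:

// Assumption: SABulkCopy has a BatchSize property, as the fix note implies;
// a positive value sends that many rows per request rather than buffering
// the whole source before copying.
using ( SABulkCopy bulkCopy = new SABulkCopy( conn ) )
{
    bulkCopy.BulkCopyTimeout = 600;        // in seconds
    bulkCopy.DestinationTableName = "t1";
    bulkCopy.BatchSize = 5000;             // hypothetical value; tune as needed
    bulkCopy.WriteToServer( table );
}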
I have revisited the performance of SABulkCopy and it's still slower than code we've written to insert rows into a table one row at a time. And in the test case, the table has no indexes.
To be clear, our code connects to the database and creates an SACommand with a fixed INSERT INTO ... ( ... ) VALUES ( ?, ?, ... ) statement. It then creates one SAParameter for each column and adds them to the SACommand's Parameters collection. We then loop over the data, change the values of the SAParameters, and call ExecuteNonQuery for each row.

If I'm not mistaken, this is the fastest way to perform the operation short of a bulk load. But SABulkCopy is slower. I thought the purpose of a bulk load was to load the data faster than could be done with an INSERT statement.
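In other words, the loop looks essentially like this (with hypothetical table and column names standing in for the real schema):

using System.Collections.Generic;
using iAnywhere.Data.SQLAnywhere;

static class RowAtATimeLoader
{
    // Sketch of the row-at-a-time pattern described above: one fixed INSERT,
    // one SAParameter per column, values rebound for every row.
    public static void Load( SAConnection conn, IEnumerable<KeyValuePair<int, string>> rows )
    {
        using ( SACommand cmd = new SACommand(
            "INSERT INTO client_data( pk, payload ) VALUES( ?, ? )", conn ) )
        {
            SAParameter pPk = new SAParameter();
            SAParameter pPayload = new SAParameter();
            cmd.Parameters.Add( pPk );
            cmd.Parameters.Add( pPayload );

            foreach ( KeyValuePair<int, string> row in rows )
            {
                pPk.Value = row.Key;
                pPayload.Value = row.Value;
                cmd.ExecuteNonQuery();
            }
        }
    }
}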
You may find some ideas in some of the articles about LOAD TABLE and BCP on my blog; here's a sampling...
UNLOAD and LOAD Via Named Pipes