cancel
Showing results for 
Search instead for 
Did you mean: 

Best performance on large table inserts

Former Member
1,478

Two tables:
A (col1, col2, col3, primary key (col1, col2))
B (col1, col2, col3, col4, col5, primary key (col1, col2))

With table B having thousands of records, which of the following statements would be the least expensive?

Option 1:
insert into A (col1, col2, col3) on existing skip select col1, col2, col3 from B

Option 2:
insert into A (col1,col2,col3) select col1, col2, col3 from B where not exists (select * from A where col1=b.col1 and col2=b.col2)

My gut tells me they're exactly the same. But I need more than my gut to go on.

reimer_pods
Participant

No, I don't claim to know the answer. But I wanted to add, if you're on SQLA 11 or higher, there's a 3rd method (example not verified):

MERGE INTO A (col1, col2, col3)
USING B ON A.col1 = B.col1 and A.col2 = B.col2
WHEN NOT MATCHED INSERT

IMHO all tree statements should be broken down by the query engine to similar sequences of execution, so there should be no real difference in performance.

But that's speculative, so to get more detailed information you might want to execute both statements in identical environments, capture the graphical plans and analyze them for deviations.

JimDiaz
Participant

OK just a guess but I would think Option 1 is more efficient when dealing with a small number of rows and option 2 when dealing with large inserts. I believe some DBMS's just ignore the error with option 1.

Jim

Former Member
0 Kudos

I'm with Jim on this one. Thanks guys

VolkerBarth
Contributor
0 Kudos

I believe some DBMS's just ignore the error with option 1.

FWIW, I believe "INSERT ... ON EXISTING ..." is primarily a SQL Anywhere vendor extension so I don't really understand the "some DBMS's" statement... - of course, other DBMS like MySQL have somwhat similar extensions like "INSERT ... ON DUPLICATE KEY UPDATE..."

Accepted Solutions (0)

Answers (1)

Answers (1)

Breck_Carter
Participant
  1. Predicting the performance of real-world SQL queries by visually inspecting stripped-down sample code has almost no chance of success.
  2. A three-column table with only "thousands of rows" may be regarded as a very small table by SQL Anywhere, not worthy of optimization. Exceptions exist, of course; e.g. this question about a 6,000 row table with an INTEGER primary key index that consumes 1.1G of disk space 🙂
  3. In this case (comparing ON EXISTING SKIP with WHERE NOT EXISTS) it's not difficult to actually code and test BOTH versions, in the real-world form, and look at the Graphical Plan With Statistics... testing is the only reliable way to determine performance...
  4. ...and even then, it's not perfect, because when conditions change sometimes so do the plans.

Having said that, my guess is they will have the same plans and the same performance. I am VERY PROUD of my guesswork... alt text