Application Development Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.
cancel
Showing results for 
Search instead for 
Did you mean: 

Sorting table inserts?

Former Member
0 Kudos

Does anybody know if you gain any efficiency if you sort an internal table before you insert it into a database table? The database table would be empty to begin and then records would be added 5,000 at a time - for a total of up to 850,000 records. Would I gain anything if I inserted the records pre-sorted so that each insert was essentially appending to the end of the database table?

Thank you!

1 ACCEPTED SOLUTION

Former Member
0 Kudos

Hi Tracy,

the correct answer is, it depends (what you've probably expected). In most cases it'll help, even on databases which store the data unsorted in the tables. Those databases sort only the indexes. But if you insert unsorted data to a table with indexes, the indexes will suffer page splits during the inserts. Those page splits will add additional costs to your inserts. If you reduce the number of page splits, you'll see better performance. If you have multiple indexes with competing orders in respect to your records, you'll see probably no big improvement.

If you run on DB2 or Oracle, you'll see the best performance when you just import the data without the indexes and recreate them after the load. On MaxDB and Microsoft SQL Server the sorting by primary key will help, because those databases store the data in primary key order. On those two you should keep the primary key for loading the data.

Despite all I wrote above, I don't think that it makes a big difference if you insert only 850000 records into a table. The difference in runtime is probably much less compared to the time spend reading the answers to your question in this forum.

Best regards

Ralph Ganszky

6 REPLIES 6

Former Member
0 Kudos

I believe it will. In any event, it can't hurt.

You can also take the oppotunity to DELETE ADJACENT DUPLICATES COMPARING the key fields.

Rob

Message was edited by:

Rob Burbank

former_member194613
Active Contributor
0 Kudos

I do not think that you gain anything. Most database do anyway not sort the table lines only the index lines.

You shoould take care that you don't have key duplicates in the internal table.

But the delete adjacent duplicates should not be necessary, you better check whether you get uniqueness in the process.

Depending on the width of the table structure it can happen that you are not able to hold the whole 850.000 lines (internal table) in memory.

Do the insert with 10.000 in one step.

Siegfried

Former Member
0 Kudos

Tracy,

Offcourse the sorting before insert is efficient. But if there are too many records then the time in sorting also needs to be considered.

So it is bit dependent on the data volume.

-Ashim

Former Member
0 Kudos

Hi Tracy,

the correct answer is, it depends (what you've probably expected). In most cases it'll help, even on databases which store the data unsorted in the tables. Those databases sort only the indexes. But if you insert unsorted data to a table with indexes, the indexes will suffer page splits during the inserts. Those page splits will add additional costs to your inserts. If you reduce the number of page splits, you'll see better performance. If you have multiple indexes with competing orders in respect to your records, you'll see probably no big improvement.

If you run on DB2 or Oracle, you'll see the best performance when you just import the data without the indexes and recreate them after the load. On MaxDB and Microsoft SQL Server the sorting by primary key will help, because those databases store the data in primary key order. On those two you should keep the primary key for loading the data.

Despite all I wrote above, I don't think that it makes a big difference if you insert only 850000 records into a table. The difference in runtime is probably much less compared to the time spend reading the answers to your question in this forum.

Best regards

Ralph Ganszky

0 Kudos

I ran a small test in a DB2 envoronment and found no difference either way.

Rob

0 Kudos

Hi Tracy,

Hi Ralph,

i totaly agree Ralphs posting.

Another point to consider is that it could make a difference as well when you select the data again after the insert. Of course how you select the data. Do you need the data mainly in the order you are planing to inserting it it would probably be a good idea for Oracle and DB2 because the data will be clustered together. This could result in less physical I/Os for the selects if you compare it with data that is scattered over several blocks/pages. (In Microsoft SQL Server and MaxDB it will be in any case).

Kind regards,

Hermann