on 2012 Sep 14 11:04 AM
I have a process in my application where I need to search a large list (large for the program. We're talking about 2.5 million rows, which isn't large by database standards).
Anyway, I have rows in table #1 that need to be matched up with rows in the large table (table #2) and new rows inserted into a third table. I use an INSERT INTO ... SELECT statement to do this and it works. The thing is that this process can only be run on a row in table #1 once. So after the INSERT INTO ... SELECT runs, table #1 is updated and a column changed so they won't be selected again.
There is a bug, however.
The data in table #1 is collected by a sensor device and then inserted into that row. There is a process that runs as a Windows service and actually executes a stored procedure that does the INSERT INTO ... SELECT statement. The problem is that it is possible for a row to be inserted into Table #1 in the time between the INSERT INTO . . . SELECT finishes determining which rows need to be included and the UPDATE begins. When that happens, that row that is never included in the results of the INSERT INTO ... SELECT.
To fix this, we are considering setting the transaction isolation level to SNAPSHOT. I know that this comes with a performance hit. The question is how big of a hit is it? Does anyone who has any experience with it have any information they will share with us?
Thanks
Tony
I'm providing a little more information here to make things a little clearer.
The application is used in law enforcement. The sensors are collecting license plate information about vehicles that drive past them. There is a list of plates belonging to persons of interest (suspended license or plates, stolen plates or vehicle, wanted, etc.) This is table #2, and has 2.5 million rows in it in our test database. This list is maintained by our proprietary server and is downloaded to each client. When the software is first installed, a client has to download all of the rows in table #2 and add them to the table. From then on, the server updates that list daily, or multiple times daily, and the table in each client is updated.
As plate information is gathered, each plate has to be checked against table 2. It can only be checked once. This is table #1. It is possible for multiple sensors to be connected to one computer, and for each camera to collect the same plate information. This goes into table #1 as 2 rows. Because of the geometry of the installation, it is possible for one camera to "see" the vehicle some number of milliseconds before the second one. The check of table #2 for the first camera can be under way when the second camera's row is added to table #1. We're trying to prevent the check of the second camera's row from being marked as done before that check is actually done.
We tried Isolation Level 3 and it worked, at first. Then we started getting deadlocks while the initial load of table #2 was occurring. This wasn't acceptable, as it created untold other problems, including delaying the download of table #2, if it was chosen as the victim of the deadlock.
What we ended up doing was setting the isolation level back to the default and we added a DECLARE LOCAL TABLE
to the stored procedure that does the check. We copy information from table #1 into this temporary table first. The rest of the procedure uses the information in the temporary table to determine which rows in table #1 are used by the INSERT INTO ... SELECT
and the UPDATE
statements. This solves the issue and, once we got the INSERT INTO ... SELECT
to use the right index, it's fast enough that deadlocks are very unlikely.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
So you haven't tried snapshot isolation (which should have solved the deadlock problems IMHO)? - I'm just curious about any performance experiences...
(Apparently, your choice to use a local temporary table to store data is a very useful and common way to make a "data snapshot".)
Aside: The subject matter reminds me of another notion of "snapshot" - does #table 2 have an external import routine? Just kidding:)
collecting license plate information about vehicles that drive past them
That gives a rollback a very vivid meaning: "Sorry mate, couldn't check your plate, please roll back and try again...". - "Deadlock" works as well:)
In your use case I think isolation level 3 would be better, as you want to prevent changes during your select. The snapshot will give you as it says a temp copy of the data, so it would prevent you from ommiting rows which are changed or deleted during your select. But as I understand the contrary is your problem, you select and might overlook newly inserted rows. This seems to fall in the class of phantom rows.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Yes, but snapshot isolation would as well prevent phantom rows: Both the insert ... select from #table1 and the update on #table 1 (if run in the same transaction) would see the same set of rows, so only the "seen" rows would be marked as "handled". - The next insert...select should then still list all rows that have been inserted after the begin of the first snapshot. - At least that is my idea of TonyV's description:)
And isolation level 3 might prevent the sensor device to deliver its data or would need an according buffer to wait until the phantom locks are released. That could or could not be a problem here...
User | Count |
---|---|
57 | |
10 | |
7 | |
7 | |
6 | |
6 | |
5 | |
5 | |
5 | |
5 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.