on 2011 Oct 12 12:58 PM
I have a table containing various terms and usage counts (plus related info).
To update this table, I have a stored procedure. This is executed while I import a bunch (80000+) of records. I have found that importing those records in parallell cuts execution time in half (on my old dual core, soon to be upgraded to quad, PC).
The procedure first checks if the term exists. If it exists then the count is updated and some other operations performed. If not, then it gets added.
This design is clearly not threadsafe.
Since this particular operation is relatively light weight, I'd happily lock the entire table if I can ensure that nobody inserts anything to it after I have checked if the term exists.
Is there a clean and nice way to do this? I'd do a lock() in my .net code, but I have no control over the number of instances running this code. It would solve my immediate problem, but not the underlying one.
The WAITFOR DELAY is just for demonstration purposes, to guarantee collisions:
CREATE TABLE MyTermTable ( MyTerm INTEGER NOT NULL PRIMARY KEY, counter INTEGER NOT NULL );CREATE PROCEDURE merge_MyTermTable ( IN @MyTerm INTEGER ) BEGIN IF NOT EXISTS ( SELECT * FROM MyTermTable WHERE MyTerm = @MyTerm ) THEN BEGIN DECLARE primary_key_violation EXCEPTION FOR SQLSTATE '23W01'; WAITFOR DELAY '00:00:05'; INSERT MyTermTable VALUES ( @MyTerm, 1 ); MESSAGE STRING ( 'INSERT: ', @MyTerm ); EXCEPTION WHEN primary_key_violation THEN UPDATE MyTermTable SET counter = counter + 1 WHERE MyTerm = @MyTerm; MESSAGE STRING ( 'Exception UPDATE: ', @MyTerm ); WHEN OTHERS THEN RESIGNAL; END ELSE UPDATE MyTermTable SET counter = counter + 1 WHERE MyTerm = @MyTerm; MESSAGE STRING ( 'Regular UPDATE: ', @MyTerm ); END IF; COMMIT; END;
-- dbisql session 1... CALL merge_MyTermTable ( 1 ); CALL merge_MyTermTable ( 2 ); CALL merge_MyTermTable ( 3 ); CALL merge_MyTermTable ( 4 ); CALL merge_MyTermTable ( 5 );
-- dbisql session 2... CALL merge_MyTermTable ( 1 ); CALL merge_MyTermTable ( 3 ); CALL merge_MyTermTable ( 5 );
-- dbeng12 console... INSERT: 1 Exception UPDATE: 1 INSERT: 2 INSERT: 3 Exception UPDATE: 3 INSERT: 5 INSERT: 4 Regular UPDATE: 5
SELECT * FROM MyTermTable ORDER BY MyTerm;
MyTerm,counter 1,2 2,1 3,2 4,1 5,2
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
@Volker: When you say "race condition" are you referring to the question, or the code in my answer?
At the point the EXCEPTION is thrown for the answer code above, the other connection has (by definition) performed an INSERT after this connection did the SELECT. This connection's attempt to do the UPDATE will then either (a) be blocked until the other connection commits because the other connection has the row locked, after which it will run, or (b) run without waiting because the other connection has released the lock.
@Rune: You may have noticed that COMMIT is fast whereas ROLLBACK takes forever when a large number of rows are involved; same explanation, the engine is optimized for the 99.9999% case (transactions are committed) rather than the other one.
@Volker: Are you surrrre it will deadlock? One connection has a resource the other needs, and vice versa? What are the two different resources? (they have to be different, two connections can't have locks on the same thing, or... I need more coffee 🙂
@Breck: "Race condition" referred to the question, though I related it also to the possible PK violation that will be handled by your code - which is (of course!) a working solution IMHO:)
What I'm trying to say is that the HOLDLOCK attempt would prevent the PK violation whereas your code will handle it. Can't say which is more appropriate or performs better...
Since this particular operation is relatively light weight, I'd happily lock the entire table if I can ensure that nobody inserts anything to it after I have checked if the term exists.
If you'd like to lock the entire table, there's the LOCK TABLE statement. However, the mechanism that you're specifically referring to (preventing one connection from modifying the table while another connection is also modifying the table) can be handled via Isolation Levels. If you didn't want to change your code to use the MERGE statement as Mark suggested, you might be able to adjust your isolation level for the stored procedure to use your existing stored procedure code and prevent concurrency issues.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
@Jeff: At the risk of sounding litigious, it's important to avoid confusion in other people's minds, especially when it involves One Of Life's Greatest Mysteries: Locking And Blocking.
The desired behavior of "preventing one connection from modifying the table while another connection is also modifying the table" does not depend on isolation level. If one connection has updated a row and not yet committed or rolled back, no other connection can update that row, not even at isolation level zero. Isolation levels can be used to "pump up the volume" of locks and blocks, but not reduce it... updates are sacrosanct.
I know I took that "quoted phrase" out of context, I know what you are getting at... but I also know that some folks read more into isolation levels than they should.
Thanks Breck - you're right that my comments as written are highly misleading.
I was assuming that the conflict was reading from the source data from the original problem description (which is based on isolation levels), not writing the inserting data. (That was a bad assumption!)
I also should not have said "modifying" in that quote and instead should have said "reading" - not sure why I would indicate that isolation levels would affect writing data. "Even a cow knows" that writes always get exclusive locks! 🙂
Perhaps you should look at the MERGE statement. The MERGE statement is capable inserting a row if the row does not exist otherwise it can update the row. Once you have inserted/updated the row you can then proceed to do your other operations.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
MERGE semantics (and INSERT...ON EXISTING) aren't "threadsafe" in this sense either. The current semantics of MERGE and INSERT ON EXISTING are based on the underlying isolation level of the connection. That is to say, the existence check done by these algorithms is not done (today) using INTENT or WRITE row locks, so it's possible that concurrently-executing statements can incur duplicate primary key errors. These errors will still have to be handled by the application in some way.
Does it mean it would take isolation level 3 to prevent other transactions to insert rows while the particular process is about to insert these?
I.e. would something like this untested code fragment prevent concurrent inserts by use of the HOLDLOCK table hint (error handling omitted):
begin if (select count(*) from MyTermTable with (holdlock) where MyTerm = @MyTerm) = 0 then insert MyTermTable... else update MyTermTable... end if; end; commit;
@Volker: You should test this... does COUNT(*) meet the definition of phantom row? Does the isolation level 3 HOLDLOCK thingie actually create anti-insert locks? ...these aren't rhetorical questions, I don't know, my cult sect doesn't allow isolation level 3 🙂
@Glenn: Are you saying that INSERT ON EXISTING actually tests for the row's existence before attempting the INSERT? As opposed to handling the exception? (which is what the syntax implies).
In other words, if I want code that is actually robust, as opposed to exhibiting the appearance of robustness, I have resort to Old School techniques? This is a revelation... 🙂
@Breck: I hope Glenn will answer these very good questions - I don't have the "code", and I'm not even sure how to confirm that the behaviour I would expect will behave as expected...
And I do not usually use isolation level 3 (except during SQL Remote extractions...), nor do I remember to have used WITH (HOLDLOCK) in production code - we don't do that much in parallel:)
Current tests with two dbisqlc sessions reveal that
select count(*) from MyTermTable with (holdlock) where MyTerm = @MyTerm
does seem to set the same phantom locks as
select * from MyTermTable with (holdlock) where MyTerm = @MyTerm
that is, such a SELECT will block your proposed stored procedure if blocking is on.
FWIW, isolation level 2 - i.e. with (REPEATABLEREAD) - wont't be sufficient, as it will prevent the modification of the row if that is already existing, however, it won't block the insertion of such a row.
Unfortunately, isolation level 3 seems to be blocking any insertion into this table with any @MyTerm value as any new row might be suspicious to fulfil the running select's conditions...
Resume:
I would think in this case the "curing" of PK violations is much better than to prevent them with isolation level 3.
@Glenn: Breck still might like to get an answer here - and me, too:)
This is a summary of my attempts with the HOLDLOCK table hint.
I think Breck's solution is better - so I just add it here for completeness...
I have changed Breck's sample procedure slightly:
CREATE OR REPLACE PROCEDURE merge_MyTermTable ( IN @MyTerm INTEGER ) BEGIN IF NOT EXISTS ( SELECT * FROM MyTermTable WITH (HOLDLOCK) WHERE MyTerm = @MyTerm ) THEN BEGIN DECLARE primary_key_violation EXCEPTION FOR SQLSTATE '23W01'; DECLARE rollback_detected EXCEPTION FOR SQLSTATE '40001'; WAITFOR DELAY '00:00:05'; INSERT MyTermTable VALUES ( @MyTerm, 1 ); MESSAGE STRING ( 'INSERT: ', @MyTerm ); EXCEPTION WHEN primary_key_violation, rollback_detected THEN UPDATE MyTermTable SET counter = counter + 1 WHERE MyTerm = @MyTerm; MESSAGE STRING ( 'Exception UPDATE: ', @MyTerm ); WHEN OTHERS THEN RESIGNAL; END ELSE UPDATE MyTermTable SET counter = counter + 1 WHERE MyTerm = @MyTerm; MESSAGE STRING ( 'Regular UPDATE: ', @MyTerm ); END IF; COMMIT; END; -- dbisql session 1... CALL merge_MyTermTable ( 1 ); CALL merge_MyTermTable ( 2 ); CALL merge_MyTermTable ( 3 ); CALL merge_MyTermTable ( 4 ); CALL merge_MyTermTable ( 5 ); -- dbisql session 2... CALL merge_MyTermTable ( 1 ); CALL merge_MyTermTable ( 3 ); CALL merge_MyTermTable ( 5 ); -- dbeng12 console... Exception UPDATE: 1 INSERT: 1 UPDATE: 2 INSERT: 3 Regular UPDATE: 3 Exception UPDATE: 4 INSERT: 4 INSERT: 5 SELECT * FROM MyTermTable ORDER BY MyTerm; MyTerm,counter 1,2 2,1 3,2 4,1 5,2
In the end, it does block much more than Breck's version has to handle PK conflicts. Therefore I guess the performance will suffer because of the HOLDLOCK.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
create special table, which store information (flag) about execution state of your special "one thread procedure". in the beginning of this one thread stored procedure lock this special table in serial mode. So after that, you get one thread procedure.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
69 | |
9 | |
8 | |
7 | |
7 | |
6 | |
5 | |
4 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.