cancel
Showing results for 
Search instead for 
Did you mean: 

Rollback occurred due to deadlock during prefetch - in HA

Breck_Carter
Participant
2,267

Update: The title was originally "Should TRUNCATE TABLE be avoided on HA 16.0.0.2419?" but it's not just TRUNCATE.


The first SELECT on the secondary or read-only scale-out copy seems to get "Rollback occurred due to deadlock during prefetch" but the second select works.

It doesn't seem to matter if the table is already empty when the TRUNCATE TABLE runs.

...or is this a dbisql issue?

--------------------------------------
-- on primary...
TRUNCATE TABLE t;

--------------------------------------
-- on secondary or copy... FAILS...
SELECT MAX ( inserted_at )
  FROM t;

Could not execute statement.

Rollback occurred due to deadlock during prefetch
SQLCODE=-684, ODBC 3 State="40001"
Line 1, column 1
(Continuing after error)

--------------------------------------
-- on secondary or copy, again... WORKS...
SELECT MAX ( inserted_at )
  FROM t;

MAX(t.inserted_at)      
----------------------- 
(NULL)                  
(1 rows)
Execution time: 0.005 seconds
Breck_Carter
Participant
0 Kudos

Here's a workaround...

DELETE t; -- avoid TRUNCATE TABLE
CALL sa_reset_identity ( tbl_name = 't', new_identity = 0 ); -- resets DEFAULT AUTOINCREMENT
COMMIT; 
VolkerBarth
Contributor
0 Kudos

Also a question whether the fast or slow form of TRUNCATE TABLE are performed, I'd suspect...:)

Accepted Solutions (0)

Answers (1)

Answers (1)

ian_mchardy
Advisor
Advisor

The "Rollback occurred due to deadlock during prefetch" error is occurring when the mirror or copy node must do a rollback on a user connection since that user connection is blocking applying an operation that was applied to the primary and needs to be applied to the mirror or copy node. Re-executing the query should succeed after the TRUNCATE has been applied since the user connection is no longer blocking the TRUNCATE.

I believe DELETE gets per-row write locks as well as a shared schema lock, while TRUNCATE TABLE gets an exclusive table lock. If the isolation_level is 0, I think scanning the table would only require a shared schema lock, and thus would not block on a DELETE, but would block on a TRUNCATE TABLE.

Note that you can get "Rollback occurred due to deadlock during prefetch" error on the mirror or copy node anytime a user connection blocks applying operations already applied on the primary for a few seconds. There are cases where this can occur with a DELETE on the primary and a SELECT on the mirror or copy node. My guess is that TRUNCATE TABLE just makes this significantly more likely.

I hope this helps, Ian

Breck_Carter
Participant
0 Kudos

The user connections on the secondary and copy databases are doing nothing at all, except repeating that SELECT statement... how do you suppose it is blocking? ...the code is exactly as shown, this is not a symptom from production or anything like that.

VolkerBarth
Contributor
0 Kudos

But how do they work w.r.t. transactions, as truncate table requires an exclusive lock on table t...?

ian_mchardy
Advisor
Advisor
0 Kudos

I think this sequence could occur with:

ConnM: user connection to mirror

ConnP: user connection to primary

ConnM: starts doing: SELECT MAX ( inserted_at ) FROM t; -- this will get a shared schema lock on t

ConnP: TRUNCATE TABLE t; -- succeeds immediately

Mirror now attempts to apply "TRUNCATE TABLE t" - and blocks on shared schema lock. If it remains blocked for a few seconds, the Mirror will rollback ConnM, resulting in your error

If the "SELECT MAX ( inserted_at ) FROM t" always takes less than a second on the mirror, then off the top of my head, I can't think of the exact sequence that can cause this error.

Breck_Carter
Participant
0 Kudos

It may not be (just) TRUNCATE, here's a simple repro...

--------------------------------------
-- dbisql on primary...
SELECT @@VERSION;
-- 16.0.0.2419
BEGIN
   DROP TABLE t;
   EXCEPTION WHEN OTHERS THEN
END;
CREATE TABLE t (
   pkey         BIGINT NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
   inserted_at  TIMESTAMP NOT NULL DEFAULT CURRENT TIMESTAMP,
   padding      VARCHAR ( 1000 ) NOT NULL );
CHECKPOINT;
WAITFOR DELAY '00:00:10';
SELECT CURRENT TIMESTAMP, ' Step 1 done';

--------------------------------------
-- dbisql on secondary...
-- user waits for "Step 1" to appear on primary
WAITFOR DELAY '00:00:10';
SELECT CURRENT TIMESTAMP, ' Step 2 started';
SELECT MAX ( inserted_at ) FROM t;
-- everything looks ok

--------------------------------------
-- dbisql on primary... repeat

--------------------------------------
-- dbisql on secondary... repeat
-- result may be funky, with this in Messages tab:

Could not execute statement.

Rollback occurred due to deadlock during prefetch
SQLCODE=-684, ODBC 3 State="40001"
Line 4, column 1
(Continuing after error)
Execution time: 0.002 seconds
Breck_Carter
Participant
0 Kudos

It happens when the table is empty, and there are 10 second user waits, and... as it turns out... you don't need a TRUNCATE to see the symptom; see later comment 🙂

Breck_Carter
Participant
0 Kudos

Why does a read-only connection EVER get involved in a deadlock situation? Read-only connections might get blocked, but they don't acquire locks, so BY DEFINITION they can't be deadlocked?

It's not the read-only dbisql connection that is applying the HA stuff coming from the primary, AFAIK it is done by background recovery connections... so, maybe the read-only dbisql connection gets blocked, fine... but why doesn't it just wait?

If SQL Anywhere wants to fix a deadlock, fine, do it in the background, don't KILL MY REPORT THAT'S DUE IN SUPERIOR COURT TOMORROW MORNING! (ok, exaggerating for effect, but...)

It's starting to look like a mirror/copy database is not a reliable place to run reports... just sayin' 🙂

VolkerBarth
Contributor
0 Kudos

Read-only connections might get blocked, but they don't acquire locks

What do you mean by read-only connections? Those that just query but don't modify data? - If so, I don't think this is true. Depending on isolation level, they might acquire several kinds of locks (short-term read locks, long-term read locks, phantom locks).

Even at isolation level 0, schema locks are acquired (what would at least conflict with the exclusive lock needed for a TRUNCATE TABLE...), and those are held until the transaction ends. (Therefore I asked for the transaction boundaries in your sample.)

ian_mchardy
Advisor
Advisor

Volker is correct. Read only connections can and do acquire locks on a mirror or copy node (they must). They will always acquire at least shared schema locks when they reference any table.

Note that there doesn't need to be a deadlock to get the "Rollback occurred due to deadlock during prefetch" error. Operations that were committed to the primary need to be applied to the mirror and copy node, and if attempting to apply these operations block for a few seconds (not necessarily deadlock), you'll see the error. Unfortunately the error message isn't very accurate in this case.

While the mirror and copy node may be read-only to user connections, it is a read-write database and the operations from the primary need to be applied on a timely basis. Operations not getting applied for hours after they were committed on the primary would be very poor behavior. Requesting a higher isolation level on the mirror or copy node and getting inconsistent results would also be wrong. So locks must be respected, and to avoid applying operations blocking indefinitely, user connections can get rolled back.

Breck_Carter
Participant
0 Kudos

Of course you are correct... read-only connections acquire locks.

This is deep, kinda like SET OPTION PUBLIC.blocking = 'KILL BLOCK HOLDER' for the HA internal connection 🙂

Breck_Carter
Participant
0 Kudos

@Ian: I accidentally deleted my earlier comment praising you, so here's Tiffany Doggett to replace it :)...