cancel
Showing results for 
Search instead for 
Did you mean: 

Several UPDATE statements got delayed and rolled back without visible reason

vlad1
Participant
614

Hi, SQL Anywhere 11.0.1.2878

This is going to sound a bit abstract - the client has noticed that, incredibly rarely, one simple process of marking 10-12 table rows with a flag does not get committed to the DB. They are accessing it via Powerbuilder application, and in a simple "browser" listings, they mark several rows with a flag Y/N and a date and save changes. Simple table of about 15 fields.

I did a complete trace of LOG file, and what we discovered is that the UPDATE statements are issued correctly, but at the end, there was a ROLLBACK rather than COMMIT WORK that we cannot find the reason for. Translated log looks like this:

--START CHECKPOINT-0000-0486009479-2023-08-28 18:56
--FINISH CHECKPOINT-0000-0486009500-2023-08-28 18:56
--BEGIN TRANSACTION-1030-0485995689
BEGIN TRANSACTION
go
--UPDATE-1030-0485996132
UPDATE DBA.settlements
   SET last_modified='2023-08-28 16:17:27.532',
       settlement_completed='Y'
 WHERE settlement_reference='91131101'
go

... and another 10 of similar UPDATE statements, but at the end we have

go
--ROLLBACK-1030-0486009518
ROLLBACK WORK

Users tried it same way the next day, worked 100% fine, identical UPDATE statements, but ended with "COMMIT WORK".

The only thing that seems strange is that looking at the SQL translation, it looks like that the current checkpoint started at 18:56, and these updates were done at 16:17:27, as users and LAST_MODIFIED field proves? If the statements were done at 16:17, so that last_modified got that stamp, how come the DB processed them at 18:56, and decided to roll back? My immediate thinking was that the table was in a lock. but:

I am 99% certain that nothing had a lock on DBA.settlements table. I have checked the translated log for 48 hours around the incident, this was the only use of DBA.settlements table for the duration of those two days.

This is one of the bog standard functionalities that has worked 100% correct over 5 or 6 years, and this problem has cropped up just twice in the past 3 months (no recent changes). However, it's quite an important functionality and I would like to find the possible reasons for this behaviour.

Thanks in advance

VolkerBarth
Contributor

FWIW, for further diagnostics, you might additionally track disconnects by using the Disconnect system event and check event_parameter "DisconnectReason" to find out about non-normal client disconnects...

Accepted Solutions (0)

Answers (2)

Answers (2)

regdomaratzki
Product and Topic Expert
Product and Topic Expert

When you translate a transaction log, by default (unless you run with -d), operations are not printed to the SQL file until the commit or rollback of the transaction on that connection is found. The fact that you have updates with a last_modified time over two hours different from a CHECKPOINT written in the SQL file tells me that this connection likely executed the updates, then sat idle for some time, then the rollback was executed, which I strongly suspect was the result of the connection being disconnected for been idle too long and the database engine doing a rollback on the open transaction of the connection that was being disconnected.

Were there actually TWO rollback work commands around offset 486009518 for connection 1030? dbtran logs a disconnect as a rollback, so if there were TWO rollbacks logged, one would be the actual rollback of the transaction when the connection disconnected and one would be the actual disconnect. You could prove this by running dbtran with the hidden -hd switch. This will output some debug text to the console that shows some different information than what you see in the SQL file. I executed some updates in DBISQL, left them uncommited, and then exited from DBISQL, and the end of the translated SQL file looks like :

--UPDATE-1006-0000018147
UPDATE dba.Admin
   SET data='update',
       last_modified='2023-08-30 08:41:53.469077'
VERIFY (data,last_modified)
VALUES (NULL,NULL)
 WHERE admin_id=2000022
go
--ROLLBACK-1006-0000018181
ROLLBACK WORK
go
--ROLLBACK-1006-0000018184
ROLLBACK WORK
go

The end of the debug output written to stdout shows that the two rollbacks logged at offsets 18181 and 18184 were actually a rollback then a disconnect.

0000018147 Update    34   6 3042
0000018181 Rollback  3    6 3042
0000018184 Release   7    6 3042

This is all very interesting, but you're asking a question on the forum for the database, but you really need to understand what the application is doing. The translated log shows that the transaction was rolled back, and some more debugging may show that the connection was disconnected because it was idle. This does not explain why your application never committed the transaction.

Reg

regdomaratzki
Product and Topic Expert
Product and Topic Expert

John answered as well with a similar response while I was typing up my answer. I'm glad we had a similar answer. 🙂

vlad1
Participant

Thanks Reg,

John replied just before you with a similar suggestion to include the -d (-hd) switch, which I did.

In any case, you are correct that there are TWO Rollbacks in my initial dbtran translated log. It goes like this:

--ROLLBACK-1030-0486009518
ROLLBACK WORK
go
--ROLLBACK-1030-0486009521
ROLLBACK WORK

This would indeed be consistent with your suggestion (and John's) that the application never called a COMMIT and then, I guess, timed out, so I will try to run a few simulations through the debugger to see if I can catch an exception when it is not called. Our call for updatable (data)windows is usually in the pfc_endtran() event, within the IF clause

if ai_update_results = 1 then
   if SQLCA.of_Commit() <> 0 then  // commit has failed
        return -1
    end if

...

else it calls SQLCA.of_Rollback()

I can't ever remember that this behaviour would somehow not be called, but I guess if both you and John were instantly thinking that Commit hasn't been issued, and it's been sort of verified by this question of yours if ROLLBACK was called twice which the trans log confirms, I guess that sort of gives me the direction to look at, and see if there if I can catch a case or two where SQLCA.Of_Commit() is not getting called.

But isn't it weird that this is very old functionality which has only produced 2 or 3 manifestations of this in all the time? Surely if the Commit is not being called somehow, this would be happening a lot more often?

johnsmirnios
Participant

I'm not sure why you say the updates were 'processed' at 18:56. You say that there was a checkpoint at that time but that's independent of whether your updates are committed or not.

To me, it looks like the application just didn't issue a COMMIT. When you see a ROLLBACK in a translated log, it could be for reasons other than an explicit ROLLBACK. For example, it could be because the client disconnected. If you use "dbtran -a -y -hd foo.log >hd.out" then the hd.out file will show the header of each entry from the log in physical/chronological order. Look to see if there is a Release (ie, a disconnect) around 0486009518 which is where you say the log is reporting a rollback.

vlad1
Participant
0 Kudos

Thanks John,

I did a trace as you suggested with the -hd option.

There is no mention of Release around 0486009518 or indeed, anywhere in the log translation. In the meantime, Reg has replied further below with a good guess - if there were actually two ROLLBACK calls in the log file (yes).

But what is weird is the following: I'm certain that the PB application issued SQLCA.of_Commit() (it's in the event), but let's just say this for the sake of this example - how come this problem would manifest itself so, so rarely. It is used probably 2-3 times a month across 4 DB environments over the past 5 years. If the application wasn't issuing COMMIT (unless there was a client disconnect), would this not happen pretty much all the time?

However, it's probably making me look in the right direction if both you and Reg are thinking that the application simply didn't call COMMIT. I will spent some time in the debugger trying to see if I can spot the Powerbuilder not calling it, though it's put there in (our) standard place in the pfc_save() hierarchy, at the end of pfc_endtran event.

johnsmirnios
Participant

You need to look in the hd.out file for the Release. hd output will look something like the following:

0001627918 Connect        11  24   0
0001627929 Commit          9  24   0
0001627938 Commit         11  24   0
0001627949 Release         7  24   0
0001627956 Connect        11  27   0
0001627967 Commit          9  27   0
0001627976 Release         7  27   0
0001627983 Connect        11  26   0

dbtran will never write a rollback for no good reason. There must be an explicit Rollback, a Release, a connect with the same ID, a server Startup, or you hit the end of the log file (or set of log files if translating multiple logs). There could be other reasons.

It is possible that your users often do something else with the app that does issue a COMMIT. If so, the connection's prior operations will be committed along with them.

regdomaratzki
Product and Topic Expert
Product and Topic Expert

However, it's probably making me look in the right direction if both you and Reg are thinking that the application simply didn't call COMMIT.

Transaction logs don't lie. No COMMIT was executed. If SQL Anywhere somehow "missed" COMMITs, we would no longer be in business.

Reg