cancel
Showing results for 
Search instead for 
Did you mean: 

Autoincrement shadow tables - mirror high availability problem

2,746

I've been testing a mirror system set up in the usual manner which is working really well. The one problem I'm having is that for legacy reasons I'm using some "shadow tables" to do pseudo application-side auto inc columns. Each shadow table has no rows but the application uses the GET_IDENTITY function to get the next id in the sequence. The incrementing number is not being replicated to the mirror which is causing some problems.

The following example shows how it occurs:

The primary, secondary and arbiter are working normally.
The primary goes offline.
The secondary takes over from the primary but the Ids on the shadow tables are now out of sync.
The primary comes back up, synchronises and takes back the role of primary.
The out of sync Ids are written into the primary db so the primary is now out of sync with its (old) self.


Any help would be appreciated. I can write a procedure to correct the Ids when the Db starts up but that doesn't feel like a very good solution. I could set a trigger to insert/update or delete a row from the shadow table to make it replicate but that doesn't feel right either.

Thanks for your help.

Sam

SQL Anywhere 11 11.0.1.2837

VolkerBarth
Contributor

Does this also appear if a CHECKPOINT has happened immediately before the failover? (No, I don't suggest to "simply do a checkpoint just before your primary crashes" for obvious reasons...)

AFAIK, the current value of autoincrement columns is stored within the system catalog (in field SYSTABCOL.max_identity), and those metadata are stored permanently by a checkpoint, cf. that FAQ.

It would seem that since no data is written to the "shadow table", there's no data written to the translog, and as such, there's nothing to be applied at the secondary database. I guess a "real data entry" (say, a row with the last inserted identify value) would fix this. If you were using v12, a SEQUENCE might also help - though I'm not sure whether its current state will only be stored within a ckeckpoint, too).

0 Kudos

Checkpoints don't seem to have any effect. They don't make the data transfer over, at any rate. I can see the values in the SYSTABCOL table, they are as I'd expect them to be (i.e. wrong) Inserting rows does indeed work force the information to appear in the mirror db. We're looking at upgrading to 12 at some point soon. I could hold off on live failover until then and just have the mirror creating a live copy of the Db.

VolkerBarth
Contributor
0 Kudos

Hm, immediately after a checkpoint (and just before the next get_identity() call) I would expect the values to be correct in the SYSTABCOL table (and would expect these to be correct in the secondary, too - though with v11, you are not able to query that directly, are you?)

0 Kudos

I've left it sitting there for while and it's done a few checkpoints.

On the primary:

max_identity = 25463598 (on the column below)
get_identity('zzShadow_OrderLineId',0) =  25463604
(This is higher because I called a few get_identity(xxx, 1))


on the secondary:

max_identity = 25463598 (It's correct now as I did some updates to the table)
get_identity = 25463599
Breck_Carter
Participant

I hope you find a workaround, but... by its very nature the max_identity value is not transactional (it's not affected by commits and rollbacks, for example). It's one of the reasons the autoincrement process is so efficient: gaps are allowed after rollbacks.

Question: When a failover occurs, the phantom get_identity values that have been generated but not actually used in the database are presumably in the possession (memory) of the client application, and after the client connects to the new primary database, you want those values to be transmitted to the database... is that correct? If that doesn't happen, what is the penalty for the client application?

VolkerBarth
Contributor
0 Kudos

but the Ids on the shadow tables are now out of sync

Aside from my previous comments on the causes of this symptom: What exactly is meant with "out of sync"?

  • Are there gaps within the ID range, i.e. the primary has inserted up to ID 100, then crashes while the next inserts are not yet committed, and the mirror (now becoming the primary) does INSERTs with ID 105 and above, so ID 101-104 would be skipped?
  • Or do you run into duplicate key problems, i.e. after the failover the former secondary generates IDs that have already been inserted before (say, using ID 100 a second time), so the DML operations will fail?
0 Kudos

The values would have been used as primary keys on inserts. So if the value gets rolled back, new inserts cause a primary key violation. The problem is the disconnect between where the key value comes from and where the key value goes. That's my legacy system cross to bear I suppose. I think I'll change the procedure that fetched the values and include and update or something.

0 Kudos

I suppose the problem isn't with uncommitted inserts, it's with the fact that the IDs are being used in different tables to where they came from. I'm using the shadow tables as sort of unique Id sequences/caches rather. The other tables that do inserts properly are fine, and would be fine even if transactions were not committed before a crash. I don't mind about gaps in the IDs just that the changes to the autoincrement is not in sync across the primary server and its mirror.

Breck_Carter
Participant
0 Kudos

If you actually insert a row in the shadow table to generate a new key value, and then use that new key value in the same transaction to insert a row in some other table, then life will be good... with the important phrase "in the same transaction". In that case, the shadow table insert and the other table insert will be committed together, or rolled back together... and the log data being shipped to the HA secondary will be OK.

However, if the "generate a new key" and "use that key value in an insert" processes run in separate transactions then life may be difficult... complexity often causes problems.

The ideal solution would be for DEFAULT AUTOINCREMENT be used in the base tables... but second best might be to use DEFAULT AUTOINCREMENT in the shadow tables as it was intended to be used: new key values are generated by actual INSERT statements. I believe the get_identity() procedure was introduced to solve a simple problem (predict the next value) but was implemented with great power that is wholly unnecessary in the real world... like Oracle SEQUENCE values that have way-too-many features.

VolkerBarth
Contributor
0 Kudos

My very wild guess is that a database that has not cleanly shut down (i.e. without a final checkpoint) will have to "fix" the max_identity values in the system catalog as part of the automatic restore, simply because - as you have stated - the generated autoincrement values are only stored in transient memory between checkpoints.

As such I would think the server might have to query the maximum existing value (for GLOBAL AUTOINCREMENTS within its ID range) and might increase the max_identity value if it find higher existing values. (If there are only lower values, there's no need to fix the value, as autoincrements should not be re-used.)

However, this will fail in case the relevant table is empty. That seems to be the case here.


Note: This is all pure guesswork on my part...


And I certainly share Breck's suggestion to either use DEFAULT AUTOINCREMENT in the real table or at least insert real data in the shadow table (possibly with deleting existing data so that only one row exists at a time).

Accepted Solutions (1)

Accepted Solutions (1)

I've added this line to the function that returns the shadow table id

EXECUTE('UPDATE "' || @TableName || '" SET IdValue = ' || @ReturnValue || ' WHERE IdValue = (SELECT MAX(IdValue) FROM "' || @TableName || '")')


This forces the table to update and thus send the data to the mirror at the next checkpoint (or right away if I COMMIT it). It keeps one row in each Shadow Table continually updated with the last value returned by GET_IDENTITY. I might work on the slightly clunky query though, even though there will only ever be one row in those tables so it shouldn't slow it down too much.

It's working great even in situation where the primary and secondary are going up and down repeatedly.

Thanks to Breck and Volker for clearing up how this all works.

Answers (1)

Answers (1)

Former Member
0 Kudos

Something about how triggers don't replicate....

Write the data to a local table first, then code an event that fires frequently which copies the data to your replicated table.

Breck_Carter
Participant

In spite of the word "replication" being used in the original question (now fixed), this is NOT a question about SQL Remote or MobiLink, it is about High Availability... so the fine points of trigger action replication don't apply.