cancel
Showing results for 
Search instead for 
Did you mean: 

High Avalabilty log files mismatch

Former Member
3,896

SAP SQL Anywhere 16.0.0.2043 DB Auditing is OFF mode=synchronous

I am using High Availability as follows:- partner db on local, partner db and arbitor on remote.

local started first runs as primary remote mirrors

I have all three servers talking and all is well for a while (A few hours at most). Within a few hours the local partner (primary) disconnects and the error message "logs do not match" is logged on restart (see below).

Copying the .db and .log from remote to local means a restart will work (until the next mismatch).

Any ideas

TIA

Ephraim

Below is the log from the server

I. 06/26 10:13:43.     TCPIP link started successfully
I. 06/26 10:13:43. Now accepting requests
I. 06/26 10:13:43.     Transaction log: ks.log...
I. 06/26 10:13:43.     Database "ks" transaction log starts at offset 177760696
I. 06/26 10:13:43. Database "ks" mirroring: local status for this server: role=mirror, state=synchronizing, sequence=45, yielding=N
I. 06/26 10:13:43. Database "ks" mirroring:  determining mirror role ...
I. 06/26 10:13:44. Database "ks" mirroring:  connected to partner "mirror_ks2"
I. 06/26 10:13:44. Database "ks" mirroring: partner status for this server: role=mirror, state=synchronizing, sequence=46, yielding=N
I. 06/26 10:13:44. Database "ks" mirroring:  connected to arbiter "ks_arbiter"
I. 06/26 10:13:44. Database "ks" mirroring: arbiter status for this server: role=mirror, state=synchronizing, sequence=46, yielding=N
I. 06/26 10:13:44. Database "ks" mirroring:  synchronizing ...
I. 06/26 10:13:44. Database "ks" transaction log current offset is 181865872
I. 06/26 10:13:44. Database "ks" (ks.db) started as mirror at Fri Jun 26 2015 10:13
I. 06/26 10:13:44. Database "ks" mirroring:  failed getting log pages
E. 06/26 10:13:44. Database "ks" mirroring:  database is not compatible with primary; files must be replaced
E. 06/26 10:13:45. Transaction log does not match log on mirror partner
E. 06/26 10:13:45. Transaction log does not match log on mirror partner
E. 06/26 10:13:45. Transaction log does not match log on mirror partner
I. 06/26 10:13:45. Database "ks" (ks.db) stopped at Fri Jun 26 2015 10:13
I. 06/26 10:13:45. Database "ks" mirroring:  disconnected from partner "mirror_ks2"
I. 06/26 10:13:45. Database "ks" mirroring:  disconnected from arbiter "ks_arbiter"

Breck_Carter
Participant
0 Kudos

Excerpts from the dbsrv16 -o files for all three servers, for a period of time before and after the restart, may help.

Viewing the three excerpts side-by-side, lined up by timestamp, can sometimes help. I have some code that can turn three excerpts into an Excel spreadsheet for this purpose, if you send the three log exerpts to breck dot carter at gmail dot com.

Former Member
0 Kudos

Unfortunately, diagnosing log mismatch issues can be difficult. It would be interesting and helpful to know why the primary is restarting so frequently.

(For simplicity, let's refer to the local server, which was initially primary, as S1, and the initial mirror as S2)

To begin with, please confirm that the CREATE MIRROR SERVER statements for each of the partners, and the arbiter, have been created so that each server has a distinct state information file. (S2 and the arbiter cannot share a file).

The server log for S1 at the time of the first restart might give us a clue about why it restarted. Additionally, I'd like to see the S2 server log that shows what happens when S1 disconnects - it should hopefully show that it was synchronized and takes over as primary (including any previous logging that shows its synchronizing/synchronized state).

It might also be useful to see the mirroring debug log. This log can be setup for each server as a part of the mirror definition as follows: "CREATE MIRROR SERVER <mirror-server-name> AS PARTNER ... logfile = 'name.mlog' ". (More details for setting this option are in the documentation: http://dcx.sap.com/index.html#sa160/en/dbreference/create-mirror-server-statement.html*d5e43395 )

Former Member
0 Kudos

There is no mention of a state file for the arbiter in http://dcx.sybase.com/index.html#sa160/en/dbadmin/database-mirroring-tutorial-setup.html

Execute following SQL statement to define the arbiter server for the database mirroring system:

CREATE MIRROR SERVER demo_arbiter AS ARBITER connection_string ='SERVER=demo_arbiter;HOST=localhost:6870';

Set the authentication string for the database. Execute the following statement:

SET MIRROR OPTION authentication_string='abc';

Former Member
0 Kudos

Yes, you are correct. The arbiter state file is specified on the command line using the -xf flag.

Former Member
0 Kudos

Arbiter uses unique file

Other parameters (IP Addresses/ports ficticious) SELECT * FROM "SYS"."SYSMIRRORSERVEROPTION

connection_string,SERVER=mirror_ks_primary;HOST=1.23.49.98:11,1.88.255.27:12 connection_string,SERVER=mirror_ks_mirror;HOST=1.23.249.98:11,1.88.255.27:12 connection_string,SERVER=mirror_ks1;HOST=1.23.49.98:11 state_file,c:\\server1\\ks1.state connection_string,SERVER=ks_arbiter;HOST=1.88.255.27:10 connection_string,SERVER=mirror_ks2;HOST=1.88.255.27:11 state_file,c:\\server2\\ks2.state logfile,c:\\server1\\ks1.log logfile,c:\\server2\\ks2.log preferred,NO preferred,YES

Former Member
0 Kudos

There are inconsistencies in this set of parameters - but I'm guessing that they're typos in the output since you've changed the IP Addresses and ports. (For example, the HOST strings for mirror_ks_primary and mirror_ks_mirror must match, and the HOST string for mirror_ks_primary/mirror does not have the correct port according to the mirror_ks2 definition.)

This does tell us that you've set the "preferred" option for one of the servers (I can't tell with this info, since the server_id column was not included - I'm guessing it's mirror_ks1?) - this should mean that after S1 (mirror_ks1) restarts, the preferred server (S1) should eventually become primary again (which would cause the current primary S2 to restart and become mirror). The console logs should be able to tell you if this is happening.

Accepted Solutions (1)

Accepted Solutions (1)

Former Member

The mismatch is being caused by this checkpoint:

I. 06/29 08:59:11. Starting checkpoint of "ks" (ks.db) at Mon Jun 29 2015 08:59

When you suspend the server like this, and then wake it up, it takes a moment for the connections to notice they've been severed. However, the queued checkpoint kicks in immediately. Thus the server is doing a checkpoint before it realizes it has lost quorum, and so it has a checkpoint that the mirror doesn't. This is an unrecoverable log mismatch.

You can work around this problem by shutting down the server before suspending the machine, using a "ALTER DATABASE SET PARTNER FAILOVER" to force a failover, or moving the primary to a machine that isn't suspended regularly.

Unfortunately, the setup you're using isn't recommended for high availability. You might want to consider using a replication or synchronization solution, such as Mobilink.

VolkerBarth
Contributor
0 Kudos

Thus the server is doing a checkpoint before it realizes it has lost quorum, and so it has a checkpoint that the mirror doesn't.

Is that a behaviour that is considered to be changed in the future?

Former Member
0 Kudos

Thank you. I knew there was some logic. I did think of Mobilink but with over two hundred tables. I also wanted stored procedures updated as well. I wanted full database replication.

VolkerBarth
Contributor
0 Kudos

If both databases should share the same schema, SQL Remote might be easier to set up for replication than MobiLink (though that does not replicate DDL, either - however, it's not that hard to use SQL Passthrough mode to change DDL at the remote site, too).

MarkCulp
Participant

I am uncertain if this will be changed in the future. It is odd that an HA system would contain a server which suspends (or hibernates). The HA system is meant for "High Availability" after all and IMHO suspending one of the computers goes directly against the idea of maintaining the system up 24x7.

We will add to the backlog a task to investigate a solution. Until such time that a solution is implemented (and if one is ever implemented), a work-around (manual fail-over before suspending) has been provided, and we can add a note to the documentation.

VolkerBarth
Contributor
0 Kudos

It is odd that an HA system would contain a server which suspends (or hibernates).

Yes, I certainly agree on that point and your further arguments. It's just that I was not sure whether that "checkpoint problem" could also happen when a server that acts as primary goes down unexpectedly (say, the server is shutdown accidentally or the like) - then it would be bad if a log mismatch would happen.

Former Member

This checkpoint problem cannot happen if the server goes down due to machine shutdown, process kill, or server shutdown. It is unique to the process being suspended then resumed. Or at least it should be :).

Answers (1)

Answers (1)

Former Member
0 Kudos

If the local machine Windows 7 "goes to sleep" as well as the Router/Network - could this cause a log file mismatch?

When I stopped working at 10:20 approx. last night I was still connected to the local primary. I see a switchover last night after working the whole day on the system without an issue. Logging in this morning at 08:59 I am connected to the remote and the local server cannot start requests a new copy of the files Surely a High Availability system must be able to recover from such an outage!!

The local log (annotated) is attached below

S1 log (local)

S1 log (local)

I. 06/28 22:13:30. 117
I. 06/28 22:13:32. BPT
I. 06/28 22:13:32. 648.0000
I. 06/28 22:13:44. BPT
I. 06/28 22:13:44. 648.0000
--------------------------------->> end of days work local primary
Local/Router power down over night so a switchover make sense
---------------------------------->>wake up thr machine
I. 06/29 08:59:11. Starting checkpoint of "ks" (ks.db) at Mon Jun 29 2015 08:59
I. 06/29 08:59:16. Database "ks" mirroring:  disconnected from arbiter "ks_arbiter"
I. 06/29 08:59:32. Database "ks" mirroring:  disconnected from partner "mirror_ks2"
E. 06/29 08:59:32. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
----------------------------------------->.event handler tries to notify me of switchover
I. 06/29 08:59:32. dbo.xp_startsmtp() failed: (return code = 102)
I. 06/29 08:59:32. dbo.xp_startsmtp() failed: (return code = 102)
I. 06/29 08:59:32. dbo.xp_startsmtp() failed: (return code = 102)
E. 06/29 08:59:32. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
I. 06/29 08:59:32. Database "ks" mirroring:  restarting to determine mirror role
I. 06/29 08:59:33. Finished checkpoint of "ks" (ks.db) at Mon Jun 29 2015 08:59
I. 06/29 08:59:33. Database "ks" (ks.db) stopped at Mon Jun 29 2015 08:59
I. 06/29 08:59:33. Starting database "ks" (c:\\server1\\ks.db) at Mon Jun 29 2015 08:59
I. 06/29 08:59:33. Performance warning: Database file "c:\\server1\\ks.db" consists of 3 disk fragments
I. 06/29 08:59:33. Database recovery in progress
I. 06/29 08:59:33.     Last checkpoint at Mon Jun 29 2015 08:59
I. 06/29 08:59:33.     Database "ks" transaction log offset after last checkpoint is 182175412
I. 06/29 08:59:33.     Checkpoint log...
I. 06/29 08:59:34.     Transaction log: ks.log...
I. 06/29 08:59:34.     Database "ks" transaction log starts at offset 177760696
I. 06/29 08:59:34. Database "ks" mirroring: local status for this server: role=primary, state=synchronized, sequence=6, yielding=N
I. 06/29 08:59:34. Database "ks" mirroring:  determining mirror role ...
E. 06/29 08:59:34. Database "ks" mirroring:  mirror connection to server "mirror_ks2" failed
E. 06/29 08:59:34. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
E. 06/29 08:59:34. Database "ks" mirroring:  mirror connection to server "mirror_ks2" failed
E. 06/29 08:59:34. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
E. 06/29 08:59:34. Database "ks" mirroring:  mirror connection to server "mirror_ks2" failed
E. 06/29 08:59:34. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
E. 06/29 08:59:35. Database "ks" mirroring:  mirror connection to server "mirror_ks2" failed
E. 06/29 08:59:35. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
E. 06/29 08:59:35. Database "ks" mirroring:  mirror connection to server "mirror_ks2" failed
E. 06/29 08:59:35. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
E. 06/29 08:59:35. Database "ks" mirroring:  mirror connection to server "mirror_ks2" failed
E. 06/29 08:59:35. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
E. 06/29 08:59:36. Database "ks" mirroring:  mirror connection to server "mirror_ks2" failed
E. 06/29 08:59:36. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
E. 06/29 08:59:36. Database "ks" mirroring:  mirror connection to server "mirror_ks2" failed
E. 06/29 08:59:36. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
E. 06/29 08:59:36. Database "ks" mirroring:  mirror connection to server "mirror_ks2" failed
E. 06/29 08:59:36. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
E. 06/29 08:59:37. Database "ks" mirroring:  mirror connection to server "mirror_ks2" failed
E. 06/29 08:59:37. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
E. 06/29 08:59:37. Database "ks" mirroring:  mirror connection to server "mirror_ks2" failed
E. 06/29 08:59:37. Database "ks" mirroring:  mirror connection to server "ks_arbiter" failed
I. 06/29 08:59:38. Database "ks" mirroring:  connected to partner "mirror_ks2"
I. 06/29 08:59:38. Database "ks" mirroring: partner status for this server: role=mirror, state=synchronizing, sequence=7, yielding=N
I. 06/29 08:59:38. Database "ks" mirroring:  connected to arbiter "ks_arbiter"
I. 06/29 08:59:38. Database "ks" mirroring: arbiter status for this server: role=mirror, state=synchronizing, sequence=7, yielding=N
I. 06/29 08:59:38. Database "ks" mirroring:  synchronizing ...
I. 06/29 08:59:38. Database "ks" transaction log current offset is 182175412
I. 06/29 08:59:38. Database "ks" (ks.db) started as mirror at Mon Jun 29 2015 08:59    
I. 06/29 08:59:38. Database "ks" mirroring:  failed getting log pages
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
E. 06/29 08:59:38. Database "ks" mirroring:  database is not compatible with primary; files must be replaced
E. 06/29 08:59:39. Transaction log does not match log on mirror partner
E. 06/29 08:59:39. Transaction log does not match log on mirror partner
E. 06/29 08:59:39. Transaction log does not match log on mirror partner
I. 06/29 08:59:39. Database "ks" (ks.db) stopped at Mon Jun 29 2015 08:59
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I. 06/29 08:59:39. Database "ks" mirroring:  disconnected from partner "mirror_ks2"
I. 06/29 08:59:39. Database "ks" mirroring:  disconnected from arbiter "ks_arbiter"
I. 06/29 08:59:39. Database server shutdown due to incompatible files for database mirroring
W. 06/29 08:59:42. TCPIP listener on IP address (::):7771 is exiting
W. 06/29 08:59:42. TCPIP listener on IP address 0.0.0.0:7771 is exiting
I. 06/29 08:59:42. Database server stopped at Mon Jun 29 2015 08:59
MCMartin
Participant
0 Kudos

Are you using synch or asynch mode for the HA system?

Former Member
0 Kudos

The default synchtonous.

VolkerBarth
Contributor
0 Kudos

Just to understand:

If the local server should act as preferred primary server but will go down regularly (during a planned power-down at night), then wouldn't it be appropriate to force a FAILOVER (via "ALTER DATABASE SET PARTNER FAILOVER") and then to stop the local database server (via the dbstop utility) just before the power-down? (Note: I don't claim that this should be necessary for a successful switchover, it would just be clearer IMHO if the power-down is a planned one...).

Does the console log of the local server show that the server has been stopped deterministically (i.e. had a clean shutdown)?

As others have already asked: What do the console logs of the partner and arbiter server show around the time the local server goes down (they should state that a switchover has taken place)? And what do they show around the time the local server tries to re-connect?

Former Member
0 Kudos

I process during the day (to the local primary) and use the remote (mirror) for web access, so the web db gets updated automatically (it behaves as a backup db as well). The web server is on a vps (lower performance machine and is not yet live) and making it the primary has performance implications (my next issue once this one is resolved is a degradation in performance one). I was not aware of the FAILOVER command which I will try tonight. Console logs for Arbiter/Remote indicate a normal switchover in the evening and an attempted reconnect by the local in the morning (which fails due to log mismatch).

Breck_Carter
Participant

> it behaves as a backup db as well

A High Availability secondary server should not be relied upon to act as a backup. This is discussed in several Help topics; e.g. Regular back ups are still required.

Breck_Carter
Participant
0 Kudos

What do the console logs of the partner and arbiter server show around the time the local server goes down

Yes, please show us all three logs.