on 2019 Sep 20 2:24 PM
We have one remote site that has stopped updating the consolidated database (all other sites are working fine)
I can see in the log that it is 'receiving' but not applying. I have restarted the remote service but makes no difference. Any pointers would be appreciated. I see no other errors in the log file.
Thanks
ASA 17.0.9.4803
I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032720128655-032721802634-56) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032721802634-032721812435-0) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032721812435-032721812435-0) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032721812435-032721854892-1) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032721854892-032721882669-0) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032721882669-032721903257-0) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032721903257-032721919604-0) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032721919604-032721938307-0) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032721938307-032721938307-0) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032721938307-032721977698-1) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032721977698-032722005277-0) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032722005277-032722095116-0) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032722095116-032722095116-0) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032722095116-032722095116-1) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032722095116-032722095116-2) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032722095116-032722095116-3) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032722095116-032722095116-4) I. 2019-09-20 17:40:24. Received message from "STORE19" (6-032722095116-032722095116-5)
The consolidated may be waiting for a missing message, it needs to apply messages in the same sequence as the initial transactions. Does the dbremote log say anything at the end of the receive section about missing messages from "STORE19"? The 6- indicates the consolidated has already asked for resends from this remote 6 times, so this has happened before.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
I guess I was just a bit impatient ! - A couple of hours later the remote re-sent all the messages again and applied them successfully (with a 7- count) - This does happen from time to time but I'm not sure why ? So this particular site it has happened 6 times over the past year.
Often, a very large multi-part message can be the culprit. Imagine the consolidated needs to send 2000 messages because of a large transaction in the database (I can see in your output that you had a 56 part message, which is not too large). When the remote applies messages, it needs to read all 2000 messages, or it will say there is a missing message, because it needs to find all the messages for the multi-part message to apply it. Now imagine it takes 55 seconds for the consolidated to generate the messages, but dbremote runs every minute on the remote database in send then close mode. dbremote would need to start at a time when all 2000 messages were in the message system, so in this case, it would fail to find all the messages 11 times out of 12.
There are switches on the dbremote command line to manage patience when running in continuous mode, which is why I asked for the dbremote command line and the definition of the remote/consolidated user in the database previously.
Reg
We run both the consolidated and remote in continuous mode. On the consolidated we have
-c "dbn=CONSOL_IF_2017;uid=dba;pwd=sql" c:\\sdata_log -rd 5 -rp 3 -o c:\\consol_r.log
On the remote we have
-n INSTORE_SERVER c:\\sdata\\store21.db -n STORE_DB
We don't see any sign of the 'missing messages' in the log file ?
Mark
It must be something else then. There is certainly no way to figure out the root cause of an issue with only a snipet of a sql remote log showing messages being received.
Actually we do get a 'missing message' log - but this only seems to happen after the nightly backup has taken place. Once this happens the remote sends all it's messages again and this time they are applied.
Not sure why it takes up to 10 hours to do this ? Surley it should send a 'resend' straight away ? We do have many multipart message. (I would send full logs but not sure how I attach them ?)
I. 2019-09-24 21:16:08. Scanning logs starting at offset 01404832129209 I. 2019-09-24 21:16:08. Transaction log "c:/sdata_log/consol.log" starts at offset 01404918062459 I. 2019-09-24 21:16:08. Processing transaction logs from directory "c:\\sdata_log" I. 2019-09-24 21:16:08. Deleted old log file "c:\\sdata_log\\190923AA.LOG" I. 2019-09-24 21:16:08. Transaction log "c:\\sdata_log\\190924AA.LOG" starts at offset 01404118053140 I. 2019-09-24 21:16:08. Current timeline GUID: ac57040e-da21-4295-b94a-598970133536 I. 2019-09-24 21:16:08. Current timeline UTC creation time: 2017-12-07 17:27:59.795000+00:00 I. 2019-09-24 21:16:08. Current transaction log GUID: 00000000-0000-0000-0000-000000000000 I. 2019-09-24 21:16:08. Processing transactions from transaction log "c:\\sdata_log\\190924AA.LOG" I. 2019-09-24 21:16:11. Transaction log ends at offset 01404918062459 I. 2019-09-24 21:16:11. Processing transactions from active transaction log I. 2019-09-24 21:16:11. Sending message to "STORE01" (0-01404897987470-01404918066703-0) I. 2019-09-24 21:16:11. Sending message to "STORE08" (0-01404898994264-01404918066703-0) I. 2019-09-24 21:16:11. Sending message to "STORE17" (0-01404915838279-01404918066703-0) I. 2019-09-24 21:16:11. Sending message to "STORE11" (0-01404905359172-01404918066703-0) I. 2019-09-24 21:16:11. Sending message to "STORE18" (0-01404915840191-01404918066703-0) I. 2019-09-24 21:16:11. Sending message to "STORE19" (0-01404915839102-01404918066703-0) I. 2019-09-24 21:16:11. Sending message to "STORE14" (0-01404915839974-01404918066703-0) I. 2019-09-24 21:16:11. Sending message to "STORE15" (0-01404915841770-01404918066703-0) I. 2019-09-24 21:16:11. Sending message to "STORE05" (0-01404898992495-01404918066703-0) I. 2019-09-24 21:16:11. Sending message to "STORE99" (0-01404832129209-01404918066703-0) I. 2019-09-24 21:16:11. Sending message to "STORE21" (0-01404915842386-01404918066703-0) I. 2019-09-24 21:16:11. Hovering at end of active log E. 2019-09-24 21:18:22. Missing message from "STORE21" (1-021625109944-021628825768-150) I. 2019-09-24 21:18:23. Sending message to "STORE21" (0-01404918066703-01404918066703-0) == Log after new messages arrive - these are subsequently applied I. 2019-09-24 21:23:50. Received message from "STORE21" (1-021716762585-021716772792-0) W. 2019-09-24 21:23:50. Not applying operations with old resend count I. 2019-09-24 21:23:50. Received message from "STORE21" (2-021625109944-021625109944-0) I. 2019-09-24 21:23:50. Received message from "STORE21" (2-021625109944-021625109944-1) I. 2019-09-24 21:23:50. Received message from "STORE21" (2-021625109944-021625109944-2)
We have had the best luck running dbRemote in batch mode. On the consolidated database which in our case only handles replication we run dbRemote in receive, then perform an incremental backup and then dbRemote in send. We are modifying this slightly in the near future to do another incremental backup after dbRemote runs in send. This is controlled with a task scheduler.
On the remotes we run in batch mode first receive then in send and do a incremental backup when complete. On the remotes this is controlled with a scheduled database event.
If you have a lot of multi-part messages you may want to increase your message size but remember you have to coordinate across remotes. We also use file based messaging.
These hooks are very powerful and your right we could use this but I think a database event is a better solution. We use booth the send and receive hooks already, receive does some data processing to alert users when certain types of data arrives, send creates a replication status report and emails it via the xp_ procedures.
The consolidated doesn't use events so the local dba's can control the processes without having to log into the database, for example when a new remote is being extracted. Our remotes don't have dba's part of the beauty of SQLA.
User | Count |
---|---|
66 | |
10 | |
10 | |
10 | |
10 | |
8 | |
6 | |
5 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.