
dbmlsync 12 service shuts down after connection to remote db fails

Former Member

We are seeing the following undesirable behavior under MobiLink 12.0.1.3942 with SQL Anywhere 12 as the remote (and consolidated) database:

  1. dbmlsync is running as a Sybase service, on a schedule.
  2. dbmlsync is configured to use a local DSN to connect to the remote database. The DSN is configured to connect using TCP/IP (i.e. not shared memory).
  3. A network error occurs such that dbmlsync is unable to connect to the remote database.
  4. [Here's the bad part:] The dbmlsync service stops, and the Windows application event log shows this as a normal shutdown of the service, not a crash.

Here is the relevant extract from the dbmlsync log:

I. 2014-03-26 04:26:09. Next sync scheduled on Mar 26 04:27:00am
I. 2014-03-26 04:26:09. Log scan starting at offset 0126193676807
I. 2014-03-26 04:26:09. Hovering at end of active log
E. 2014-03-26 04:27:45. SQL statement failed: (-832) Connection error: Timeout occurred while waiting for connection response
E. 2014-03-26 04:27:53. SQL statement failed: (-143) Column 'table_name' not found
E. 2014-03-26 04:27:53. Error while executing hook procedure sp_hook_dbmlsync_sql_error.
I. 2014-03-26 04:27:53. ROLLBACK
E. 2014-03-26 04:27:53. Unable to connect to remote database.

Note: logging stops at that point; timing coincides with the event log for the "normal" shutdown of the service.
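When a cycle ends this abruptly, it can help to pull only the error entries and the final entry out of a long diagnostic log. Below is a minimal sketch. It assumes every entry has the `<severity>. <YYYY-MM-DD hh:mm:ss>. <message>` shape seen in the extract above. The real log may contain other prefixes or wrapped lines, which this sketch simply skips. `scan_log` is a hypothetical name, not part of any SQL Anywhere tool.

```python
import re
from datetime import datetime

# Assumed entry shape, taken from the extract above:
#   "E. 2014-03-26 04:27:53. Unable to connect to remote database."
ENTRY = re.compile(r"^([A-Z])\. (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\. (.*)$")


def scan_log(lines):
    """Return ([(timestamp, message) for each 'E.' entry], last parsed entry)."""
    errors = []
    last = None
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue  # skip lines that do not match the assumed format
        severity, stamp, message = m.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        last = (severity, when, message)
        if severity == "E":
            errors.append((when, message))
    return errors, last


if __name__ == "__main__":
    sample = [
        "I. 2014-03-26 04:26:09. Hovering at end of active log",
        "E. 2014-03-26 04:27:45. SQL statement failed: (-832) Connection error: "
        "Timeout occurred while waiting for connection response",
        "I. 2014-03-26 04:27:53. ROLLBACK",
        "E. 2014-03-26 04:27:53. Unable to connect to remote database.",
    ]
    errors, last = scan_log(sample)
    for when, message in errors:
        print(when.isoformat(sep=" "), message)
    print("last entry:", last[0], last[2])
```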

Has anyone seen this before? Seems like a bug to me...

Thanks, Bob

VolkerBarth
Contributor

Are you sure the error/shutdown is not due to the fact that the hook procedure "sp_hook_dbmlsync_sql_error" seems to produce an error itself?

Do you use the sp_hook_dbmlsync_end hook procedure to trigger a restart?

Breck_Carter
Participant

Does this happen repeatedly, or is it a one-time thing?

What does the dbmlsync command line look like? Please show us more of the dbmlsync diagnostic log... maybe turn up the verbosity a bit.

Former Member

Here are more details as requested:

  • Yes we do have code in the hook procedure "sp_hook_dbmlsync_sql_error". It attempts to get values from #hook_dict, construct a string, and insert the string into a table in the remote database. So that would produce an error, but would that be expected to cause dbmlsync to shut down?

  • We have similar code in the "sp_hook_dbmlsync_end" hook procedure, but no code for restarting the service or dbmlsync.

  • This is reliably reproducible under the stated configuration.
  • Here is the dbmlsync command line: -c DSN=SM_3;uid=dba;pwd=sql; -n sm3_replication -vnrs -o c:Replication_Logsmlclient_SQLANYs_SM_CORSE_DB_SVC_SM_3.txt -os 3m -e sch=EVERY:0000:01

  • Unfortunately my logs are in French, as my testing is on a French system. Here is the verbose log for the failed cycle (translated to English):
      I. 2014-03-27 12:37:10. Connecting to remote database
      E. 2014-03-27 12:37:16. SQL statement failed: (-100) Database server not found
      I. 2014-03-27 12:37:16. Opening connection to remote database to call an error/log hook.
      E. 2014-03-27 12:37:21. Unable to open connection for error/log hooks. Error/log hooks will not be called.
      E. 2014-03-27 12:37:21. Unable to connect to remote database.
      I. 2014-03-27 12:37:21. Opening connection to remote database to call an error/log hook.
      E. 2014-03-27 12:37:26. Unable to open connection for error/log hooks. Error/log hooks will not be called.

  • Here is the entry from the Windows Application Event Log:
      System
        Provider [Name] SQLANY64 12.0
        EventID 1 [Qualifiers] 0
        Level 4
        Task 0
        Keywords 0x80000000000000
        TimeCreated [SystemTime] 2014-03-27T12:37:26.000000000Z
        EventRecordID 289769
        Channel Application
        Computer WIN-L2PPG30TFKD
      Event Data
        SQLANYz_dbmlsync_SM3_Remote_svc
        Normal shutdown of service SQLANYz_dbmlsync_SM3_Remote_svc

  • Platform is Windows Server 2008 R2

  • Testing is being done on virtual machines (Oracle VM VirtualBox)
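For reference, `-e sch=EVERY:0000:01` in the command line above is the schedule extended option. If `EVERY:hhhh:mm` is read as hours and minutes between synchronizations, this requests a synchronization every minute. That is consistent with the "Next sync scheduled" line in the first log extract. Below is a tiny parser under that assumption. `parse_every` is a hypothetical helper, and it deliberately handles only the `EVERY` form.

```python
from datetime import timedelta


def parse_every(schedule):
    """Convert an 'EVERY:hhhh:mm' schedule value into a timedelta.

    Assumes the hours:minutes reading of the EVERY form; any other
    schedule form is rejected rather than guessed at.
    """
    kind, hours, minutes = schedule.split(":")
    if kind.upper() != "EVERY":
        raise ValueError(f"unsupported schedule form: {schedule!r}")
    return timedelta(hours=int(hours), minutes=int(minutes))


print(parse_every("EVERY:0000:01"))  # 0:01:00
```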

VolkerBarth
Contributor

Are you allowed to change the language SQL Anywhere uses for its UI and logging, e.g. by using the DBLANG tool?

(That's just my own limitation; the Canadian SQL Anywhere experts may well be familiar with French...)

VolkerBarth
Contributor

"So that would produce an error, but would that be expected to cause dbmlsync to shut down?"

I think so. To quote the docs:

Ignoring errors

By default, synchronization stops when an unhandled error is encountered in an event hook procedure. You can instruct the dbmlsync utility to ignore these errors by supplying the -eh option.

Former Member

Volker, thanks for the info about the -eh option. In my case though, it doesn't change the behavior.

Accepted Solutions (0)

Answers (1)

jeff_albion
Advisor

A failure to connect to the remote database is considered a fatal error by dbmlsync, and it will stop in a normal fashion. Even if you were able to ignore the specific hook errors, dbmlsync would continue to try to connect to the database for its own operational purposes, and it would still need to shut down.

Why is dbmlsync getting disconnected from the remote database server? The network between dbmlsync and the remote database should be reliable - if it is not reliable, it shouldn't be used. The architecture model should be that dbmlsync is as close to the remote database as possible - if the network is unreliable, it may have to reside on the same machine.

The network between the MobiLink server and the remote is expected to be unreliable, and MobiLink/dbmlsync should be able to handle this situation automatically. (However, dbmlsync may still shut down prematurely when running as a service unless you have applied the fix for CR #720564, available in 12.0.1.3806 and later.)

Former Member

Jeff, actually the dbmlsync service is located on the same machine as the database server. We found the problem when we had it configured to use a DSN that connected using TCP/IP (so that we could specify both the local server and the standby/backup server host addresses in the DSN). I realize we can resolve this by using a direct connection (i.e. -c Server=xxx;DBN=xxx;) and that's the plan.

However I would still argue that the system should be more tolerant of local network errors, which in my experience are usually transitory. I understand that the synchronization process must stop when it encounters this sort of error, but why stop the service that attempts to synchronize every x minutes? It could very well succeed on the next attempt.

VolkerBarth
Contributor

While I can follow your reasoning, wouldn't it be an option to have the dbmlsync service restart itself automatically?

In case you use a Windows service (that's my understanding of your original question), AFAIK you can control what should happen if the service stops - and a (delayed) restart is one option... - cf. this TechNet article for Server 2K3

Former Member

Volker, it is my understanding that the recovery options in the Windows service manager take effect if the service crashes. Would they work in the case of a normal shutdown, which is how this appears to be registered?

VolkerBarth
Contributor

Sorry, I don't know that, simply as most services do not stop on their own - but you can certainly test the behaviour...

A different approach would be to use a task that calls a batch file with a loop: it starts dbmlsync, waits until it stops, and restarts it after a short delay. Something similar is sometimes used to run SQL Remote "endlessly".
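The batch-loop approach can be sketched in Python as a small supervisor. It starts the process, blocks until it exits, waits briefly, and starts it again. This is only a sketch under assumptions. `supervise` is a hypothetical helper, not something shipped with SQL Anywhere. How the supervisor itself gets launched (from a task, as suggested above, or otherwise) is left open.

```python
import subprocess
import sys
import time


def supervise(cmd, delay_seconds=30.0, max_runs=None):
    """Start cmd, wait for it to exit, pause, then start it again.

    max_runs=None means loop forever (the "endless" case); a finite value
    is mainly useful for testing. Returns the exit codes of completed runs.
    """
    exit_codes = []
    while True:
        result = subprocess.run(cmd)  # blocks until the child process exits
        exit_codes.append(result.returncode)
        print(f"run {len(exit_codes)} exited with code {result.returncode}",
              file=sys.stderr)
        if max_runs is not None and len(exit_codes) >= max_runs:
            return exit_codes
        time.sleep(delay_seconds)  # short pause before restarting


# Hypothetical usage (not executed here); the arguments mirror the command
# line posted earlier and would need adjusting for a real setup:
#   supervise(["dbmlsync", "-c", "DSN=SM_3;uid=dba;pwd=sql;", "-n", "sm3_replication"],
#             delay_seconds=60)
```

The loop restarts after any exit, including a normal shutdown with exit code 0, so it does not depend on whether Windows classifies the stop as a failure. Passing the command as a list also keeps the semicolons in the `-c` value away from shell parsing.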

Former Member

Yes, we do that with our live backups.

jeff_albion
Advisor

Ah, if this is on the same computer then I would definitely recommend using shared memory - it will be more efficient.
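The DSN-based `-c` value and the direct, shared-memory connection discussed in this thread differ mainly in where the connection details live. Below is a minimal sketch that assumes the standard SQL Anywhere connection parameters `DSN`, `Server`, `DBN`, `UID`, `PWD` and `CommLinks`, with `CommLinks=SharedMemory` limiting the client to shared memory. The helper name and the `my_server`/`my_db` values are hypothetical placeholders. Check the connection-parameter reference for your version before relying on them.

```python
def build_connection_string(params):
    """Join key=value connection parameters with ';' (no quoting/escaping)."""
    return ";".join(f"{key}={value}" for key, value in params.items()) + ";"


# DSN-based connection, as on the original command line; the TCP/IP
# settings (hosts etc.) live inside the DSN definition.
via_dsn = build_connection_string({"DSN": "SM_3", "UID": "dba", "PWD": "sql"})

# Direct connection to a server on the same machine over shared memory;
# "my_server" and "my_db" are placeholders.
direct_shmem = build_connection_string({
    "Server": "my_server",
    "DBN": "my_db",
    "UID": "dba",
    "PWD": "sql",
    "CommLinks": "SharedMemory",
})

print(via_dsn)       # DSN=SM_3;UID=dba;PWD=sql;
print(direct_shmem)  # Server=my_server;DBN=my_db;UID=dba;PWD=sql;CommLinks=SharedMemory;
```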

  • Testing is being done on virtual machines (Oracle VM VirtualBox)

This "localhost TCP/IP disconnection" problem sounds very similar to other issues I've heard from customers on VMs. See: "Connection terminated abnormally".

If you were interested in debugging this situation further: which address are you providing to the client when it connects locally (if any)? If you're using "localhost", does using "127.0.0.1" solve the issue?
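One possible reason `localhost` and `127.0.0.1` can behave differently is that `localhost` goes through name resolution and may map to the IPv6 loopback `::1` as well as, or before, `127.0.0.1`. A quick way to see what a machine's resolver returns is sketched below. This is a generic Python diagnostic, not a SQL Anywhere tool. Port 2638 (the SQL Anywhere default) is only a placeholder, since resolving a numeric port does not change which addresses come back.

```python
import socket


def resolve(host, port=2638):
    """Return the distinct addresses getaddrinfo yields for host, in resolver order."""
    seen = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        address = sockaddr[0]  # IP string for both IPv4 and IPv6 results
        if address not in seen:
            seen.append(address)
    return seen


if __name__ == "__main__":
    for name in ("localhost", "127.0.0.1"):
        try:
            print(name, "->", resolve(name))
        except socket.gaierror as exc:
            print(name, "-> resolution failed:", exc)
```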