MobiLink Sync Errors After Master DB Recovery

Former Member
2,230

I am responsible for a SQL Anywhere MobiLink system that I did not write, so excuse my ignorance.

This past weekend, a hard drive crash took down our main database, which has a set of tables synced to dozens of user phones; an iPhone app is used to gather customer equipment data on-site. We use MobiLink to do the sync. We have a database backup at the end of every weekday, so we have not lost any data from the main server, but how can I get the phones to resume syncing? Some of them have synced up, but I am getting many errors (794 among others). Is there a way to have all the distributed users sync back to the master? I am willing to lose any changes on the phones to get things working again.

This surely must be a scenario your product was designed to handle. Right?

Accepted Solutions (0)

Answers (2)

Summarizing information from the comments...

A system timeline, with consolidated database states Sn, restore XX, and lowercase remote changes uploaded:

condb   S0..S1..S2..S3...XX...S2..S2..S4
-           ^       ^             ^
remote1 ....a.......c.............d
-               ^                     ^
remote2 ........b.....................e

Remote1 will fail to synchronize its changes d, because it realizes that the consolidated database has lost its changes c. What can it do at this point? It stops to protect data integrity and external intervention is required. Options include:

  1. reset remote1's state on the server so it will synchronize despite the lost data. This will not work well if d depends on c or other post-S2 changes!
  2. obtain remote1's database and manually recover new data, or perform a sync that uploads all data. Whether upload-all is safe or not depends on how the sync scripts and condb are designed; it could overwrite new data with old.
  3. recreate the remote database and sync to obtain a fresh consistent copy of the data. If done without #2, changes c and d are lost.

Option #3 is the only sure way to achieve consistency. It's not automatic because it loses data.

Back to the timeline, remote2 will synchronize successfully, because from its point of view everything is consistent. It did not sync within the window of loss.
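The timeline can be replayed as a toy model. This is only a sketch: the per-remote upload counters below are a stand-in for whatever progress information the consolidated and the remotes really track, and every class and function name here is made up for illustration. It is not MobiLink's actual implementation.

```python
from dataclasses import dataclass, field


class SyncRefused(Exception):
    """The remote has confirmed uploads that the consolidated no longer knows about."""


@dataclass
class Consolidated:
    # applied[name] = number of uploads from that remote the consolidated has applied
    # (hypothetical stand-in for real sync progress tracking)
    applied: dict = field(default_factory=dict)

    def backup(self) -> dict:
        return dict(self.applied)

    def restore(self, snapshot: dict) -> None:
        self.applied = dict(snapshot)


@dataclass
class Remote:
    name: str
    confirmed: int = 0  # uploads this remote believes the consolidated has applied

    def sync(self, condb: Consolidated) -> None:
        known = condb.applied.get(self.name, 0)
        if known < self.confirmed:
            # The consolidated is behind what this remote already had confirmed.
            raise SyncRefused(
                f"{self.name}: consolidated lost {self.confirmed - known} upload(s)"
            )
        condb.applied[self.name] = known + 1
        self.confirmed = known + 1


def run_timeline() -> dict:
    condb = Consolidated()
    remote1, remote2 = Remote("remote1"), Remote("remote2")
    remote1.sync(condb)        # a -> S1
    remote2.sync(condb)        # b -> S2
    snapshot = condb.backup()  # backup taken at S2
    remote1.sync(condb)        # c -> S3
    condb.restore(snapshot)    # XX: back to S2, change c is gone
    outcome = {}
    for remote in (remote1, remote2):  # remote1 attempts d, remote2 attempts e
        try:
            remote.sync(condb)
            outcome[remote.name] = "ok"
        except SyncRefused:
            outcome[remote.name] = "refused"
    return outcome


print(run_timeline())
```

In this model remote1 is refused because its own record is ahead of the restored consolidated, while remote2 goes through because both sides still agree.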

Former Member
0 Kudos

So the MobiLink system has no way to recover and reset the sync? That would be an insane design and a worthless product.

Our database system is being hit all the time from different apps and sources. Returning to a state that it was in several days ago at the exact moment of a disk crash is not possible.

chris_keating
Product and Topic Expert
0 Kudos

Without knowing the specifics of what the MobiLink server is reporting, I am reluctant to advise next steps. Certainly, you can reset the remote if that is the appropriate action. See the ml_reset_sync_state system procedure. Please read and understand what this does before using it.

The point I was making with respect to a recovery is that if there have been synchronizations since the point in time of the database that has been put into the system, those remotes will no longer be able to sync without some intervention. This is not ideal. I take it that the database that was put back into production was not restored to the point in time of the hard drive failure. If so, there will likely be remotes that cannot sync if they have synced at a time after that restore point; that is, you have lost the sync status that synchronization uses to ensure data consistency between the remote and consolidated. You can reset the sync state to force that remote to synchronize.
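As a rough triage aid, the remotes can be split by whether their last successful sync falls after the restore point. This is a minimal sketch under assumptions: the remote IDs and timestamps are hypothetical, and the last-sync times would have to come from a source that survived the crash (for example server logs or the devices themselves), since the restored consolidated cannot know about syncs made after its backup.

```python
from datetime import datetime


def split_by_restore_point(last_sync: dict, restore_point: datetime):
    """Return (affected, unaffected) remote IDs, each sorted by name.

    A remote counts as affected if it synced after the restore point,
    i.e. inside the window whose sync state was lost.
    """
    affected = sorted(r for r, t in last_sync.items() if t > restore_point)
    unaffected = sorted(r for r, t in last_sync.items() if t <= restore_point)
    return affected, unaffected


# Hypothetical data: backup taken at the end of a weekday, three phones.
restore_point = datetime(2024, 1, 5, 18, 0)
last_sync = {
    "phone-01": datetime(2024, 1, 6, 9, 30),   # synced after the backup
    "phone-02": datetime(2024, 1, 4, 16, 45),  # synced before the backup
    "phone-03": datetime(2024, 1, 5, 18, 0),   # synced exactly at the backup
}

affected, unaffected = split_by_restore_point(last_sync, restore_point)
print("likely need intervention:", affected)
print("should sync normally:", unaffected)
```

Only the remotes in the first list would be candidates for a reset or a rebuild; the others should be left alone.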

chris_keating
Product and Topic Expert
0 Kudos

Your log suggests that the reset procedure in theory should work with the affected users.

chris_keating
Product and Topic Expert
0 Kudos

Please hold off on the use of ml_reset_sync_state. We are looking at options on the remote.

chris_keating
Product and Topic Expert

Please note that this is really a salvage because data has been lost at the consolidated - the data might simply be related to the current sync state of affected remotes or could extend to data that should be in the consolidated and may no longer exist on the remote. Given the flexibility of the scripts that are used for synchronization, MobiLink would not be able to determine what rows should look like - that is left to the consolidated and its recovery gear. This is not a defect in design. It is reasonable that MobiLink expects that the consolidated database is capable of recovery with no loss of data. That did not happen in this case.

You now need to decide: 1) Is there data on the remote that may be important to the application, such that efforts should be made to get that data into the consolidated? If so, this may be a manual effort for the affected remotes. -or- 2) Are you willing to lose that data? If so, options include resetting the remote status or recreating those remotes by syncing the database from an empty state. In that case, keep the existing remote, as you may be able to manually re-enter the information that exists only in the remote.

You may want to work with technical support to go through the details if you are not familiar with MobiLink.

The design is not insane and the product is not worthless.
Operating it without an administrator who has the required skills and understanding .is. insane. This is probably not your fault, but it is a fault nonetheless. Even if you get out of this with somebody's help, you are still left with a system that, in the best case, works without your knowing why.
It is mandatory and critical that a node in a system of communicating databases is restored to a point where no communication with another database is missing. This is particularly critical for any consolidated database, which communicates with all other nodes. All supported MobiLink consolidated platforms have mechanisms to achieve this, depending on the failure scenarios you need to cover and the infrastructure required to implement the countermeasures. Determining this requires solid knowledge (not at rocket-science level) of the way MobiLink works and of the administration of the consolidated database.
I wish you good luck that somebody here or from TechSupport can help you out of this. If anybody can, Chris is a hot candidate. Whatever the outcome of this crisis, I highly recommend that you go to your manager and insist that you, and probably somebody else, get trained on the subject. If [s]he compares the cost of such training with the risk of not having somebody available with the skills to handle a crisis, [s]he will be responsible for the decision.

Just my €.02...

Volker

VolkerBarth
Contributor
0 Kudos

It is mandatory and critical that a node in a system of communicating databases is restored to a point where no communication with another database is missing.

Just to add: Understanding such a system also includes the knowledge that in the (hopefully rare) case where the mentioned requirement cannot be fulfilled (i.e. the consolidated database cannot be restored up to the point of the last sync), any of the remote databases that have synced with the consolidated after that point may have lost data and/or may need to be resynchronized. That's what Chris and Tim have explained.