on 2014 Dec 03 9:02 AM
Hi,
We have a few publications, and each MobiLink publication works as expected.
However, the initial synchronization of one of the publications takes about half an hour before the download of data starts. The second time around it goes really fast, and other publications that download even more data are quite fast as well.
We have seen something like this before when database queries were really slow on the consolidated database. Unfortunately, our DBA cannot see any problems this time.
We are going to have someone over from Oracle to check the database for problems...
Is there anything else that we should take into account when investigating something like this?
Things we have checked: we've done some network tests to see if things are slow there, but it appears not. The consolidated database (Oracle) has been monitored for slow queries and locks; none have been found. Load on the database server is very low.
Thank you,
The Oracle consultant has done some checks and could confirm what I was seeing in the MobiLink log: 3 slow download_cursors => 3 slow queries.
The query ran fast when I executed it manually because it used a different execution plan: the query in the download_cursor was written just a tiny bit differently (extra spaces or something).
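For anyone repeating this test, a quick way to check whether the statement you ran by hand is really the same text as the one in the script is to compare the two strings. The sketch below is only an illustration: the function names and the sample statements are made up, and the whitespace collapsing is naive (it would also collapse spaces inside string literals).

```python
import re


def normalize_sql(sql: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends.

    Naive on purpose: whitespace inside quoted literals is collapsed too.
    """
    return re.sub(r"\s+", " ", sql).strip()


def compare_sql(logged: str, manual: str) -> str:
    """Classify how two SQL texts differ."""
    if logged == manual:
        return "identical"
    if normalize_sql(logged) == normalize_sql(manual):
        return "whitespace-only difference"
    return "different"


# Hypothetical statements, not taken from the thread.
logged_sql = "SELECT id,  name\nFROM customer WHERE last_modified >= ?"
manual_sql = "SELECT id, name FROM customer WHERE last_modified >= ?"
print(compare_sql(logged_sql, manual_sql))
```

If the result is anything other than "identical", the manual run may not be exercising the same cached statement and plan as the script, so the timing comparison is not reliable.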
Our DBA had noticed a lot of I/O but was not sure this was a problem. The Oracle consultant confirmed that it really is a problem and that the amount of I/O was caused by Oracle not having a big enough memory buffer. Yesterday we added more RAM, and after half a day of synchronizations everything has come up to speed.
Thanks, everyone, for the feedback. Upvotes for all!
If the SQL in the three slow download_cursors runs quickly when executed directly then it suggests a concurrency problem with the scripts (since the MobiLink server has many connections that are simultaneously active). For example, if in the MobiLink Monitor you see long phases that end around the same time (though starting at different times) it indicates that the first one is blocking the others.
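The pattern described above (phases that start at different times but finish together) can also be looked for outside the Monitor if you have the phase start and end times in some exported form. This is a rough sketch under that assumption; the tuple layout, the function name and the thresholds are all hypothetical and are not part of any MobiLink tool.

```python
from datetime import datetime, timedelta


def find_blocked_groups(phases,
                        tolerance=timedelta(seconds=2),
                        min_spread=timedelta(seconds=10)):
    """Return names of phases that end together but started apart.

    phases: list of (name, start, end) tuples with datetime values.
    A group qualifies when its end times lie within `tolerance` of the
    group's earliest end and its start times differ by >= `min_spread`.
    """
    ordered = sorted(phases, key=lambda p: p[2])
    groups, current = [], []
    for phase in ordered:
        if current and phase[2] - current[0][2] > tolerance:
            groups.append(current)
            current = []
        current.append(phase)
    if current:
        groups.append(current)

    suspicious = []
    for group in groups:
        if len(group) < 2:
            continue
        starts = [p[1] for p in group]
        if max(starts) - min(starts) >= min_spread:
            suspicious.append([p[0] for p in group])
    return suspicious


# Made-up timings for illustration only.
day = datetime(2014, 12, 3)
sample = [
    ("sync1", day.replace(hour=9, minute=0), day.replace(hour=9, minute=5)),
    ("sync2", day.replace(hour=9, minute=2), day.replace(hour=9, minute=5, second=1)),
    ("sync3", day.replace(hour=9, minute=4), day.replace(hour=9, minute=5, second=1)),
    ("sync4", day.replace(hour=9, minute=10), day.replace(hour=9, minute=10, second=5)),
]
print(find_blocked_groups(sample))
```

A group reported here is only a hint that the earliest phase may have been blocking the others; it still has to be confirmed on the database side.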
FYI with v16, the ML server detects such blocking and the ML Profiler highlights it.
We've also seen syncs against Oracle slow to a crawl due to Oracle "background" processes that become bottlenecks under high load. For example, a customer found that under high sync loads Oracle DB archive logs would fill up, causing a checkpoint and swapping of archive logs, during which synchronizations could not proceed, and there appeared to be no load on the Oracle server. Other culprits might include Oracle streams, database flashback, or other administrative processes.
We've also found deadlock warnings in the MobiLink server log when Oracle DBAs had said there was no deadlock.
There are a couple of things you need to do to find out where the issue is:
Connect the MobiLink Monitor to your MobiLink instance so that you can see whether the delay is in the fetch of the download data, in sending it to the remote, or elsewhere.
Check that your timestamp columns (if you're using timestamps) are indexed properly.
Note that an initial synchronization will download all data to the remote, and the remote will then have to insert all the changes into its DB, indexing as it goes. I have a publication that takes over 10 minutes for an initial sync to be applied after download, so it's not unusual.
Note that in version 16, the tool that was previously called the MobiLink Monitor was rebranded as the MobiLink Profiler.
FYI, the ML Profiler is more than just a rename. For example, it allows finer grained timing (to the level of sync events or scripts) and sample-based profiling. It also detects blocking in the consolidated database.
This is probably caused by either having -cmax (or its alias -cm) set too low, or having a long-running download script. Look for warning 10082 in your log file. If it shows up, then bump up -cmax. If it doesn't show up, then you probably have a slow query. There isn't a good, automatic way to find which query is the problem in version 12. You can try whatever diagnostic tools Oracle provides, or you can try to infer timings through the MobiLink server log. You'll want to turn on either -vc or -vt to make the server print the SQL that it's executing.
As stated above, you can try adding -vt to your MobiLink server instance to log the actual SQL that is being used. Do that with a known client, and when you've observed the issue, copy all of the SQL out of the log file. Run each query against your consolidated database, and you should get an idea of which script is taking so long to execute.
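If there are many scripts, timing them by hand gets tedious. Below is a minimal sketch of a timing loop over any Python DB-API connection. Everything in it is an assumption for illustration: the function name, the table and the query are made up, and sqlite3 stands in for the consolidated database only so the example runs anywhere. Against Oracle you would open the connection with whatever driver you use and paste in the SQL exactly as logged, supplying values for any parameters.

```python
import sqlite3
import time


def time_queries(conn, queries):
    """Run each (label, sql, params) and return (label, seconds, rows).

    Results are sorted slowest first. Rows are fully fetched so the
    timing includes retrieving the result set, not just the execute.
    """
    results = []
    for label, sql, params in queries:
        cur = conn.cursor()
        started = time.perf_counter()
        cur.execute(sql, params)
        rows = cur.fetchall()
        elapsed = time.perf_counter() - started
        cur.close()
        results.append((label, elapsed, len(rows)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Stand-in database and a hypothetical download query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, last_modified TEXT)")
conn.executemany("INSERT INTO orders (last_modified) VALUES (?)",
                 [("2014-12-01",), ("2014-12-02",), ("2014-12-03",)])

timings = time_queries(conn, [
    ("orders download", "SELECT id, last_modified FROM orders "
                        "WHERE last_modified >= ?", ("2014-12-02",)),
])
for label, seconds, row_count in timings:
    print(f"{label}: {seconds:.4f}s, {row_count} rows")
```

Keep in mind that a query timed on its own, on an idle connection, can behave differently from the same query run by the MobiLink server under load.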
Oh, I thought I had to add -vt to the configuration of the MobiLink service... thanks for the tip.
Edit: it looks like I can't use -vt on the client. I tried -vc but am not seeing any queries on the client side (or I don't know where to look). In the server logs I do see extra information, but only which download scripts are being called and at what time. I guess I'll just have to investigate those further.
Some extra feedback: after adding -vt to the MobiLink configuration on the server, I noticed 3 slow download_cursors. I came to this conclusion because it took several minutes before the log file moved on to the next download_cursor. So I looked into the SQL behind them and executed it directly on the consolidated database, and it did not take more than half a second to get the results => 410 rows.
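Spotting those multi-minute pauses by eye works for a short log, but a small script can rank the gaps for you. The sketch below assumes only that each relevant log line contains a timestamp of the form YYYY-MM-DD HH:MM:SS somewhere in it; that format, the function name and the sample lines are assumptions for illustration (the sample lines are made up and are not real MobiLink output), so adjust the pattern to whatever your log actually shows.

```python
import re
from datetime import datetime

# Assumed timestamp shape; change this to match your own log file.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def largest_gaps(lines, top=3):
    """Return the `top` longest pauses as (seconds, line_before_pause)."""
    stamped = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            stamped.append((when, line.rstrip()))

    gaps = []
    for (t0, line0), (t1, _next_line) in zip(stamped, stamped[1:]):
        gaps.append(((t1 - t0).total_seconds(), line0))
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


# Made-up lines, not real MobiLink output.
sample_log = [
    "2014-12-03 09:00:00 script A started",
    "2014-12-03 09:00:01 script B started",
    "2014-12-03 09:07:01 script C started",
]
for seconds, line in largest_gaps(sample_log, top=1):
    print(f"{seconds:.0f}s pause after: {line}")
```

The line printed before each long pause is the last thing the server logged before it went quiet, which is usually the script worth looking at first.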