on 2016 Aug 24 6:47 AM
Hey everyone,
We're currently struggling with some issues in our productive product(s) and a bit of help would be much appreciated.
The symptom
Seems quite simple at first : the synchronization starts successfully, the upload occurs, the download is prepared, it downloads and put rows into the transaction log and finally we're getting a
caught UljException : remoteID=37afd978-648e-11e6-8000-b1e0a102c04c, errorCode=-1305, params=86,404,0, sqlOffset=-1, message=UltraLiteJ Error[-1305]: MobiLink communication error – code: 86, parameter: 404, system code: 0com.ianywhere.ultralitejni16.implementation.JniException: UltraLiteJ Error[-1305]: MobiLink communication error – code: 86, parameter: 404, system code: 0
on one table (must say, this table can get pretty big after a few days).
The error occurs on several devices that until then successfully synchronized.
The context
We're using :
The product runs big amount of data and in our case, the device has come to a point where it needs to synchronize 100K+ rows on the table on which the error is generated.
We have a super strong Wifi connection. Thus the problem is not coming from a client losing connection.
The logs state that between the moment we start syncing a table and the moment the error pops is quite random. I've seen values as ~60 minutes, ~35 minutes and... 4 minutes.
And the question
I thank you in advance for your support,
Best regards.
Edit (2016-08-26): here's an extract of the logs
Mobilink server logs
------------------------------------
I. 2016-08-23 11:07:54. <4912875> Request from "UL 16.0.2193" for: remote ID: 37afd978-648e-11e6-8000-b1e0a102c04c, user name: damdoumdada, version: R12
I. 2016-08-23 11:07:56. <4912875> The current synchronization is using a connection with connection ID 'Unknown'
I. 2016-08-23 11:07:56. <4912875> User 'damdoumdada' has been successfully authenticated against LDAP server 'dimdoumdimdim'
I. 2016-08-23 11:07:56. <4912879> COMMIT Transaction: Begin synchronization
W. 2016-08-23 11:07:56. <4912875> [10017] The consolidated and remote databases have different timestamp precisions. Consolidated database timestamps are precise to 6 digit(s) in the fractional second while the remote database timestamps are precise to 2 digit(s)
I. 2016-08-23 11:07:56. <4912875> COMMIT Transaction: Begin synchronization
I. 2016-08-23 11:07:56. <4912875> COMMIT Transaction: Upload
I. 2016-08-23 11:07:56. <4912875> COMMIT Transaction: Prepare for download
W. 2016-08-23 11:08:25. <4912875> [10116] The earliest transaction has been running for '1792' seconds
I. 2016-08-23 11:08:25. <4912875> Sending the download to the remote database
I. 2016-08-23 11:08:27. <4912875> COMMIT Transaction: Download
I. 2016-08-23 11:08:27. <4912875> COMMIT Transaction: End synchronization
E. 2016-08-23 11:10:48. <4912875> [-10279] Connection was dropped due to lack of network activity
I. 2016-08-23 11:10:48. <4912875> Synchronization complete
I. 2016-08-23 11:10:48. <4912875> PHASE: sync_request: 13
I. 2016-08-23 11:10:48. <4912875> PHASE: receive_upload: 2068
I. 2016-08-23 11:10:48. <4912875> PHASE: get_db_worker: 0
I. 2016-08-23 11:10:48. <4912875> PHASE: connect: 5
I. 2016-08-23 11:10:48. <4912875> PHASE: authenticate_user: 228
I. 2016-08-23 11:10:48. <4912875> PHASE: begin_sync: 27
I. 2016-08-23 11:10:48. <4912875> PHASE: apply_upload: 44
I. 2016-08-23 11:10:48. <4912875> PHASE: prepare_for_download: 24
I. 2016-08-23 11:10:48. <4912875> PHASE: fetch_download: 30615
I. 2016-08-23 11:10:48. <4912875> PHASE: end_sync: 14
I. 2016-08-23 11:10:48. <4912875> PHASE: send_download: 141472
I. 2016-08-23 11:10:48. <4912875> PHASE: wait_for_download_ack: 0
I. 2016-08-23 11:10:48. <4912875> PHASE: get_db_worker_for_download_ack: 0
I. 2016-08-23 11:10:48. <4912875> PHASE: connect_for_download_ack: 0
I. 2016-08-23 11:10:48. <4912875> PHASE: nonblocking_download_ack: 0
Near these logs, we can find :
E. 2016-08-23 11:08:46. <4913034> [-10117] Stream Error: Unable to continue unknown HTTP session
I. 2016-08-23 11:08:46. <Main> Synchronization connection with connection ID 'Unknown' has been disconnected
I. 2016-08-23 11:08:46. <Main> COMMIT Transaction: End connection
Request clarification before answering.
We have an ml-session-id header on every HTTP request that the server uses to associate the different HTTP requests of a sync together. A 404 probably means that the server received an HTTP request with an unknown session id. There're two likely causes of this:
If you're going to be doing large downloads, you also might want to look into the restartable downloads feature. With that, the server will keep failed download around and they can be resumed even after a disconnect. The download information isn't shared across servers in a farm, so if you turn on resumable downloads with more than one server, you'll have to make sure that all requests from a remote go to the same server.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
If you are seeing this error message then the MobiLink connection to Oracle may be blocked by a lock held by some other connection, perhaps another MobiLink connection, perhaps not.
(I think the word "clocking" should be "blocking" in the "Probable cause" below)
Connection ID '%1' is currently blocked by connection ID '%2' for %3 seconds on %4
Error code 10117
Error constant CONN_BLOCKED_ON
Parameter 1 A MobiLink server database connection ID.
Parameter 2 Another database connection ID.
Parameter 3 Time in seconds.
Parameter 4 Table name or database operation that is currently blocked.
Probable cause
The MobiLink server will detect any its database operations that take longer than a given time and reports the connection ID that is currently clocking the MobiLink server connections. To avoid this problem, try to reduce the transaction open time.
FWIW the MobiLink server log is the very first place to look for diagnostic information about MobiLink errors, because that's where you will find what you're looking for 99.99% of the time.
The other 0.01% of the time, helpful information will be found elsewhere; e.g., on the client side, Oracle database server, etcetera.
Of course, I am exaggerating. The real number is only 99.9% 🙂
Well, IMHO, both are totally different messages:
Each error has a numeric error code. Error codes are always negative. Warning codes are always positive. Zero means no error or warning.
As you are seeing a -10117 error but do not get the 10117 warning, I guess a blocked connection is not necessarily an issue here...
FWIW, I don't find anything about the -12079 error mentioned by Bill in the docs. Do you notice a -10279 error ("Connection was dropped due to lack of network activity") instead?
Sorry, I just tried to clear up the error codes... - can't tell on the ML problem itself.
However, Bill – who is the expert – has raised the question whether you are using a ML server farm. If not, then in my limited understanding, point 2 from his answer would not apply here. So do you use several ML servers?
For further diagnosis, it might be helpful if you could include a ML server log snippet showing these errors and the lines before/after...
User | Count |
---|---|
33 | |
22 | |
17 | |
8 | |
5 | |
5 | |
4 | |
4 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.