cancel
Showing results for 
Search instead for 
Did you mean: 

SQL Remote: Messages with duplicate log operations

VolkerBarth
Contributor
0 Kudos
3,415

I've finally migrated a 8.0.3 consolidated database to 12.0.1.3324, and nearly all is well.

The one strange thing happens with two 12.0.1.3324 remotes that I use to test with.

They have twice failed to replicate with the above error, such as:

I. COMMIT  
W. Deleting duplicate message from "cons" (13-01334747425-01334747425-0)  
W. Deleting duplicate message from "cons" (13-01335463391-01335463391-0)  
E. Messages (13-01334747425-01334747425-0) and (13-01334747425-01334811163-0)  
   from "cons" contain duplicate log operations  
I. execute "dbo".sp\\_hook\\_dbremote\\_receive_end

Then the cons answers with a resend, and the resent messages (comprising several files) are then applied without problem. This happens with both v12 remotes.

Strange enough, I have "identical" remotes with 8.0.3 (i.e. with logically identical subscriptions) that do apply the same messages without error (and without need to resend).

AFAIK, the SQL Remote options are similar (e.g. -l is set to the same value - I'm using the DBTools API actually).

I've been using the v12 remotes for a while (since last November), and that kind of error has appeared now and then (and could obviously be resolved by a re-send in all cases). Therefore it does not seem to be bound to the cons having moved from v8 to v12.

Therefore I'd like to ask if there is a known problem with v12 and such errors...


Just to add:

Further analysis shows that this has not happened with 12.0.1.3152 and older versions, and it seems to happen when the remote gets more than one message with the same message header. In the samples, there were several "empty" messages with the following header:

I. Nachricht von "cons" (13-01334747425-01334747425-0) erhalten  
I. Nachricht von "cons" (13-01334747425-01334747425-0) erhalten

This seems to be seen as an error with 12.0.1.3298 (and 12.0.1.3324) whereas older versions do apply messages with identical headers without problem, here from the log of the 8.0.3 remote:

I. Nachricht von "cons" (1-01334747425-01334747425-0) erhalten  
I. Nachricht von "cons" (1-01334747425-01334747425-0) erhalten  
...  
I. Nachricht von "cons" (1-01334747425-01334747425-0) wird angewendet  
I. Nachricht von "cons" (1-01334747425-01334747425-0) wird angewendet

Note 1: Sending messages with identical headers is very common in our setup, as we use SQL Remote in batch mode, and therefore each remote that has not "answered" currently will soon get a "liveness" message for each SQL Remote run in the cons.

Note 2: A remote with 10.0.1.4181 seems to have the same behaviour.

Accepted Solutions (1)

Accepted Solutions (1)

regdomaratzki
Product and Topic Expert
Product and Topic Expert

The problem of dbremote reporting "Deleting duplicate message" has been fixed in the following builds :

v11.0.1.2615, v12.0.0.2691, v12.0.1.3365

We strongly suspect but cannot prove that the "duplicate log operations" is the same problem, but have not been able to reproduce that problem here.

regdomaratzki
Product and Topic Expert
Product and Topic Expert
0 Kudos

Should also have mentioned that the CR number for the fix is 672208.

VolkerBarth
Contributor
0 Kudos

And here's the link. - Thanks Reg, I'm gonna report once an EBF is available.

BTW: Is there any chance to fix this in 10.0.1, too, as the underlying change (CR 655157) has been introduced in 10.0.1.4171, too? I'm aware that 10.0.1 is in limited support now, but ...

VolkerBarth
Contributor
0 Kudos

@Reg: Are there any plans to fix this in 10.0.1, too? I really would like this code line to get fixed, too.

Otherwise, at least I would know whether I will need to change my DBTools structure settings for our 10.0.1 clients.

VolkerBarth
Contributor
0 Kudos

@Reg: Just to confirm that EBF 12.0.1.3389 really has solved this issue.

From the quietness around this answer, I'm going to assume that a fix for 10.0.1 is not planned.

VolkerBarth
Contributor
0 Kudos

Just to add: The new 12.0.1 EBF also prevents the "duplicate log operations" message whereas a similar remote running with the unfixed 10.0.1.4181 still does yield this error message.

Answers (1)

Answers (1)

VolkerBarth
Contributor
0 Kudos

Further analysis shows that this might be bound to the number of worker threads for applying messages:

It occurs that when -w is set to 0 (the default), the messages with identical header are applied correctly (and each message is applied directly after receiving). In contrast, using -w 1 (as our DBTools-API is set to) receives all messages and then starts to apply them asynchronously (which is desired and the obvious effect of using a separate worker thread for applying).

This has worked fine 12.0.0.2566 - 12.0.0.2601 and with 12.0.1.3152 (MR) - and has worked for years with v8.

However, the parallel application seems to lead to problems with 12.0.1.3298 and above, as the applying thread seems to treat messages with identical headers as wrong and then triggers the warning about duplicate messages.

10.0.1.4181 seems to have this issue, too.

When using no separate worker thread (-w 0), each message is received and applied on its own (as long as the order is correct). As the applying code does only treat one message at a time it cannot notice that there are duplicate identical message headers, and therefore does not trigger a warning. As such, using -w 0 seems a valid workaround.

So I would conclude that something has changed in the analysis of message headers in recent builds that causes this issue.

The issue should be reproducible with any SQL Remote setup fo the according builds, if a cons sends several messages to the same remote (during several batch runs) when the log offset in the cons has not changed in-between. That will send messages with identical headers (i.e. the unchanged log offset) to the remote.

VolkerBarth
Contributor
0 Kudos

@Reg: I do not expect many other members here to be able to answer my question (as I guess they would need to have the SQL Remote code available...) - this is just in case your current workload gives you a chance to dive into this - let me know:)

regdomaratzki
Product and Topic Expert
Product and Topic Expert
0 Kudos

Volker, I've been able to reproduce the warnings, and as you point out, it only occurs when the -w switch is used, and in my limited testing it's inconsistent on when the warning is displayed and when the message is applied (timestamps removed in the output below so it all fits nicely on one line).

I. Applying message from "rem1" (0-0000564123-0000564123-0) W. Deleting duplicate message from "rem1" (0-0000564123-0000564123-0) W. Deleting duplicate message from "rem1" (0-0000564123-0000564123-0) W. Deleting duplicate message from "rem1" (0-0000564123-0000564123-0) W. Deleting duplicate message from "rem1" (0-0000564123-0000564123-0) I. Applying message from "rem1" (0-0000564123-0000564123-0) I. Applying message from "rem1" (0-0000564123-0000564123-0) I. Applying message from "rem1" (0-0000564123-0000564123-0) I. Applying message from "rem1" (0-0000564123-0000564123-0)

I still haven't seen the error message from your first post, but I'll look into why the warnings are being printed and see if I can follow a code path that gets me to the error.

I strongly suspect that this behaviour is an unwanted side effect to the change that we put in versions 10.0.1.4171, 11.0.1.2547, 12.0.0.2634 and 12.0.1.3276 :

Originally, if there are any missing messages, SQL Remote would ask for a resend after it has reached its receive polls given by the -rp option. This resend logic could cause the publisher to re-scan the transaction log(s) and slows down replicating new transactions, especially on heavy load databases.

Now, when a message or messages in a multi-part message series (SQL Remote will generate multiple messages for a single transaction to form a multi-part message when the transaction is too big to fit in a single message) are missing, SQL Remote will not immediately ask for a resend, if the received messages are not followed by any messages that contain a commit or any messages that belong to another multi-part message series. This new logic will help users who need to shut down or kill SQL Remote when it is sending multi-part messages to its subscribers.

VolkerBarth
Contributor
0 Kudos

FWIW, the CR number Reg refers to is 655157.

VolkerBarth
Contributor
0 Kudos

Anything new to this topic?

Should I plan to build a workaround (by adapting DBTools to use the "-w 0" setting), or is a fix expected to come?

Former Member
0 Kudos

This problem (CR 672208) should have been fixed in the following versions, 11.0.1 (EBF 2615), 12.0.0 (EBF 2691), 12.0.1 (EBF 3365). Please ask for a newer EBF.

VolkerBarth
Contributor
0 Kudos

@Yufui: I'm aware of that - Reg has posted the CR number and more in his answer and comments below.

And there I do have asked for an additional fix in 10.0.1 - but surely I can ask again here:)