SQL Server AlwaysOn Failover threshold and Lease Timeout

former_member211576
Contributor
0 Kudos

Hi experts, 

  Every time I build or restore another log shipping (standby) server, I see "ERR [RES] SQL Server Availability Group: [hadrag] Failure detected, diagnostics heartbeat is lost" in the cluster log and "A connection timeout has occurred on a previously established connection to availability replica 'DL980-4' with id" in the errorlog. I searched and found a document (http://download.microsoft.com/download/0/F/B/0FBFAA46-2BFD-478F-8E56-7BF3C672DF9D/Troubleshooting%20... ) which indicates that "This may be a performance issue". I run the database restore and the AlwaysOn synchronization over the same 10GbE link at the same time.

  Should I increase

LeaseTimeout from 20000 to 100000 and

HealthCheckTimeout from 30000 to 300000?

Would that prevent unnecessary failovers?
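
Before changing any timeout, it may help to check how often the two messages quoted above actually occur and whether they line up with the restore windows. The following is only a minimal sketch: it matches the message fragments from this post as plain substrings, the function name count_events is made up for illustration, and the real cluster log and errorlog line layouts (timestamps, prefixes) are not assumed.

```python
import re
from collections import Counter

# Message fragments taken from the post; everything else about the
# log line layout is deliberately left unassumed.
PATTERNS = {
    "heartbeat_lost": re.compile(r"diagnostics heartbeat is lost", re.IGNORECASE),
    "connection_timeout": re.compile(
        r"connection timeout has occurred on a previously established connection",
        re.IGNORECASE,
    ),
}


def count_events(lines):
    """Count how many lines contain each known failure message."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# The first two sample lines are the messages quoted in the post;
# the third is a made-up filler line that should not match.
sample = [
    "ERR [RES] SQL Server Availability Group: [hadrag] Failure detected, diagnostics heartbeat is lost",
    "A connection timeout has occurred on a previously established connection to availability replica 'DL980-4' with id",
    "some unrelated line",
]

print(dict(count_events(sample)))
```

To use it on real files, pass an open file object (for example the exported cluster log) instead of the sample list.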

---

Please refer to

(http://blogs.msdn.com/b/psssql/archive/2012/09/07/how-it-works-sql-server-alwayson-lease-timeout.asp... )

parag

10-18-2013 3:19 AM

Hi Denzil

We seem to see the lease expire very frequently when the server is under very high CPU pressure. Our failure condition level is 1.

Is it possible to prevent this situation? The problem is that when the lease expires, all the current connections seem to be dropped. I am wondering if there is a way to prevent this.

Also, is it possible to affinitize the AlwaysOn health check process to a particular core?

Thanks for your help!

Accepted Solutions (0)

Answers (1)

S_Sriram
Active Contributor
0 Kudos

Hi Dennis


  Should I increase

leaseTimeout from 20000 to 100000 and

HealthCheckTimeout from 30000 to 300000?

Would that prevent unnecessary failovers?

1. You can increase the timeout parameters, but failover will then be slightly delayed when a real failure occurs (it is a workaround, not a fix).

2. You may need to check the network connections (cluster heartbeat and public network).

3. Have you applied the latest patches for the OS and DB?

4. If possible, raise a ticket with Microsoft. They may provide an update for the cluster resource based on your issue.
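
For point 1, HealthCheckTimeout can be changed with the T-SQL statement ALTER AVAILABILITY GROUP ... SET (HEALTH_CHECK_TIMEOUT = ...). The sketch below only builds that statement as a string so it can be reviewed before it is run on the primary replica. The helper name health_check_timeout_sql and the group name MyAG are hypothetical, and the 15000 ms lower bound is what I believe the documentation states, so please verify it for your version. LeaseTimeout is a property of the cluster resource rather than a T-SQL option, so it is not covered here.

```python
# Assumed lower bound for HEALTH_CHECK_TIMEOUT in milliseconds;
# verify against the documentation for your SQL Server version.
MIN_HEALTH_CHECK_TIMEOUT_MS = 15000


def health_check_timeout_sql(ag_name: str, timeout_ms: int) -> str:
    """Build the T-SQL statement that sets HEALTH_CHECK_TIMEOUT for one group."""
    if not isinstance(timeout_ms, int) or timeout_ms < MIN_HEALTH_CHECK_TIMEOUT_MS:
        raise ValueError(
            f"timeout_ms must be an integer >= {MIN_HEALTH_CHECK_TIMEOUT_MS}"
        )
    # Bracket-quote the group name, doubling any closing bracket inside it.
    quoted = "[" + ag_name.replace("]", "]]") + "]"
    return (
        f"ALTER AVAILABILITY GROUP {quoted} "
        f"SET (HEALTH_CHECK_TIMEOUT = {timeout_ms});"
    )


# "MyAG" is a placeholder group name; 60000 ms is an arbitrary example value.
print(health_check_timeout_sql("MyAG", 60000))
```

Keep in mind that a larger value also means a genuinely hung instance can go undetected for longer: 300000 ms is five minutes.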

Regards

Sriram

former_member211576
Contributor
0 Kudos

Hi SS,

  Thanks for your reply.

1. You can increase the timeout parameters, but failover will then be slightly delayed when a real failure occurs (it is a workaround, not a fix).

> Yes, I understand.

2. You may need to check the network connections (cluster heartbeat and public network).

> We have run this system for 3 months, and there were no such issues until I restored a DB over the SAP internal LAN.

3. Have you applied the latest patches for the OS and DB?

> I have not updated the OS or DB in 3 months, but I installed the latest drivers, hotfixes, and CU when I set up the system 3 months ago.

4. If possible, raise a ticket with Microsoft. They may provide an update for the cluster resource based on your issue.

> I already did.

S_Sriram
Active Contributor
0 Kudos

Hi Dennis

Is there any antivirus software installed on the Windows cluster system?

Regards

SS

former_member211576
Contributor
0 Kudos

Hi SS,

No. No antivirus software is installed.

former_member188883
Active Contributor
0 Kudos

Hi Dennis,

Is Windows Firewall enabled on the servers?

Regards,

Deepak Kori

former_member211576
Contributor
0 Kudos

Hi Deepak,

  No, Windows Firewall is off on both nodes.

S_Sriram
Active Contributor
0 Kudos

Hi Dennis

One more thing: is it happening only to the MS SQL Server cluster group, or to all cluster groups?

former_member211576
Contributor
0 Kudos

Hi SS,

  Only the SQL Server cluster group, that is to say, the availability group (the listener).

The SAP <SID> does not fail over.