2016-06-13 140 views

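AlwaysOn - Cluster lease timeout and PREEMPTIVE_HADR_LEASE_MECHANISM

We recently installed some WSUS updates plus SQL 2012 SP3 (yes, everything was tested in UAT with no issues :), and since then AO (AlwaysOn) and the cluster seem to be having a few problems. The lease to the cluster appears to be timing out and I can't figure out why; this causes brief outages and dropped connections.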

Any help would be appreciated!

AlwaysOn extended events:

availability_group_lease_expired; state: LeaseExpired; Timestamp: 2016-06-12 04:58:40.34
availability_replica_state_change: current state: Resolving_Normal; previous_state: Primary_Normal; Timestamp: 2016-06-12 04:58:40.34
..
availability_replica_state_change: current state: Primary_Normal; previous_state: Primary_Pending; Timestamp: 2016-06-12 04:58:52.96
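
As a quick sanity check on the two timestamps above, the time the replica spent outside the primary role can be worked out directly. This is only a minimal Python sketch: it assumes both timestamps come from the same clock and time zone, and the variable names are illustrative, not taken from any SQL Server tooling.

```python
from datetime import datetime

# Timestamps copied from the extended events above.
FMT = "%Y-%m-%d %H:%M:%S.%f"
lease_expired = datetime.strptime("2016-06-12 04:58:40.34", FMT)
primary_again = datetime.strptime("2016-06-12 04:58:52.96", FMT)

# Time between leaving Primary_Normal and returning to it.
outage_seconds = (primary_again - lease_expired).total_seconds()
print(f"Replica was not primary for about {outage_seconds:.2f} s")
```

Under those assumptions the gap is roughly 12.6 seconds, which matches the "brief outage" described in the post.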

SQL log:

Date: 12/06/2016 04:58:40; Error: 19421, Severity: 16, State: 1. 
SQL Server hosting availability group did not receive a process event signal from the Windows Server Failover Cluster within the lease timeout period. 

Date: 12/06/2016 04:58:40; Error: 19407, Severity: 16, State: 1. 
The lease between availability group and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster. 

Date: 12/06/2016 04:58:40 
AlwaysOn: The local replica of availability group is going offline because either the lease expired or lease renewal failed. This is an informational message only. No user action is required. 

Cluster log (don't ask me why it is -1h, the date on all nodes is fine):

2016/06/12-03:58:40.587 INFO [RCM] rcm::RcmApi::FailResource: (AlwaysOn) 
2016/06/12-03:58:40.588 INFO [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'AlwaysOn', gen(3) result 0/0. 
2016/06/12-03:58:40.588 INFO [RCM] Res AlwaysOn: Online -> ProcessingFailure(StateUnknown) 
2016/06/12-03:58:40.588 INFO [RCM] TransitionToState(AlwaysOn) Online-->ProcessingFailure. 
2016/06/12-03:58:40.588 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (AlwaysOn, Online --> Pending) 
2016/06/12-03:58:40.588 ERR [RCM] rcm::RcmResource::HandleFailure: (AlwaysOn) 
2016/06/12-03:58:40.588 INFO [RCM] resource AlwaysOn: failure count: 1, restartAction: 2 persistentState: 1. 
2016/06/12-03:58:40.588 INFO [RCM] numDependents is zero, auto-returning true 
2016/06/12-03:58:40.588 INFO [RCM] Greater than restartPeriod time has elapsed since first failure of AlwaysOn, resetting failureTime and failureCount. 
2016/06/12-03:58:40.588 INFO [RCM] Will queue immediate restart (500 milliseconds) of AlwaysOn after terminate is complete. 
2016/06/12-03:58:40.588 INFO [RCM] Res AlwaysOn: ProcessingFailure -> WaitingToTerminate(DelayRestartingResource) 
2016/06/12-03:58:40.588 INFO [RCM] TransitionToState(AlwaysOn) ProcessingFailure-->[WaitingToTerminate to DelayRestartingResource]. 
2016/06/12-03:58:40.588 INFO [RCM] Res AlwaysOn: [WaitingToTerminate to DelayRestartingResource] -> Terminating(DelayRestartingResource) 
2016/06/12-03:58:40.588 INFO [RCM] TransitionToState(AlwaysOn) [WaitingToTerminate to DelayRestartingResource]-->[Terminating to DelayRestartingResource]. 
2016/06/12-03:58:40.588 ERR [RES] SQL Server Availability Group <AlwaysOn>: [hadrag] Lease Thread terminated 
2016/06/12-03:58:40.588 ERR [RES] SQL Server Availability Group <AlwaysOn>: [hadrag] The lease is expired. The lease should have been renewed by 2016/06/12-03:58:30.348 
2016/06/12-03:58:40.588 INFO [RES] SQL Server Availability Group: [hadrag] Stopping Health Worker Thread 
2016/06/12-03:58:40.588 INFO [RES] SQL Server Availability Group: [hadrag] Health worker was asked to terminate 
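
Two numbers can be read off the logs above: how long the lease renewal was overdue when the resource was failed, and the offset between the cluster log and the SQL log. The sketch below is one hedged way to compute them in Python. The regular expression and format strings are assumptions based only on the lines pasted here, and the SQL log date is assumed to be dd/mm/yyyy.

```python
import re
from datetime import datetime

CLUSTER_FMT = "%Y/%m/%d-%H:%M:%S.%f"

# The [hadrag] line copied from the cluster log above.
line = ("2016/06/12-03:58:40.588 ERR [RES] SQL Server Availability Group <AlwaysOn>: "
        "[hadrag] The lease is expired. The lease should have been renewed by "
        "2016/06/12-03:58:30.348")

# First timestamp: when the line was logged. Last: the missed renewal deadline.
stamps = re.findall(r"\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3}", line)
logged_at = datetime.strptime(stamps[0], CLUSTER_FMT)
renew_deadline = datetime.strptime(stamps[-1], CLUSTER_FMT)

overdue_seconds = (logged_at - renew_deadline).total_seconds()
print(f"Lease renewal was overdue by {overdue_seconds:.2f} s")

# Matching error 19407 in the SQL log (assumed dd/mm/yyyy, local time).
sql_log_time = datetime.strptime("12/06/2016 04:58:40", "%d/%m/%Y %H:%M:%S")
offset_hours = (sql_log_time - logged_at.replace(microsecond=0)).total_seconds() / 3600
print(f"SQL log is {offset_hours:.0f} h ahead of the cluster log")
```

The renewal was about 10.24 seconds overdue. The one-hour offset would be consistent with the cluster log being written in UTC while the SQL log uses local time (UTC+1), although the post does not confirm the server's time zone.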

Something a bit odd - SQL wait times from the last 12 hours:

Wait type                         Wait time       % of total wait
PREEMPTIVE_HADR_LEASE_MECHANISM   80,183,360 ms   39.09%
PREEMPTIVE_SP_SERVER_DIAGNOSTICS  80,183,265 ms   39.09%
HADR_CLUSAPI_CALL                 40,534,655 ms   19.76%
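
To put the figures in the table above in perspective, it helps to convert them to hours and compare them with the 12-hour window mentioned in the post. A minimal Python sketch, using only the numbers quoted above (the dictionary and constant names are illustrative):

```python
# Wait times (ms) copied from the table above.
waits_ms = {
    "PREEMPTIVE_HADR_LEASE_MECHANISM": 80_183_360,
    "PREEMPTIVE_SP_SERVER_DIAGNOSTICS": 80_183_265,
    "HADR_CLUSAPI_CALL": 40_534_655,
}

MS_PER_HOUR = 3_600_000
WINDOW_HOURS = 12  # the "last 12 hours" stated in the post

hours_by_type = {name: ms / MS_PER_HOUR for name, ms in waits_ms.items()}

for name, hours in hours_by_type.items():
    flag = "exceeds" if hours > WINDOW_HOURS else "fits within"
    print(f"{name:<34} {hours:6.2f} h ({flag} the {WINDOW_HOURS} h window)")
```

The first two wait types come out at about 22.3 hours each, more than the 12-hour window itself. One possible reading, which is an assumption rather than something the post confirms, is that these waits are accumulated by background threads that spend nearly all of their time waiting (possibly more than one thread, or over a longer period than 12 hours), so a large total on its own may not indicate a problem.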

A dodgy update somewhere? Let me know if you have any tips.

Thanks in advance,
Thomas

Answers


1) Try restarting the server.

2) If the server is unresponsive or CPU utilization reaches 100%, you can see these odd errors.