2014-04-18 84 views

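Starting around 2:00 AM the night before last, roughly 8 hours after anyone had last touched the site or done anything with it, our Azure website started throwing a "temporary failure" error from the Azure Cache Service: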

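Error: ErrorCode: SubStatus: There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.) Additional Information: The client was trying to communicate with the server: net.tcp://payboardprod.cache.windows.net:22233. at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ErrStatus errStatus, Guid trackingId, Exception responseException, Byte[][] payload, EndpointID destination)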

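Basically, it looks as if our Azure cache server took a dive. However, nothing in our Azure management console suggests that the cache server is anything other than up and running just fine, and the Azure Service Availability dashboard shows no problem either. The only sign of any sort of trouble is that our Azure cache service started reporting zero requests at around 1:00 AM.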

Azure cache graph

Our beta site uses a different cache server but is configured exactly the same way, and it stayed up through the whole episode.

We only have a BizSpark account, so we can't open a support ticket with MS.

We restored service by disabling the external cache, but that is obviously not an optimal solution.

Any suggestions for troubleshooting this?

Answers

1

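Wrap your calling code in appropriate protection (try/catch) and then deal with the failure at the application layer. Any commodity platform offered in the cloud can (and does) run into this kind of problem from time to time. You also need logging in place, for example Azure Diagnostics (http://msdn.microsoft.com/en-us/library/gg433048.aspx), so that you can troubleshoot afterwards.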

0

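I never did figure out what the problem was, and ended up following Simon W's suggestion of wrapping everything in try/catch out the wazoo. But because it isn't 100% intuitive, and it took me a few tries to get the cache retrieval code right, I thought I'd post it here for anyone else who's interested.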

public TValue Get(string key, Func<TValue> missingFunc) 
{ 
    // We need to ensure that two processes don't try to calculate the same value at the same time. That just wastes resources. 
    // So we pull out a value from the _cacheLocks dictionary, and lock on that before trying to retrieve the object. 
    // This does add a bit more locking, and hence the chance for one process to lock up everything else. 
    // We may need to add some timeouts here at some point in time. It also doesn't prevent two processes on different 
    // machines from trying the same bit o' nonsense. Oh well. It's probably still a worthwhile optimization. 
    key = _keyPrefix + "." + key; 
    var value = default(TValue); 
    object cacheLock; 
    lock (_cacheLocks)
    {
        if (!_cacheLocks.TryGetValue(key, out cacheLock))
        {
            cacheLock = new object();
            _cacheLocks[key] = cacheLock;
        }
    }
    lock (cacheLock)
    {
        // Try to get the value from the cache.
        try
        {
            value = _cache.Get(key) as TValue;
        }
        catch (SerializationException ex)
        {
            // This can happen when the app restarts, and we discover that the dynamic entity names have changed, and the desired type
            // is no longer around, e.g., "Organization_6BA9E1E1184D9B7BDCC50D94471D7A730423456A15BBAFB6A2C6AC0FF94C0D41"
            // If that's the error, we should probably warn about it, but no point in logging it as an error, since it's more-or-less expected.
            _logger.Warn("Error retrieving item '" + key + "' from Azure cache; falling back to missingFunc(). Error = " + ex);
        }
        catch (Exception ex)
        {
            _logger.Error("Error retrieving item '" + key + "' from Azure cache; falling back to missingFunc(). Error = " + ex);
        }

        // If we didn't get anything interesting, then call the function that should be able to retrieve it for us.
        if (value == default(TValue))
        {
            // If that function throws an exception, don't swallow it.
            value = missingFunc();

            // If we try to put it into the cache, and *that* throws an exception,
            // log it, and then swallow it.
            try
            {
                _cache.Put(key, value);
            }
            catch (Exception ex)
            {
                _logger.Error("Error putting item '" + key + "' into Azure cache. Error = " + ex);
            }
        }
    }
    return value; 
} 

You can use it like this:

var user = UserCache.Get(email, () =>
    _db.Users
        .FirstOrDefault(u => u.Email == email)
        .ShallowClone());
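
The comments in Get note that timeouts on the per-key lock may be needed at some point. As a rough sketch of one way that could look (not something I've run against the real cache), Monitor.TryEnter accepts a timeout and reports whether the lock was acquired. TimedLock and TryRun are hypothetical names made up for this illustration; they are not part of the code above or of any Azure library.

```csharp
using System;
using System.Threading;

public static class TimedLock
{
    // Hypothetical helper: runs 'body' while holding 'lockObj', but only if the
    // lock can be acquired within 'timeout'. Returns false (without running
    // 'body') when the lock could not be acquired in time.
    public static bool TryRun(object lockObj, TimeSpan timeout, Action body)
    {
        if (lockObj == null) throw new ArgumentNullException("lockObj");
        if (body == null) throw new ArgumentNullException("body");

        if (!Monitor.TryEnter(lockObj, timeout))
        {
            return false;
        }
        try
        {
            body();
            return true;
        }
        finally
        {
            // Always release the lock, even if 'body' throws.
            Monitor.Exit(lockObj);
        }
    }
}
```

In Get, a false return could be treated the same way as a cache failure: skip the cache and call missingFunc() directly, so that one stuck caller cannot block everyone else on that key.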

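This is great! Microsoft's Patterns & Practices team has also written the "Transient Fault Handling Application Block" to help with situations like this. It may be worth investigating if you ever need to revisit your caching code. http://msdn.microsoft.com/en-us/library/hh680905(v=pandp.50).aspx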
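
To give a feel for what retrying a transient failure involves, here is a minimal hand-rolled sketch. It is not the Transient Fault Handling Application Block and does not use its API; Retry and Execute are hypothetical names chosen for this example. It simply retries an operation a fixed number of times, doubling the delay between attempts, and rethrows the last exception once the attempts run out.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Hypothetical helper, not part of any Microsoft library.
    // Calls 'operation' up to 'maxAttempts' times. After each failed attempt
    // except the last, it sleeps, doubling the delay each time. If the final
    // attempt fails, its exception propagates to the caller.
    public static T Execute<T>(Func<T> operation, int maxAttempts, TimeSpan initialDelay)
    {
        if (operation == null) throw new ArgumentNullException("operation");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        var delay = initialDelay;
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception)
            {
                // Out of attempts: let the caller see the original exception.
                if (attempt >= maxAttempts) throw;
            }

            Thread.Sleep(delay);
            delay = TimeSpan.FromTicks(delay.Ticks * 2);
        }
    }
}
```

Assuming the same _cache field as in the answer above, a call might look like Retry.Execute(() => _cache.Get(key), 3, TimeSpan.FromMilliseconds(100)). This sketch retries on every exception type for brevity; real code would probably want to retry only on errors known to be transient and fail fast on everything else.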