
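I am scraping web page content heavily in a multithreaded environment. I need a reliable downloader component that can tolerate temporary server failures, dropped connections, and so on. My code is shown below. The problem: I get a WebException, "The operation has timed out", immediately on HttpWebRequest.GetResponse().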

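I keep running into the same strange situation over and over. Everything starts out perfectly: 10 threads pull data simultaneously for about 10 minutes. After that, I start getting a WebException with a timeout immediately after calling the GetResponse method of my request object. Taking a break (putting a thread to sleep) does not help. The only thing that helps is stopping the application and starting it again, and then it works only until the next 10 minutes pass and the problem reappears.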

Things I have already tried, none of which helped:

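  • Closing/disposing the response object explicitly and via a "using" statement
  • Calling request.Abort everywhere it could possibly help
  • Manipulating timeouts at the ServicePointManager/ServicePoint and WebRequest level (extending/shortening the timeout interval)
  • Manipulating the KeepAlive property
  • Calling CloseConnectionGroup
  • Manipulating the number of threads that run simultaneously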

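Nothing helps! So it looks like a bug, or at least very poorly documented behavior. I have seen many questions about this problem on Google and Stack Overflow, but none of them is fully answered. Basically, people suggest one of the things from the list above. I have tried all of them.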

    public TResource DownloadResource(Uri uri)
    {
        for (var resourceReadingAttempt = 0; resourceReadingAttempt <= MaxTries; resourceReadingAttempt++)
        {
            var request = (HttpWebRequest)WebRequest.Create(uri);
            HttpWebResponse response = null;
            for (var downloadAttempt = 0; downloadAttempt <= MaxTries; downloadAttempt++)
            {
                if (downloadAttempt > 0)
                {
                    var sleepFor = TimeSpan.FromSeconds(4 << downloadAttempt) + TimeSpan.FromMilliseconds(new Random(DateTime.Now.Millisecond).Next(1000));
                    Trace.WriteLine("Retry #" + downloadAttempt + " in " + sleepFor + ".");
                    Thread.Sleep(sleepFor);
                }
                Trace.WriteLine("Trying to get a resource by URL: " + uri);

                var watch = Stopwatch.StartNew();
                try
                {
                    response = (HttpWebResponse)request.GetResponse();
                    break;
                }
                catch (WebException exception)
                {
                    request.Abort();
                    Trace.WriteLine("Failed to get a resource by the URL: " + uri + " after " + watch.Elapsed + ". " + exception.Message);
                    if (exception.Status == WebExceptionStatus.Timeout)
                    {
                        //Trace.WriteLine("Closing " + request.ServicePoint.CurrentConnections + " current connections.");
                        //request.ServicePoint.CloseConnectionGroup(request.ConnectionGroupName);
                        //request.Abort();
                        continue;
                    }
                    else
                    {
                        using (var failure = exception.Response as HttpWebResponse)
                        {
                            Int32 code;
                            try { code = failure != null ? (Int32)failure.StatusCode : 500; }
                            catch { code = 500; }

                            if (code >= 500 && code < 600)
                            {
                                if (failure != null) failure.Close();
                                continue;
                            }
                            else
                            {
                                Trace.TraceError(exception.ToString());
                                throw;
                            }
                        }
                    }
                }
            }

            if (response == null) throw new ApplicationException("Unable to get a resource from URL \"" + uri + "\".");
            try
            {
                // response disposal is required to eliminate problems with timeouts
                // more about the problem: http://stackoverflow.com/questions/5827030/httpwebrequest-times-out-on-second-call
                // http://social.msdn.microsoft.com/Forums/en/netfxnetcom/thread/a2014f3d-122b-4cd6-a886-d619d7e3140e

                TResource resource;
                using (var stream = response.GetResponseStream())
                {
                    try
                    {
                        resource = this.reader.ReadFromStream(stream);
                    }
                    catch (IOException exception)
                    {
                        Trace.TraceError("Unable to read the resource stream: " + exception.ToString());
                        continue;
                    }
                }
                return resource;
            }
            finally
            {
                // recycle as much as you can
                if (response != null)
                {
                    response.Close();
                    (response as IDisposable).Dispose();
                    response = null;
                }
                if (request != null)
                {
                    //Trace.WriteLine("closing connection group: " + request.ConnectionGroupName);
                    //request.ServicePoint.CloseConnectionGroup(request.ConnectionGroupName);
                    request.Abort();
                    request = null;
                }
            }
        }
        throw new ApplicationException("Resource was not able to be acquired after several attempts.");
    }

A similar question has been posted on the MSDN forum: http://social.msdn.microsoft.com/Forums/en-US/netfxnetcom/thread/84b59184-6cdc-41cc-bf5c-8e550a2a210a

Answer


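I had the same problem and searched the internet a lot. The one solution I found was to limit the number of threads running at the same time. You have to control how many threads run at once; I started using 2-3 threads at a time. Also set ServicePointManager.DefaultConnectionLimit = 200; this will really help you.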
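A rough sketch of what those two suggestions might look like together, in case it helps. The class name `ThrottledDownloader`, its `Configure` and `Run` methods, and the cap of 3 are made up for illustration; only `ServicePointManager.DefaultConnectionLimit` and `SemaphoreSlim` are real framework APIs. This is not a verified fix for the timeouts in the question, just one way to apply the advice.

```csharp
using System;
using System.Net;
using System.Threading;

// Hypothetical helper: caps how many downloads run at once and raises
// the per-host connection limit, as suggested in the answer above.
public static class ThrottledDownloader
{
    // Assumed cap; the answer reports using 2-3 threads at a time.
    private const int MaxParallelDownloads = 3;

    private static readonly SemaphoreSlim Gate =
        new SemaphoreSlim(MaxParallelDownloads, MaxParallelDownloads);

    // Call once at startup, before the first request is created.
    // The new limit applies only to ServicePoint objects created afterwards.
    public static void Configure()
    {
        ServicePointManager.DefaultConnectionLimit = 200;
    }

    // Runs the given download delegate while holding one slot of the gate,
    // so at most MaxParallelDownloads delegates execute concurrently.
    public static T Run<T>(Func<T> download)
    {
        Gate.Wait();
        try
        {
            return download();
        }
        finally
        {
            // Always free the slot, even if the download throws.
            Gate.Release();
        }
    }
}
```

With the downloader from the question, a worker thread would then call something like `ThrottledDownloader.Run(() => downloader.DownloadResource(uri))`, where `downloader` is whatever object exposes `DownloadResource`. On the .NET Framework the default connection limit for client applications is 2 per host, which is why 10 threads can end up queuing behind each other.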