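I'm getting WebException: "The operation has timed out" immediately on HttpWebRequest.GetResponse()

I am mass-downloading web page content in a multithreaded environment. I need a reliable downloader component that tolerates temporary server failures, dropped connections and the like. My code is shown below.
Now I keep running into the same strange situation: everything starts out perfectly. 10 threads pull data simultaneously for about 10 minutes. After that, I start getting a WebException with a timeout immediately after calling the GetResponse method of my request object. Taking a break (putting a thread to sleep) does not help. The only thing that helps is stopping the application and starting it again, and then only until the next 10 minutes pass and the problem returns.
What I have already tried, none of which helped: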
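- closing/disposing the response object explicitly and via a "using" statement
- calling request.Abort everywhere it could possibly help
- manipulating timeouts at the ServicePointManager/ServicePoint and WebRequest level (extending/shortening the timeout interval)
- manipulating the KeepAlive property
- calling CloseConnectionGroup
- manipulating the number of threads that run simultaneously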
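Nothing helps! So it looks like a bug, or at least very poorly documented behavior. I have seen many questions about this on Google and Stack Overflow, but none of them is fully answered. Basically, people suggest one of the things from the list above. I have tried all of them.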
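To make the list above concrete, here is a minimal sketch of what those settings look like on a single request. The class name, the connection group name and every numeric value are illustrative assumptions, not the values actually used in my downloader.

```csharp
using System;
using System.IO;
using System.Net;

static class DownloaderSettingsSketch
{
    // Hypothetical helper: applies the kinds of settings listed above to one request.
    public static HttpWebRequest CreateRequest(Uri uri)
    {
        // ServicePointManager level: how long an idle ServicePoint is kept (ms, illustrative).
        ServicePointManager.MaxServicePointIdleTime = 10000;

        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Timeout = 30000;                   // ms to wait for GetResponse (illustrative)
        request.ReadWriteTimeout = 30000;          // ms for reads on the response stream
        request.KeepAlive = false;                 // do not keep the connection open for reuse
        request.ConnectionGroupName = "downloader"; // assumed name, so the group can be closed later

        // ServicePoint level: idle time for connections to this host (ms, illustrative).
        request.ServicePoint.MaxIdleTime = 10000;
        return request;
    }

    public static string Download(Uri uri)
    {
        var request = CreateRequest(uri);
        try
        {
            // Disposing the response (and its stream) releases the underlying connection.
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var stream = response.GetResponseStream())
            using (var reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
        finally
        {
            // Close the connections that belong to this request's group.
            request.ServicePoint.CloseConnectionGroup(request.ConnectionGroupName);
        }
    }
}
```

Creating the request does not touch the network, so `CreateRequest` can be inspected in isolation; only `Download` actually opens a connection.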
    public TResource DownloadResource(Uri uri)
    {
        for (var resourceReadingAttempt = 0; resourceReadingAttempt <= MaxTries; resourceReadingAttempt++)
        {
            var request = (HttpWebRequest)WebRequest.Create(uri);
            HttpWebResponse response = null;
            for (var downloadAttempt = 0; downloadAttempt <= MaxTries; downloadAttempt++)
            {
                if (downloadAttempt > 0)
                {
                    var sleepFor = TimeSpan.FromSeconds(4 << downloadAttempt) + TimeSpan.FromMilliseconds(new Random(DateTime.Now.Millisecond).Next(1000));
                    Trace.WriteLine("Retry #" + downloadAttempt + " in " + sleepFor + ".");
                    Thread.Sleep(sleepFor);
                }
                Trace.WriteLine("Trying to get a resource by URL: " + uri);
                var watch = Stopwatch.StartNew();
                try
                {
                    response = (HttpWebResponse)request.GetResponse();
                    break;
                }
                catch (WebException exception)
                {
                    request.Abort();
                    Trace.WriteLine("Failed to get a resource by the URL: " + uri + " after " + watch.Elapsed + ". " + exception.Message);
                    if (exception.Status == WebExceptionStatus.Timeout)
                    {
                        //Trace.WriteLine("Closing " + request.ServicePoint.CurrentConnections + " current connections.");
                        //request.ServicePoint.CloseConnectionGroup(request.ConnectionGroupName);
                        //request.Abort();
                        continue;
                    }
                    else
                    {
                        using (var failure = exception.Response as HttpWebResponse)
                        {
                            Int32 code;
                            try { code = failure != null ? (Int32)failure.StatusCode : 500; }
                            catch { code = 500; }
                            if (code >= 500 && code < 600)
                            {
                                if (failure != null) failure.Close();
                                continue;
                            }
                            else
                            {
                                Trace.TraceError(exception.ToString());
                                throw;
                            }
                        }
                    }
                }
            }
            if (response == null) throw new ApplicationException("Unable to get a resource from URL \"" + uri + "\".");
            try
            {
                // response disposal is required to eliminate problems with timeouts
                // more about the problem: http://stackoverflow.com/questions/5827030/httpwebrequest-times-out-on-second-call
                // http://social.msdn.microsoft.com/Forums/en/netfxnetcom/thread/a2014f3d-122b-4cd6-a886-d619d7e3140e
                TResource resource;
                using (var stream = response.GetResponseStream())
                {
                    try
                    {
                        resource = this.reader.ReadFromStream(stream);
                    }
                    catch (IOException exception)
                    {
                        Trace.TraceError("Unable to read the resource stream: " + exception.ToString());
                        continue;
                    }
                }
                return resource;
            }
            finally
            {
                // recycle as much as you can
                if (response != null)
                {
                    response.Close();
                    (response as IDisposable).Dispose();
                    response = null;
                }
                if (request != null)
                {
                    //Trace.WriteLine("closing connection group: " + request.ConnectionGroupName);
                    //request.ServicePoint.CloseConnectionGroup(request.ConnectionGroupName);
                    request.Abort();
                    request = null;
                }
            }
        }
        throw new ApplicationException("Resource was not able to be acquired after several attempts.");
    }
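One variation that may be worth comparing against the code above: the inner retry loop calls GetResponse again on the same request object after it has failed and been aborted, and the documentation for HttpWebRequest.GetResponse says that repeated calls on one instance do not reissue the request. The sketch below builds a new HttpWebRequest for every attempt and keeps the same backoff formula and the same retry rules (timeouts, 5xx, and failures without a response). It is an untested assumption that this changes the behavior I am seeing, not a confirmed fix. The names `FreshRequestRetrySketch`, `BackoffDelay`, `Retry` and `IsTransient` are hypothetical, and the sketch returns a string instead of using my `reader.ReadFromStream`.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;

static class FreshRequestRetrySketch
{
    // One Random shared by all threads; Random is not thread-safe, so access is locked.
    private static readonly Random Jitter = new Random();

    // Delay before retry number `attempt` (1-based): (4 << attempt) seconds plus up to 1 s of jitter.
    public static TimeSpan BackoffDelay(int attempt)
    {
        int jitterMs;
        lock (Jitter) { jitterMs = Jitter.Next(1000); }
        return TimeSpan.FromSeconds(4 << attempt) + TimeSpan.FromMilliseconds(jitterMs);
    }

    // Same rule as the original code: retry on timeouts, on 5xx, and when there is no response.
    public static bool IsTransient(WebException e)
    {
        if (e.Status == WebExceptionStatus.Timeout) return true;
        using (var failure = e.Response as HttpWebResponse)
        {
            var code = failure != null ? (int)failure.StatusCode : 500;
            return code >= 500 && code < 600;
        }
    }

    // Runs `attempt` once, then up to `maxTries` more times on transient failures.
    public static T Retry<T>(Func<T> attempt, int maxTries, Func<int, TimeSpan> delay)
    {
        for (var i = 0; ; i++)
        {
            if (i > 0) Thread.Sleep(delay(i));
            try
            {
                return attempt();
            }
            catch (WebException e)
            {
                if (i >= maxTries || !IsTransient(e)) throw;
            }
            catch (IOException)
            {
                if (i >= maxTries) throw;
            }
        }
    }

    public static string Download(Uri uri, int maxTries)
    {
        return Retry(() =>
        {
            // A new request object for every attempt; a failed one is never reused.
            var request = (HttpWebRequest)WebRequest.Create(uri);
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var stream = response.GetResponseStream())
            using (var reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }, maxTries, BackoffDelay);
    }
}
```

Passing the delay in as a function keeps `Retry` independent of the network and of real sleeping, so the retry rules can be exercised with a fake operation and a zero delay.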
A similar question has been posted on the MSDN forum: http://social.msdn.microsoft.com/Forums/en-US/netfxnetcom/thread/84b59184-6cdc-41cc-bf5c-8e550a2a210a