Retry opening a URL with urllib in Python after a timeout

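I'm parsing data from a large number of web pages (>10k) with Python, and the function I wrote frequently fails with a timeout error, roughly once every 500 iterations. I tried to handle this with a try/except block, but I'd like to improve the function so that it retries opening the URL four or five times before raising an error. Is there an elegant way to do this?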

My code is below:

from urllib.request import Request, urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

def url_open(url):
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        s = urlopen(req, timeout=50).read()
    except HTTPError as e:
        print(str(e))
        if e.code == 404:
            raise  # the page doesn't exist; retrying won't help
        # any other HTTP error: try once more and let a second failure propagate
        s = urlopen(req, timeout=50).read()
    return BeautifulSoup(s, "lxml")
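One likely reason the try/except above doesn't help: a socket-level timeout raised by urlopen() is not an HTTPError, so that except clause never sees it (an HTTP status such as 504 Gateway Timeout would be an HTTPError, though). A timeout while connecting arrives as a URLError whose reason is a socket.timeout, while a timeout waiting for or reading the response arrives as a bare socket.timeout (an alias of TimeoutError since Python 3.10). The helper below, a hypothetical is_timeout() not found in any library, is a small sketch that classifies an exception along those lines.

```python
import socket
from urllib.error import HTTPError, URLError


def is_timeout(exc):
    """Return True if exc looks like a network timeout from urlopen()/read()."""
    if isinstance(exc, HTTPError):
        # The server answered with an error status (even 504 lands here),
        # so this is not a socket-level timeout.
        return False
    if isinstance(exc, socket.timeout):
        # Timed out waiting for or reading the response.
        return True
    # Timed out while connecting: urllib wraps the socket.timeout in URLError.
    return isinstance(exc, URLError) and isinstance(exc.reason, socket.timeout)
```

In practice this means the except clause has to name URLError and socket.timeout as well as HTTPError (HTTPError first, since it is a subclass of URLError) before any retry logic can trigger on timeouts.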

2017-01-15 user3725021

Possible duplicate of [How to retry urllib2.request when fails?](http://stackoverflow.com/questions/9446387/how-to-retry-urllib2-request-when-fails) – phss

I have used a pattern like this for retrying in the past:

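import time
from urllib.request import Request, urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

def url_open(url):
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    retrycount = 0
    s = None
    while s is None:
        try:
            s = urlopen(req, timeout=50).read()
        except HTTPError as e:
            print(str(e))
            if canRetry(e.code):  # canRetry(code) -> bool, defined elsewhere
                retrycount += 1
                if retrycount > 5:
                    raise  # still failing after 5 retries: give up
                time.sleep(1)  # pause briefly before retrying (1 s is arbitrary)
            else:
                raise  # error not worth retrying, e.g. 404
    return BeautifulSoup(s, "lxml")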
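The loop above only catches HTTPError, so a socket-level timeout (the error the question is about) would still escape it. A variation that also retries timeouts and doubles the wait after each failure might look like this. It is a sketch: fetch_with_retries, RETRYABLE_STATUS, base_delay and the particular status codes chosen are illustrative assumptions, not part of the original answer. It returns raw bytes so the BeautifulSoup step stays with the caller.

```python
import socket
import time
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Hypothetical policy: HTTP status codes treated as transient.
RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}


def fetch_with_retries(url, attempts=5, timeout=50, base_delay=1.0):
    """Fetch url and return the body as bytes, retrying transient failures.

    Retries on timeouts, connection failures and RETRYABLE_STATUS codes;
    any other HTTP error (e.g. 404) is raised immediately. The wait between
    attempts doubles each time: base_delay, 2*base_delay, 4*base_delay, ...
    """
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    for attempt in range(1, attempts + 1):
        try:
            with urlopen(req, timeout=timeout) as resp:
                return resp.read()
        except HTTPError as e:
            # Must come before URLError, because HTTPError subclasses it.
            if e.code not in RETRYABLE_STATUS or attempt == attempts:
                raise
        except (URLError, socket.timeout):
            # Connect failures/timeouts arrive as URLError; read timeouts
            # arrive as a bare socket.timeout.
            if attempt == attempts:
                raise
        time.sleep(base_delay * 2 ** (attempt - 1))
```

For example, BeautifulSoup(fetch_with_retries(url), "lxml") gives the same return value as the original url_open, while attempts=5 matches the "four or five tries" from the question.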

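You just need to define canRetry elsewhere; it should return True for the HTTP status codes you consider worth retrying.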
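As one possible definition (an assumption, since the answer leaves the policy open), canRetry could treat 5xx server errors, 408 Request Timeout and 429 Too Many Requests as transient and every other status, such as 404, as permanent. The camelCase name is kept only to match the answer's call site.

```python
def canRetry(code):
    """Example policy: is this HTTP status code worth retrying?

    5xx server errors, 408 Request Timeout and 429 Too Many Requests are
    usually transient; other 4xx errors (404, 403, ...) will not change.
    """
    return code in (408, 429) or 500 <= code <= 599
```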

2017-01-15 08:15:44 GantTheWanderer
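The same idea can also be factored out of url_open entirely: a small generic helper that calls any zero-argument function up to N times and re-raises the last error. This is a sketch; the names retry, attempts, exceptions and delay are hypothetical and do not come from any library.

```python
import time


def retry(func, attempts=5, exceptions=(Exception,), delay=0.0):
    """Call func() up to `attempts` times and return its result.

    Only exceptions listed in `exceptions` trigger another try; anything
    else propagates immediately. If every attempt fails, the last error
    is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)  # pause before the next try (0 means no pause)
```

For instance, retry(lambda: urlopen(req, timeout=50).read(), attempts=5, exceptions=(URLError, socket.timeout), delay=1) would cover the question's requirement, assuming urlopen, URLError and socket are imported.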
