Mechanize browser timeout not working

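I am having a problem with mechanize's timeout feature. On most pages it works well: if the URL does not load within a reasonable time, it raises an error: urllib2.URLError: <urlopen error timed out>. On some pages, however, the timer does not fire and the program stops responding, even to a keyboard interrupt. Here is an example page where this happens: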

import mechanize 

url = 'https://web.archive.org/web/20141104183547/http://www.dallasnews.com/' 

br = mechanize.Browser() 
br.set_handle_robots(False) 
br.addheaders = [('User-agent', 'Firefox')] 
html = br.open(url, timeout=0.01).read() #hangs on this page, timeout set extremely low to trigger timeout on all pages for debugging

First, does this script hang for anyone else on this particular URL? Second, what could be going wrong, and how should I debug it?
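One way to approach the debugging question is to have the interpreter dump a traceback once the script has been stuck for a while, which shows the call it is blocked in. The following is a minimal sketch, assuming a Python 3 interpreter (the standard-library faulthandler module is not available in Python 2) and a mechanize release that runs on it. The helper names arm_hang_watchdog and disarm_hang_watchdog are hypothetical, not part of mechanize.

```python
import faulthandler
import sys


def arm_hang_watchdog(seconds, exit_after_dump=True, stream=None):
    """Dump the tracebacks of all threads if still running after `seconds`.

    The dump is written by a watchdog thread inside faulthandler, so it
    does not depend on the main thread returning to Python code.
    With exit_after_dump=True the process is terminated after the dump.
    """
    if stream is None:
        stream = sys.stderr  # must be a real file with a file descriptor
    faulthandler.dump_traceback_later(seconds, exit=exit_after_dump, file=stream)


def disarm_hang_watchdog():
    """Cancel a pending dump, for example after the request returned."""
    faulthandler.cancel_dump_traceback_later()
```

Calling arm_hang_watchdog(10) just before br.open(url, timeout=...) and disarm_hang_watchdog() right after it should print where the script is stuck if the call does not return within ten seconds.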


2014-11-04 Michael

-2

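I don't know why that URL makes the request hang in mechanize, but with urllib2 the request returns normally. Maybe the site has some code that recognizes mechanize, even with robots handling set to False.

I think urllib2 should be a good solution for your situation.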

import mechanize
import urllib2

url = 'https://web.archive.org/web/20141104183547/http://www.dallasnews.com/'

try:
    # First attempt: mechanize with a desktop Firefox User-Agent.
    br = mechanize.Browser()
    br.set_handle_robots(False)  # do not fetch or obey robots.txt
    br.addheaders = [('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    html = br.open(url).read()
except Exception:
    # Fallback: plain urllib2 with a mobile Safari User-Agent.
    req = urllib2.Request(url, headers={'User-Agent' : 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7A341 Safari/528.16'})
    con = urllib2.urlopen(req)
    html = con.read()
print html


2014-11-14 21:52:44 Dap

Mechanize works fine on the rest of the site. The problem is not getting a result, but failing to terminate when the timeout is reached. – Michael 2014-11-14 21:54:59

http://stackoverflow.com/questions/3552928/how-do-i-set-a-timeout-value-for-pythons-mechanize. To implement the timeout properly, I would check this link. – Dap 2014-11-14 22:15:29
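If the goal is simply to guarantee that a fetch terminates, one option that does not depend on why the call hangs is to run it in a child process and kill that process at a deadline. The following is only a sketch under assumptions: it targets Python 3.5+ (subprocess.run is not available in Python 2), the names run_with_deadline and FETCH_SCRIPT are hypothetical, and FETCH_SCRIPT assumes a mechanize release that runs on Python 3. It does not explain the hang on this URL; it only bounds how long the hang can last.

```python
import subprocess
import sys

# Hypothetical child script: the fetch from the question, writing the
# page body to stdout. Assumes mechanize is installed for this interpreter.
FETCH_SCRIPT = """
import sys
import mechanize
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Firefox')]
sys.stdout.buffer.write(br.open(sys.argv[1], timeout=10).read())
"""


def run_with_deadline(script, seconds, *args):
    """Run `script` in a child Python interpreter with a hard time limit.

    Returns the child's stdout as bytes, or None if the deadline passed.
    On timeout subprocess.run kills the child and waits for it before
    raising TimeoutExpired, so no stuck process is left behind.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", script] + list(args),
            stdout=subprocess.PIPE,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout
```

With this in place, html = run_with_deadline(FETCH_SCRIPT, 15, url) would return either the page bytes or None after roughly fifteen seconds. Because the limit is enforced from outside the process doing the work, it still applies if that process stops responding to keyboard interrupts.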
