406 error with mechanize

I get a 406 error with mechanize when trying to open a URL:

for url in urls:
    if "http://" not in url:
        url = "http://" + url
    print url
    try:
        page = mech.open("%s" % url)
    except urllib2.HTTPError, e:
        print "there was an error opening the URL, logging it"
        print e.code
        logfile = open("log/urlopenlog.txt", "a")
        logfile.write(url + "," + "couldn't open this page" + "\n")
        continue
    else:
        print "opening this URL..."
        page = mech.open(url)

Any idea what would cause a 406 error? If I go to the URL in question, I can open it in my browser.

Source

2012-12-22 user1328021

There is no need to use interpolation: `page = mech.open(url)` would do fine (although that is not the solution to your problem).

406 errors are very specific to the web server. It means that, *for whatever reason*, the server doesn't like your Accept header.

[406 means the server doesn't like your headers](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html). Can you post the headers that mechanize sends?
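One way to see which headers a Python client actually puts on the wire is to point it at a throwaway local server that records them. The sketch below is only an illustration and uses the standard library's `urllib.request` (Python 3) instead of mechanize; the names `HeaderEcho` and `headers_sent_by_client` are made up for this example, and the same local server could be targeted with `mech.open(...)` to inspect mechanize's headers.

```python
import http.server
import threading
import urllib.request


class HeaderEcho(http.server.BaseHTTPRequestHandler):
    """Hypothetical helper: remembers the headers of the last GET request."""

    seen = {}

    def do_GET(self):
        # Store the request headers so they can be inspected afterwards
        HeaderEcho.seen = dict(self.headers.items())
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        # Silence the default per-request logging
        pass


def headers_sent_by_client():
    """Serve one request on a free local port and return the headers received."""
    server = http.server.HTTPServer(("127.0.0.1", 0), HeaderEcho)
    server.timeout = 5  # stop waiting if no request ever arrives
    thread = threading.Thread(target=server.handle_request)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        # An empty ProxyHandler keeps the request away from any configured proxy
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        opener.open(url).read()
    finally:
        thread.join()
        server.server_close()
    return HeaderEcho.seen
```

Calling `headers_sent_by_client()` returns a dict of header names and values. With plain `urllib.request` the result should contain a `Python-urllib` user agent and no `Accept` header at all, which is the kind of difference from a browser that a strict server may answer with 406.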

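Try adding headers to your request based on what your browser sends; start by adding an Accept header (406 usually means the server doesn't like what you say you will accept).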

See "Adding headers" in the documentation:

req = mechanize.Request(url) 
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8') 
page = mechanize.urlopen(req)

The Accept header value here is based on the header sent by Chrome.

Source

2012-12-22 21:53:43

Hmm... that doesn't seem to do it. I still get the same error. – user1328021

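@user1328021: This depends entirely on the server; there is no easy answer. Keep adding the headers you see your browser send when it accesses the same URL, until it works.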

My browser shows that it is sending exactly the header above. What about Accept-Language or Accept-Encoding? Would those have an effect? – user1328021
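The "keep adding headers until it works" approach from the comments can be automated. The following is a minimal sketch under a few assumptions: it uses `urllib.request` (Python 3) instead of mechanize, the function name `find_working_headers` and the candidate list are hypothetical, and the header values are only examples that should be replaced with whatever your browser really sends. Accept-Encoding is left out on purpose, because advertising gzip can make the server return a compressed body, which `urllib` does not decompress for you.

```python
import urllib.error
import urllib.request

# Candidate headers in the order they are tried; the values are illustrative
CANDIDATE_HEADERS = [
    ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
    ("User-Agent", "Mozilla/5.0"),
    ("Accept-Language", "en-US,en;q=0.5"),
]


def find_working_headers(url, candidates=CANDIDATE_HEADERS, opener=None):
    """Add candidate headers one at a time until the server stops answering 406.

    Returns the dict of headers that worked, or None if every attempt got a 406.
    """
    opener = opener or urllib.request.build_opener()
    headers = {}
    # The first attempt sends no extra headers; each later attempt adds one more
    for name, value in [(None, None)] + list(candidates):
        if name is not None:
            headers[name] = value
        req = urllib.request.Request(url, headers=headers)
        try:
            opener.open(req).close()
            return dict(headers)
        except urllib.error.HTTPError as e:
            if e.code != 406:
                raise  # a different failure; more headers will not help
    return None
```

The returned dict is the smallest prefix of the candidate list that the server accepted, so it can be passed straight to later requests. If the result is `None`, the server is rejecting the request for something other than these headers.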

If you want to find out which headers your browser sends, this web page will display them for you: https://www.whatismybrowser.com/detect/what-http-headers-is-my-browser-sending

The "Accept" and "User-Agent" headers should be enough. This is what I did to get rid of the error:

import mechanize

# Headers that make the request look like it comes from a browser
headers = {'User-Agent': 'Mozilla/5.0', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'}

# Loop over the list of URLs (URLs is assumed to be defined earlier)
for url in URLs:

    # Attach the headers so that web security systems don't block the scraper
    req = mechanize.Request(url, headers=headers)

    # Open the URL
    page = mechanize.urlopen(req)

You can also try importing the "urllib2" or "urllib" library to open these URLs. The syntax is the same.
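As a rough illustration of the standard-library route: in Python 3 the functionality of `urllib2` lives in `urllib.request`, so the equivalent of the loop above might look like the sketch below. The helper name `build_request` is invented for this example, and the header values are simply the ones from this answer.

```python
import urllib.request

# Same browser-like headers as in the mechanize version
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def build_request(url):
    """Return a Request for url that carries the browser-like headers."""
    return urllib.request.Request(url, headers=HEADERS)


def open_all(urls):
    """Open every URL with the headers attached and return the raw page bodies."""
    pages = []
    for url in urls:
        with urllib.request.urlopen(build_request(url)) as page:
            pages.append(page.read())
    return pages
```

`open_all(URLs)` would then fetch each page in turn. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` on a 406, so wrap the call in `try`/`except` if some URLs are expected to fail.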

Source

2015-10-26 00:43:03 CopyLeft
