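Web crawler HTTP Error 403: Forbidden

I am a beginner trying to write a web spider script. I want to go to a page, enter data in a text box, go to the next page by clicking the submit button, and retrieve all the data on the new page, iteratively.

Here is the code I tried: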

import urllib 
import urllib2 
import string 
import sys 
from BeautifulSoup import BeautifulSoup 

hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11','Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8','Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3','Accept-Encoding': 'none','Accept-Language': 'en-US,en;q=0.8','Connection': 'keep-alive'} 
values = {'query' : '5ed10c844ed4266a18d34e2ba06b381a' } 
data = urllib.urlencode(values) 
request = urllib2.Request("https://www.virustotal.com/#search", data, headers=hdr) 
response = urllib2.urlopen(request) 
the_page = response.read() 
pool = BeautifulSoup(the_page) 

print pool

Here is the error:

Traceback (most recent call last):
  File "C:\Users\Dipanshu\Desktop\webscraping_demo.py", line 19, in <module>
    response = urllib2.urlopen(request)
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 406, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden

How can I fix this?

Source

2012-12-21 Dipanshu

Wrong path. 'POST /search/' –

Possible duplicate of [urllib2.HTTPError: HTTP Error 403: Forbidden](https://stackoverflow.com/questions/13303449/urllib2-httperror-http-error-403-forbidden) – djinn

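As far as I can tell, your request parameters are not set up correctly, and this is (probably) sending your spider to a page it is not supposed to view.

This user had a similar problem, but fixed it by modifying the headers.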
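
As a rough illustration of what adjusting the request could look like, here is a minimal sketch that only builds the request and does not send it. The '/search/' path is an assumption taken from the comment under the question and has not been verified against the site; the helper name build_search_request and the trimmed header set are hypothetical. Note also that the '#search' part of the URL in the question is a fragment, which the client never sends to the server. The imports fall back to Python 2, as used in the question.

```python
try:  # Python 3
    from urllib.parse import urlencode
    from urllib.request import Request
except ImportError:  # Python 2, as in the question
    from urllib import urlencode
    from urllib2 import Request

# Browser-like headers, trimmed from the hdr dict in the question.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.8',
}


def build_search_request(query, base='https://www.virustotal.com',
                         path='/search/'):
    """Build (but do not send) a POST request carrying the form data.

    The '/search/' default is an assumption from the comment thread.
    """
    body = urlencode({'query': query}).encode('ascii')
    # Passing data makes urllib issue a POST instead of a GET.
    return Request(base + path, data=body, headers=HEADERS)


request = build_search_request('5ed10c844ed4266a18d34e2ba06b381a')
print('%s %s' % (request.get_method(), request.get_full_url()))
```

Whether the server accepts this request is a separate question; the sketch only shows where the path, form data, and headers go.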

Source

2012-12-21 09:42:14 NlightNFotis

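I have already added all the headers specified in that post, but it still doesn't work! – Dipanshu

@Dipanshu I don't think you need to add the headers specified in that post, because he was trying to open a different website. You have to customize your existing 'request' and its parameters. – NlightNFotis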
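
When experimenting with the request and its parameters, it can help to look at what the server actually sends back with the 403 instead of letting the exception end the script. The following is a minimal sketch under that assumption; the function name fetch and its opener parameter are hypothetical, and opener exists only so that a stand-in can be substituted for urlopen.

```python
try:  # Python 3
    from urllib.error import HTTPError
    from urllib.request import urlopen
except ImportError:  # Python 2, as in the question
    from urllib2 import HTTPError, urlopen


def fetch(request, opener=urlopen):
    """Return (status_code, body_bytes) instead of raising on HTTP errors."""
    try:
        response = opener(request)
        return response.getcode(), response.read()
    except HTTPError as err:
        # An HTTPError doubles as a response object: it carries the
        # status code and the error page the server sent back.
        return err.code, err.read()
```

Calling fetch(request) with the default opener performs a real network request. Printing the status and the start of the body after each change to the path, data, or headers may show which part of the request the server is rejecting.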
