
I want to scrape https://www.crowdcube.com/investments?sector=technology with BeautifulSoup in Python 3. I can't scrape the web page with Beautiful Soup.

Traceback (most recent call last): 

     File "D:\DataVisualization\lib\urllib\request.py", line 163, in urlopen 
     return opener.open(url, data, timeout) 
     File "D:\DataVisualization\lib\urllib\request.py", line 472, in open 
     response = meth(req, response) 
     File "D:\DataVisualization\lib\urllib\request.py", line 582, in http_response 
     'http', request, response, code, msg, hdrs) 
     File "D:\DataVisualization\lib\urllib\request.py", line 510, in error 
     return self._call_chain(*args) 
     File "D:\DataVisualization\lib\urllib\request.py", line 444, in _call_chain 
     result = func(*args) 
     File "D:\DataVisualization\lib\urllib\request.py", line 590, in http_error_default 
     raise HTTPError(req.full_url, code, msg, hdrs, fp) 
    urllib.error.HTTPError: HTTP Error 403: Forbidden 
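
For reference: urllib.request sends a default User-Agent of the form Python-urllib/3.x, and some sites answer that with 403 Forbidden. Below is a minimal sketch of supplying a browser-like User-Agent instead, assuming that is the cause here (the thread never confirms it):

import urllib.request

url = 'https://www.crowdcube.com/investments?sector=technology'
# Browser-like UA instead of the default Python-urllib/3.x.
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
with urllib.request.urlopen(req) as resp:
    html = resp.read().decode('utf-8', errors='replace')

print(html[:200])  # print the start of the body to confirm the fetch worked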

Can you post the Beautiful Soup code you are using? – bejado


from bs4 import BeautifulSoup
import urllib, re
data = {'title': [], 'description': []}
l = ('https://www.crowdcube.com/investment')
tree = BeautifulSoup(l, 'lxml')
# title
title = tree.find_all('div', {'cc-cardOpportunity__body'})
data['title'] = tree.find('h1')
# description
description = tree.find_all('div', {'class': 'cc-cardOpportunity__body'})
data['description'].append(description[1].find('p').get_text())
data – Mart
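
Note that BeautifulSoup(l, 'lxml') in the snippet above parses the URL string itself; BeautifulSoup never downloads anything. A minimal sketch of the apparent intent, fetching the page first and then parsing it (the class name cc-cardOpportunity__body and the h1/p selectors are taken from the comment above and are not verified against the live page):

import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.crowdcube.com/investments?sector=technology'
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})  # as in the sketch above
html = urllib.request.urlopen(req).read()

soup = BeautifulSoup(html, 'lxml')  # parse the downloaded markup, not the URL string
data = {'title': [], 'description': []}

# One entry per opportunity card; class name taken from the comment above.
for card in soup.find_all('div', {'class': 'cc-cardOpportunity__body'}):
    heading = card.find('h1')
    paragraph = card.find('p')
    if heading:
        data['title'].append(heading.get_text(strip=True))
    if paragraph:
        data['description'].append(paragraph.get_text(strip=True))

print(data)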


I can't scrape this site :( – Mart

Answer


Using requests, this site doesn't even require a UA:

In [23]: import requests 

In [24]: r = requests.get('https://www.crowdcube.com/investments?sector=technology') 

In [25]: r.status_code 
Out[25]: 200 
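
Since the goal is to feed the page to Beautiful Soup, the requests response can be handed to it directly. A minimal sketch (the class name is the one from the OP's comment, not verified against the live page):

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.crowdcube.com/investments?sector=technology')
r.raise_for_status()                  # raise on 403/404 instead of silently parsing an error page
soup = BeautifulSoup(r.text, 'lxml')  # requests fetches, Beautiful Soup only parses
cards = soup.find_all('div', {'class': 'cc-cardOpportunity__body'})
print(len(cards))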

The OP specifically asked about Beautiful Soup. – bejado


@bejado Do you have any idea what the difference between 'bs4' and 'urllib' or 'requests' is? What does a '403' have to do with 'bs4'? –


I'm not sure why the OP is getting a 403, but the question specifically asks _why_ the 403 is raised when using Beautiful Soup. Your answer doesn't address that. – bejado