2014-07-15 85 views
8

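BeautifulSoup responds with an error

I want to get my feet wet with Beautiful Soup (BS). I am trying to work through the documentation, but I have already hit a problem at the very first step.

This is my code: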

from bs4 import BeautifulSoup 
soup = BeautifulSoup('https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5....1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description') 

print(soup.prettify()) 

This is the response I get:

Warning (from warnings module): 
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/bs4/__init__.py", line 189 
'"%s" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an  
HTTP client to get the document behind the URL, and feed that document to Beautiful Soup.' % markup) 
UserWarning: "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5...b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description" 
looks like a URL. Beautiful Soup is not an HTTP client. You should 
probably use an HTTP client to get the document behind the URL, and feed that document  
to Beautiful Soup. 
https://api.flickr.com/services/rest/?method=flickr.photos.search&api;_key=5...b&per;_page=250&accuracy;=1&has;_geo=1&extras;=geo,tags,views,description 

Is it because I am trying to access an http**s** URL, or is something else wrong? Thanks for your help!

+0

Save the web page locally, then run Beautiful Soup on that file. – suspectus

Answers

10

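You are passing the URL itself as a string, so Beautiful Soup tries to parse the URL text as if it were markup. Instead, you need to fetch the page source first, for example with urllib2 or requests: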

from urllib2 import urlopen # for Python 3: from urllib.request import urlopen 
from bs4 import BeautifulSoup 

URL = 'https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5....1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description' 
soup = BeautifulSoup(urlopen(URL)) 

Note that you do not need to call read() on the result of urlopen(): BeautifulSoup accepts a file-like object as its first argument, and urlopen() returns a file-like object.
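
To make the "file-like object" point concrete, here is a minimal sketch that runs without network access. It assumes `io.BytesIO` is an acceptable stand-in for the response object that urlopen() returns; `is_file_like`, `fake_response` and the sample HTML are made up for illustration and are not part of Beautiful Soup or of the original answer.

```python
import io

try:
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
except ImportError:
    BeautifulSoup = None  # the stdlib-only part below still runs


def is_file_like(obj):
    """Return True if obj has a callable read(), the duck-typed file interface."""
    return callable(getattr(obj, "read", None))


# Stand-in for what urlopen() returns: bytes that sit behind a read() method.
fake_response = io.BytesIO(
    b"<html><head><title>Example</title></head><body></body></html>"
)

print(is_file_like(fake_response))          # an open stream
print(is_file_like("https://example.com"))  # a plain string, not a stream

if BeautifulSoup is not None:
    # The stream is passed as-is; no explicit read() call is needed.
    soup = BeautifulSoup(fake_response, "html.parser")
    print(soup.title.string)
```

A plain string never has a read() method, so Beautiful Soup treats it as markup. That is why the URL string in the question was parsed as text instead of being downloaded.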

2

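The warning says it all: you are passing a URL to Beautiful Soup. You need to fetch the site's content first, and only then pass that content to BS.

To download the content, you can use urllib2: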

import urllib2 
response = urllib2.urlopen('http://www.example.com/') 
html = response.read() 

Then, later:

soup = BeautifulSoup(html) 
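
Since the traceback in the question comes from Python 3.3, where urllib2 does not exist, the same fetch-then-parse idea might be sketched for Python 3 as follows. The helper names `looks_like_url` and `make_soup`, the 10-second timeout, and the choice of the built-in "html.parser" are assumptions for illustration, not something the answer or the library prescribes.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

try:
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
except ImportError:
    BeautifulSoup = None


def looks_like_url(text):
    """Return True for strings such as 'https://host/path' (hypothetical helper)."""
    parts = urlparse(text.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def make_soup(source, timeout=10):
    """Download `source` first if it is a URL, then parse the markup."""
    if BeautifulSoup is None:
        raise RuntimeError("beautifulsoup4 is not installed")
    if looks_like_url(source):
        # Beautiful Soup is not an HTTP client, so fetch the document here.
        with urlopen(source, timeout=timeout) as response:
            markup = response.read()  # raw bytes of the response body
    else:
        markup = source  # already markup, pass it through unchanged
    return BeautifulSoup(markup, "html.parser")
```

With these helpers, `soup = make_soup('http://www.example.com/')` would download the page and parse it, while `make_soup('<p>hi</p>')` would parse the string directly.
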
+0

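Hey, I tried to figure out which answer was posted first, but both say "answered 29 minutes ago". So I thought I would upvote one and accept the other. I do not know what the proper etiquette is. I wanted to accept the first answer. – Stophface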