頁面上的a
標籤有一個image
類,而不是div
:
>>> img_links = soup.findAll("a", {"class":"image"})
>>> for img_link in img_links:
... print img_link.img['src']
...
//upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Stora_Kronan.jpeg/100px-Stora_Kronan.jpeg
//upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Christuss%C3%A4ule_8.jpg/77px-Christuss%C3%A4ule_8.jpg
...
,或者甚至更好,使用a.image > img
CSS selector
:
>>> for img in soup.select('a.image > img'):
... print img['src']
//upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Stora_Kronan.jpeg/100px-Stora_Kronan.jpeg
//upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Christuss%C3%A4ule_8.jpg/77px-Christuss%C3%A4ule_8.jpg
...
UPD(使用urllib.urlretrieve
下載圖像):
from urllib import urlretrieve
import urlparse
from bs4 import BeautifulSoup
import urllib2
url = "http://en.wikipedia.org/wiki/Main_Page"
soup = BeautifulSoup(urllib2.urlopen(url))
for img in soup.select('a.image > img'):
img_url = urlparse.urljoin(url, img['src'])
file_name = img['src'].split('/')[-1]
urlretrieve(img_url, file_name)
由於一噸...是有可能下載圖像.... – LonelySoul
@LonelySoul肯定,檢查'UPD'部分。 – alecxe