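Extracting links from an IMDB table with BeautifulSoup
I'm trying to get the links from an IMDB web page. The links are inside a table, but I get this error and I don't know how to extract them. I'm a beginner, please help.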
from bs4 import BeautifulSoup
import urllib2
var_file = urllib2.urlopen("http://www.imdb.com/chart/top")
var_html = var_file.read()
var_file.close()
soup = BeautifulSoup(var_html)
for item in soup.find_all(tbody={'class': 'lister-list'}):
    for link in item.find_all('a'):
        print(link.get('href'))
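The message shown below is a `UserWarning`, not an exception, so it does not stop the script. A more likely reason no links are printed is the outer `find_all` call: BeautifulSoup treats an unrecognized keyword argument such as `tbody=...` as a filter on an attribute named `tbody`, so `find_all(tbody={'class': 'lister-list'})` matches no elements and the loop body never runs. The tag name belongs in the first positional argument and the CSS class in `class_` (or an attrs dict such as `{'class': 'lister-list'}`). A minimal sketch, assuming Python 3 and a hypothetical inline HTML sample (`SAMPLE_HTML`, with a hypothetical helper `extract_links`) shaped like the chart table instead of a live request; the real IMDB markup may differ and has changed over time:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely shaped like the chart table in the question.
# The live IMDB page structure may differ.
SAMPLE_HTML = """
<table>
  <tbody class="lister-list">
    <tr><td class="titleColumn"><a href="/title/tt0000001/">First title</a></td></tr>
    <tr><td class="titleColumn"><a href="/title/tt0000002/">Second title</a></td></tr>
  </tbody>
</table>
"""


def extract_links(html):
    """Hypothetical helper: return the href of every <a> inside <tbody class="lister-list">."""
    # Naming the parser explicitly avoids the "No parser was explicitly specified" warning.
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # Tag name first, class via class_. By contrast, find_all(tbody={...})
    # would filter on an attribute called "tbody", which no element has.
    for body in soup.find_all("tbody", class_="lister-list"):
        for link in body.find_all("a"):
            links.append(link.get("href"))
    return links


for href in extract_links(SAMPLE_HTML):
    print(href)
```

To run it against the real page in Python 3, `urllib.request.urlopen(...)` takes the place of `urllib2.urlopen(...)`, and the bytes returned by its `.read()` can be passed straight to `extract_links`.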
I get this error:
C:\Python27\lib\site-packages\bs4\__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
To get rid of this warning, change this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "lxml")
markup_type=markup_type))
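As the warning itself suggests, passing the parser name explicitly removes the message. `"lxml"` requires the third-party lxml package, while `"html.parser"` ships with Python. A small sketch (the helper name `make_soup` is hypothetical) that prefers lxml and falls back to the built-in parser when lxml is not installed; BeautifulSoup raises `FeatureNotFound` when the requested parser is unavailable:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Hypothetical helper: parse with lxml if available, otherwise html.parser."""
    try:
        # Explicit parser name, so BeautifulSoup does not have to guess (no warning).
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is not installed; use the parser from Python's standard library.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p><a href='x'>y</a></p>")
print(soup.a.get("href"))
```

Pinning a single parser (rather than falling back) gives the most consistent results across machines, which is the concern the warning raises; the fallback only trades that for portability.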
Just a side note: keep in mind that IMDB doesn't allow scraping, so if you scrape too much data, they may ban your IP.