Extracting links from an IMDB table with BeautifulSoup


I'm trying to get the links from an IMDB web page. The links are inside a table, but I get the error below and I don't know how to get them. I'm a beginner, please help.

from bs4 import BeautifulSoup 
import urllib2 

var_file = urllib2.urlopen("http://www.imdb.com/chart/top") 

var_html = var_file.read() 

var_file.close() 
soup = BeautifulSoup(var_html) 
for item in soup.find_all(tbody={'class': 'lister-list'}):
    for link in item.find_all('a'):
        print(link.get('href'))
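A likely reason the loop above would find nothing even once the warning is dealt with: in `find_all`, the first positional argument is the tag name, and unrecognised keyword arguments are treated as attribute filters. So `find_all(tbody={'class': 'lister-list'})` looks for tags carrying an attribute literally named `tbody`, which no tag has. The minimal sketch below runs against a small, made-up HTML snippet (an assumption standing in for the 2015 chart table, not the real IMDB page) and uses Python's built-in `"html.parser"` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the chart table; the real IMDB markup may differ.
SAMPLE_HTML = """
<table>
  <tbody class="lister-list">
    <tr><td><a href="/title/tt0111161/">The Shawshank Redemption</a></td></tr>
    <tr><td><a href="/title/tt0068646/">The Godfather</a></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The tag name goes first; class_ filters on the CSS class.
hrefs = []
for tbody in soup.find_all("tbody", class_="lister-list"):
    for link in tbody.find_all("a"):
        hrefs.append(link.get("href"))

print(hrefs)  # ['/title/tt0111161/', '/title/tt0068646/']
```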

I get this error:

C:\Python27\lib\site-packages\bs4\__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this: 

BeautifulSoup([your markup]) 

to this: 

BeautifulSoup([your markup], "lxml") 

    markup_type=markup_type))


2015-11-15 Sangram Barge

Just a side note: keep in mind that IMDB doesn't allow scraping, so if you scrape too much data they may ban your IP.

Use:

soup.find_all(class_='lister-list')
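The trailing underscore in `class_` is needed because `class` is a reserved word in Python, so writing `find_all(class='lister-list')` is itself a `SyntaxError`. A minimal sketch, again on made-up markup, showing three equivalent ways to express the same filter in BeautifulSoup: the `class_` keyword, an explicit `attrs` dictionary, and a CSS selector passed to `select()`.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only: one link inside the list, one outside it.
html = '<div><ul class="lister-list"><li><a href="/a">A</a></li></ul><a href="/b">B</a></div>'
soup = BeautifulSoup(html, "html.parser")

# 1. The class_ keyword (plain "class" would be a syntax error).
by_keyword = [a["href"] for item in soup.find_all(class_="lister-list")
              for a in item.find_all("a")]

# 2. The same filter written as an attrs dictionary.
by_attrs = [a["href"] for item in soup.find_all(attrs={"class": "lister-list"})
            for a in item.find_all("a")]

# 3. A CSS selector: <a> elements inside any element with class "lister-list".
by_css = [a["href"] for a in soup.select(".lister-list a")]

print(by_keyword, by_attrs, by_css)
```

All three pick up only the link inside the list and ignore the one outside it.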


2015-11-15 05:24:49 furas

Thanks for your help, but it says: F:\> python links1.py File "links1.py", line 10 for item in soup.find_all(class_='lister-list') ^ SyntaxError: invalid syntax. I thought the syntax was class='lister-list', but I get the same error.

Hey, thanks a lot, it works and everything is perfect. I was missing the : after soup.find_all(class_='lister-list'), since it's the head of a for loop.

Here is the working code:

from bs4 import BeautifulSoup
import urllib2

var_file = urllib2.urlopen("http://www.imdb.com/chart/top")
var_html = var_file.read()
var_file.close()
soup = BeautifulSoup(var_html, "lxml")
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        print(link.get('href'))
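The working code is Python 2 (`urllib2`). In Python 3, `urllib2` was split up and `urlopen` now lives in `urllib.request`. A minimal Python 3 sketch follows, assuming the page still contains an element with class `lister-list` (IMDB has changed its markup since 2015, so the selector may no longer match anything). The helper names `fetch_html` and `extract_links`, and the use of `urljoin` to turn relative `href` values into absolute URLs, are illustrative additions and not part of the original answer.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup

CHART_URL = "http://www.imdb.com/chart/top"


def fetch_html(url):
    """Download a page and return its raw bytes; the with-block closes the connection."""
    with urlopen(url) as response:
        return response.read()


def extract_links(html, base_url=CHART_URL):
    """Hypothetical helper: absolute URLs of every <a href> inside elements with class 'lister-list'."""
    soup = BeautifulSoup(html, "html.parser")  # parser named explicitly, so no parser warning
    links = []
    for item in soup.find_all(class_="lister-list"):
        for link in item.find_all("a", href=True):  # href=True skips <a> tags without an href
            links.append(urljoin(base_url, link["href"]))
    return links


# Usage (needs network access, and the page layout assumption above must still hold):
#     for url in extract_links(fetch_html(CHART_URL)):
#         print(url)
```

Splitting download and parsing keeps `extract_links` testable on a saved or hand-written HTML string without hitting IMDB at all, which also helps with the scraping concern raised in the first comment.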

It's only a warning telling you that you didn't specify a parser.

Instead of

soup = BeautifulSoup(var_html)

try:

soup = BeautifulSoup(var_html, "lxml")
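Naming the parser is what silences the warning. `"lxml"` requires the lxml package to be installed; `"html.parser"` is Python's built-in parser and needs nothing extra, although the BeautifulSoup documentation notes that different parsers can build different trees from malformed HTML. The minimal sketch below (made-up markup) uses the standard `warnings` module to check that no "No parser was explicitly specified" warning is emitted once the parser is named.

```python
import warnings

from bs4 import BeautifulSoup

html = "<p><a href='/x'>x</a></p>"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of printing it
    soup = BeautifulSoup(html, "html.parser")  # parser named explicitly

# Keep only the parser-guessing warning, if any was raised.
parser_warnings = [w for w in caught
                   if "No parser was explicitly specified" in str(w.message)]
print(len(parser_warnings))  # 0
print(soup.a["href"])        # /x
```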


2015-11-15 05:32:39
