Web scraping data with Python

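I have just started learning web scraping with Python. My goal is to scrape the live news for Bajaj Auto Ltd from http://money.rediff.com/companies/Bajaj-Auto-Ltd/10540026.

Problem: I am unable to extract the content (that is, the news headlines).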

from urllib.request import urlopen 
from bs4 import BeautifulSoup 

url = 'http://money.rediff.com/companies/Bajaj-Auto-Ltd/10540026' 
data = urlopen(url) 
soup = BeautifulSoup(data) 

te=soup.find('a',attrs={'target':'_jbpinter'}) 
lis=te.find_all_next('a',attrs={'target':'_jbpinter'}) 
#print(lis) 

for li in lis: 
    print(li.find('a').contents[0])

The error I get is "AttributeError: 'NoneType' object has no attribute 'contents'", and I do not get the expected result.

Any input would be greatly appreciated.

Source

2015-11-04 Nks

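It looks like the element you are trying to find does not exist. Try printing 'li' and check whether there is really an 'a' inside it.

You are trying to get the 'a' tag twice. Every item in lis is already an 'a' tag, so calling find('a') on it searches inside that link for another link, finds none, and returns None.

Replace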
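The cause of the error can be reproduced without touching the live site. The snippet below is a minimal sketch: the markup is a made-up stand-in for one news link (only the target="_jbpinter" attribute comes from the question), and the parser 'html.parser' is chosen here for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one news link on the page.
html = '<div><a target="_jbpinter" href="/news/1">Sample headline</a></div>'
soup = BeautifulSoup(html, 'html.parser')

# This is already the <a> tag we are interested in.
link = soup.find('a', attrs={'target': '_jbpinter'})

# find() only looks at descendants of the tag it is called on, so
# asking an <a> tag for another <a> inside itself finds nothing.
nested = link.find('a')
print(nested)

# The text lives directly on the link that was already found.
headline = link.get_text()
print(headline)
```

Because nested is None, accessing .contents on it raises the AttributeError shown in the question.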

for li in lis: 
    print(li.find('a').contents[0])

with

for li in lis: 
    print(li.get_text())

and you will get output like this:

Need Different Rates For Different Products: Rahul Bajaj on GST 
Reforms irrespective of Bihar results: Bajaj 
Auto shares in focus; Tata Motors up over 5% 
We believe new Avenger will stimulate the market: Bajaj Auto's Eric Vas 
BHP Billiton pins future of Indonesian coal mine on new...
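One more detail worth noting: find_all_next() returns only the matches that come after the starting element, so the first link found by find() is never printed by the loop in the question. A possible variation, sketched below, collects every matching link in one find_all() call. The function name extract_headlines and the sample markup are hypothetical, and the sketch assumes the news links still carry target="_jbpinter" as they did in the question.

```python
from bs4 import BeautifulSoup


def extract_headlines(html):
    """Return the text of every <a target="_jbpinter"> link in the HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    # find_all() includes the first matching link, unlike
    # find() followed by find_all_next().
    links = soup.find_all('a', attrs={'target': '_jbpinter'})
    # strip=True trims the whitespace around each headline.
    return [link.get_text(strip=True) for link in links]


# Hypothetical markup used in place of the live page.
sample_html = """
<ul>
  <li><a target="_jbpinter" href="/news/1"> First headline </a></li>
  <li><a target="_jbpinter" href="/news/2">Second headline</a></li>
  <li><a href="/other">Unrelated link</a></li>
</ul>
"""

headlines = extract_headlines(sample_html)
for headline in headlines:
    print(headline)
```

To run this against the real page, the HTML could be obtained as in the question, for example extract_headlines(urlopen(url).read()).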

Source

2015-11-04 16:52:11 dstudeba
