Python and BS4: stop reading after a certain number of rows

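First-time Python 3 user here, starting to get the hang of it. As an exercise, I am trying to read the table at http://rateyourmusic.com/customchart (using BeautifulSoup4) and turn the rank, artist, album, and year into a dictionary. Then I want to put the dictionary into a MySQL database. I am able to get all the information from the table, put it into variables, and then put those into a dictionary, but I have run into a small problem. The last item in the table is an ad, so it does not follow the structure of the table rows above it. I only want to read the first 100 rows of the table. The error occurs when the code tries to read the ad row.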

Here is my code. Any help would be great. Also, if you see any mistakes in my code, or anything I could do better, please let me know.

It prints the dictionaries and everything looks fine, but it gives me an error after printing them all.

from bs4 import BeautifulSoup 
from urllib.request import Request, urlopen 

url = "http://rateyourmusic.com/customchart" 
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'}) 
soup = BeautifulSoup(urlopen(req)) 

table = soup.find("table", {"class" : "mbgen"}) 
totalList = [] 

for row in table.findAll("tr"): 
    cells = row.findAll("td") 
    rank = int(cells[0].find(class_="ooookiig").text) 
    artist = cells[2].find(class_="artist").text 
    album = cells[2].find(class_="album").text 
    year = cells[2].find(class_="mediumg").text 
    year = int(year[1:5]) 

    chartData = {"Rank":rank, "Artist":artist, "Album":album, "Year":year} 
    totalList.append(chartData) 
    print(chartData)

2013-10-24 The Nomad

Could you please provide the full traceback? – aIKid

Traceback (most recent call last): File "C:\Programming\RateYourMusicCrawler\AlbumInfoCrawler.py", line 21, in <module> rank = int(cells[0].find(class_="ooookiig").text) AttributeError: 'NoneType' object has no attribute 'text' –

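You could iterate with a counter and stop as soon as the counter reaches 100, but I don't like that very much: if the site decides to increase the number of entries to, say, 200, the code would no longer be useful. I would use a simple try block, like this: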

for row in table.findAll("tr"):
    try:
        cells = row.findAll("td")
        rank = int(cells[0].find(class_="ooookiig").text)
        artist = cells[2].find(class_="artist").text
        album = cells[2].find(class_="album").text
        year = cells[2].find(class_="mediumg").text
        year = int(year[1:5])

        chartData = {"Rank":rank, "Artist":artist, "Album":album, "Year":year}
        totalList.append(chartData)
        print(chartData)
    except AttributeError:
        # find() returned None for this row (e.g. the ad row), so skip it
        pass
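As a variation on the try block above, here is a minimal sketch that moves the parsing into a helper and also catches IndexError and ValueError, in case a row has fewer than three cells or a non-numeric rank. The HTML fragment and the `parse_row` helper are made up for illustration; only the class names come from the question, and the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the chart: one real row and one ad row.
html = """
<table class="mbgen">
  <tr>
    <td><span class="ooookiig">1</span></td>
    <td></td>
    <td><a class="artist">Some Artist</a> <a class="album">Some Album</a>
        <span class="mediumg">(1999)</span></td>
  </tr>
  <tr><td colspan="3">advertisement</td></tr>
</table>
"""

def parse_row(row):
    """Return a dict for a chart row, or None if the row does not match."""
    cells = row.find_all("td")
    try:
        year_text = cells[2].find(class_="mediumg").text
        return {
            "Rank": int(cells[0].find(class_="ooookiig").text),
            "Artist": cells[2].find(class_="artist").text,
            "Album": cells[2].find(class_="album").text,
            "Year": int(year_text[1:5]),
        }
    except (AttributeError, IndexError, ValueError):
        # AttributeError: find() returned None; IndexError: too few cells;
        # ValueError: the text could not be converted to an int.
        return None

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table", class_="mbgen").find_all("tr")
chart = [entry for entry in map(parse_row, rows) if entry is not None]
print(chart)
```

Keeping the try block inside a small function means a malformed row is skipped as a whole, so no partially filled dictionary ends up in the list.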

2013-10-24 22:35:09 PepperoniPizza

Awesome! Thank you so much, PepperoniPizza! That solved my problem right away. I'm not used to catching errors. Thanks again!!! –

This happens because the parser can't find the item.

From the BS4 documentation:

If find_all() can’t find anything, it returns an empty list. If find() can’t find anything, it returns None
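To illustrate the quoted behaviour, here is a minimal sketch that checks the result of find() for None before using it. The two-row HTML fragment is invented for the example; only the class names `mbgen` and `ooookiig` come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical table: one chart row and one ad row without the rank element.
html = """
<table class="mbgen">
  <tr><td><span class="ooookiig">1</span></td></tr>
  <tr><td>advertisement</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
ranks = []
for row in soup.find("table", class_="mbgen").find_all("tr"):
    rank_tag = row.find(class_="ooookiig")  # None when the row has no such element
    if rank_tag is None:
        continue  # skip rows that are not chart entries, such as the ad row
    ranks.append(int(rank_tag.text))

# find_all() returns an empty list instead of None when nothing matches.
missing = soup.find_all(class_="no-such-class")
print(ranks, missing)
```

Calling `.text` on the None returned for the ad row is what raises the AttributeError shown in the traceback.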

You can use a try block, but personally, I prefer to check manually:

for rownumber, row in enumerate(table.findAll('tr')):
    if rownumber < 100:
        # process the row here (extract rank, artist, album, year)
        pass
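The same cutoff can also be sketched with itertools.islice from the standard library, which stops iterating after a fixed number of items. The `rows` list below is a stand-in for `table.findAll('tr')`, and `MAX_ROWS` is a name chosen for this example; the value 100 is the row count mentioned in the question.

```python
from itertools import islice

MAX_ROWS = 100  # assumed cutoff, taken from the question

# Stand-in for table.findAll('tr'): 100 chart rows followed by one ad row.
rows = [f"row {n}" for n in range(1, 101)] + ["ad row"]

# islice yields at most MAX_ROWS items and then stops, so the ad row is never reached.
first_rows = list(islice(rows, MAX_ROWS))
print(len(first_rows), first_rows[-1])
```

Like the counter version, this depends on the chart having exactly 100 entries before the ad row, so the cutoff would need updating if the page layout changes.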

2013-10-24 22:35:17 aIKid
