Python 2.7 web scraping - list index out of range

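I'm new to Python and web scraping. I'm trying to scrape http://www.basketball-reference.com/awards/all_league.html for some analysis, and this is as far as I've got. With the code below, when I pick out the year I only manage to scrape 3 rows before I get a "list index out of range" error. Any help or tips are appreciated.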

import requests
import pandas as pd
from bs4 import BeautifulSoup

r = requests.get('http://www.basketball-reference.com/awards/all_league.html')
soup = BeautifulSoup(r.text.replace('&nbsp;', '').replace('&gt;', '').encode('ascii', 'ignore'), "html.parser")
all_league_data = pd.DataFrame(columns=['year', 'team', 'player'])

stw_list = soup.findAll('div', attrs={'class': 'stw'})  # find every div with class 'stw'
for stw in stw_list:
    table = stw.find('table', attrs={'class': 'no_highlight stats_table'})
    for row in table.findAll('tr'):
        col = row.findAll('td')
        year = col[0].find(text=True)
        print year

Source

2016-04-07 Mahesh Shankar

Some rows have no td cells, so you are trying to get element 0 of an empty list.

Do this instead:

import requests
import pandas as pd
from bs4 import BeautifulSoup

r = requests.get('http://www.basketball-reference.com/awards/all_league.html')
soup = BeautifulSoup(r.text.replace('&nbsp;', '').replace('&gt;', '').encode('ascii', 'ignore'), "html.parser")
all_league_data = pd.DataFrame(columns=['year', 'team', 'player'])

stw_list = soup.findAll('div', attrs={'class': 'stw'})  # find every div with class 'stw'
for stw in stw_list:
    table = stw.find('table', attrs={'class': 'no_highlight stats_table'})
    for row in table.findAll('tr'):
        col = row.findAll('td')
        if col:  # skip rows that contain no td cells
            year = col[0].find(text=True)
            print year
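
To see the failure and the fix without hitting the live site, here is a minimal sketch. It is only an illustration: the HTML in SAMPLE_HTML is made up (it is not copied from basketball-reference.com), the names RowCollector and first_cells are hypothetical, and it uses the standard library's HTMLParser instead of BeautifulSoup so that it runs with no extra packages. The point is the same as above: rows without td cells produce an empty list, and that list has to be skipped before taking element 0.

```python
# Sketch only: SAMPLE_HTML is invented for illustration, not taken from the real page.
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7

SAMPLE_HTML = """
<table>
  <tr><th>Season</th><th>Team</th></tr>
  <tr><td>2014-15</td><td>1st</td></tr>
  <tr></tr>
  <tr><td>2013-14</td><td>1st</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of the td cells of every tr, one list per row."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []
        self.in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            # Every row starts as an empty list; header and spacer rows stay empty.
            self.rows.append([])
        elif tag == 'td' and self.rows:
            self.in_td = True
            self.rows[-1].append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        # Only text inside a td is kept; th text and whitespace are ignored.
        if self.in_td:
            self.rows[-1][-1] += data


def first_cells(html):
    """Return the first td text of each row, skipping rows with no td."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    # Same guard as `if col:` in the answer: an empty list is falsy.
    return [cells[0] for cells in parser.rows if cells]


years = first_cells(SAMPLE_HTML)
print(years)
```

Dropping the `if cells` condition from the list comprehension reproduces the original error, because the header row and the empty row both yield an empty list.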

Source

2016-04-07 22:46:11

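This happens because the gray line is a tr and it is empty, so col is an empty list. Check col before indexing it:

col = row.findAll('td')
if col:
    year = col[0].find(text=True)
    print year

With that check in place, the code gives the correct result.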

Source

2016-04-07 22:48:01 SotirisTsartsaris
