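Python 2.7 web scraping - list index out of range

I'm new to Python and web scraping. I'm trying to scrape http://www.basketball-reference.com/awards/all_league.html for some analysis, and this is as far as I have gotten. With the code below, I can scrape only 3 rows when extracting the year before I get a "list index out of range" error. Any help or tips are appreciated.
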
import requests
import pandas as pd
from bs4 import BeautifulSoup

r = requests.get('http://www.basketball-reference.com/awards/all_league.html')
# Strip &nbsp; and &gt; entities before parsing
soup = BeautifulSoup(r.text.replace('&nbsp;', '').replace('&gt;', '').encode('ascii', 'ignore'), "html.parser")
all_league_data = pd.DataFrame(columns = ['year','team','player'])
stw_list = soup.findAll('div', attrs={'class': 'stw'}) # Find all 'stw's'
for stw in stw_list:
    table = stw.find('table', attrs = {'class':'no_highlight stats_table'})
    for row in table.findAll('tr'):
        col = row.findAll('td')
        year = col[0].find(text=True)
        print year
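
One possible cause, stated as an assumption because the page markup is not shown in this post: some rows in the table (for example header rows built from <th> cells, or empty spacer rows between seasons) contain no <td> cells. For those rows `findAll('td')` returns an empty list, so `col[0]` raises "list index out of range". The minimal sketch below guards against that by skipping such rows. `SAMPLE_HTML` and `extract_years` are hypothetical names, and the HTML is a made-up stand-in, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one table on the page (assumed structure).
# The empty <tr class="thead"> mimics a spacer row with no <td> cells.
SAMPLE_HTML = """
<table class="no_highlight stats_table">
  <tr><th>Season</th><th>Tm</th></tr>
  <tr><td>2015-16</td><td>1st</td></tr>
  <tr><td>2015-16</td><td>2nd</td></tr>
  <tr><td>2015-16</td><td>3rd</td></tr>
  <tr class="thead"></tr>
  <tr><td>2014-15</td><td>1st</td></tr>
</table>
"""


def extract_years(html):
    """Return the text of the first <td> of every row that has one."""
    soup = BeautifulSoup(html, "html.parser")
    years = []
    for row in soup.find_all("tr"):
        cols = row.find_all("td")
        if not cols:
            # Header or spacer row: there is no cols[0] to index, so skip it.
            continue
        years.append(cols[0].get_text(strip=True))
    return years


years = extract_years(SAMPLE_HTML)
print(years)
```

If the real page behaves like this sample, the same `if not cols: continue` check could go right after `col = row.findAll('td')` in the loop above. Printing `row` just before the failing line is a quick way to confirm which rows are actually empty.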