2015-10-30 45 views

Answers

1

My suggestion: use pandas.DataFrame. It can load data from many sources, including HTML.

You can easily handle empty cells with the fillna method.

Consider this example:

import pandas as pd 

# read_html returns a list of DataFrames.
# In this case we know there is only one matching table on the page
df = pd.read_html('http://www.basketball-reference.com/leagues/NBA_2015_per_poss.html',
                  attrs={'id': 'per_poss'})[0]

# the headers repeat every 20 lines, filtering them out 
df = df[df['Rk'] != 'Rk'] 

# fill empty cells with 0
# (could also use the inplace=True kwarg instead of reassigning, or pass a
# dictionary to use a different value for each column)
df = df.fillna(0) 
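
The dictionary form of fillna mentioned in the comments above can be tried without fetching the page. The following is only a minimal sketch: the small `raw` frame is a made-up stand-in for the scraped table (the player names and numbers are invented), and the choice to fill `3P` with 0 while leaving `3P%` empty is an assumption for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: every value is a string,
# one header row is repeated mid-table, and some cells are empty (None).
raw = pd.DataFrame(
    {
        "Rk": ["1", "2", "Rk", "3"],
        "Player": ["A", "B", "Player", "C"],
        "3P": ["1.2", None, "3P", "0.4"],
        "3P%": ["0.350", None, "3P%", None],
    }
)

# Drop the repeated header rows, as in the answer's code.
clean = raw[raw["Rk"] != "Rk"].copy()

# Convert the numeric columns from strings to numbers; empty cells become NaN.
numeric_cols = ["Rk", "3P", "3P%"]
clean[numeric_cols] = clean[numeric_cols].apply(pd.to_numeric)

# Fill per column with a dictionary: only "3P" gets 0 here,
# while "3P%" is deliberately left as NaN (assumed choice).
clean = clean.fillna({"3P": 0.0})

print(clean)
```

Columns that are not named in the dictionary are left untouched, so an empty percentage stays distinguishable from a real 0.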

Good approach indeed! – SIslam


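The table does not come in with the "empty" cells; those cells simply do not appear. For example, the fourth row on the site has 0, 0, then blanks for 3P, 3PA, and 3P%. In the table this shows up as 0, 0, 4.5 (the next value after 3P%). I also get the error "html5lib not found, please install it" even though I have html5lib installed, but when running the code –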