2017-01-19 60 views

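Scraping a table from a website with BeautifulSoup: error at the end

I'm trying to scrape a table from the NFL website, but I keep getting errors and can't tell what I'm doing wrong.

The code I'm using is: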

import pandas 
import urllib2 

#specify the url 
NFLpage = "http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2" 

#Query the website and return the html to the variable 'page' 
page = urllib2.urlopen(NFLpage) 

#import the Beautiful soup functions to parse the data returned from the website 
from bs4 import BeautifulSoup 

#Parse the html in the 'page' variable, and store it in Beautiful Soup format 
soup = BeautifulSoup(page) 

print soup.prettify(page) 


#Find the right table 
all_tables=soup.find_all('table') 
right_table=soup.find('table', class_='tablehead') 
right_table 

for row in right_table.findAll("tr"): 

    col = row.find_all('td') 

    column_1 = col[0].string.strip() 
    RK.append(column_1) 

    column_2 = col[1].string.strip() 
    PLAYER.append(column_2) 

    column_3 = col[2].string.strip() 
    TEAM.append(column_3) 

    column_4 = col[3].string.strip() 
    GP.append(column_4) 

    column_5 = col[4].string.strip() 
    G1.append(column_5) 

    column_6 = col[5].string.strip() 
    A1.append(column_6) 

    column_7 = col[6].string.strip() 
    PTS.append(column_7) 

    column_8 = col[7].string.strip() 
    Diff.append(column_8) 

    column_9 = col[8].string.strip() 
    PIM.append(column_9) 

    column_10 = col[9].string.strip() 
    PTSG.append(column_10) 

    column_11 = col[10].string.strip() 
    SOG.append(column_11) 

    column_12 = col[11].string.strip() 
    PCT.append(column_12) 

    column_13 = col[12].string.strip() 
    GWG.append(column_13) 


    column_14 = col[13].string.strip() 
    G2.append(column_14) 

    column_15 = col[14].string.strip() 
    A2.append(column_15) 

    column_16 = col[15].string.strip() 
    G3.append(column_16) 

    column_17 = col[15].string.strip() 
    A3.append(column_17) 


columns = {'RK': RK, 'PLAYER':PLAYER, 'TEAM'=TEAM, 'GP': GP, 'G1': G1, 'A1': A1, 'PTS': PTS, 'Diff'=Diff, 'PIM'=PIM, 'PTSG'=PTSG, 'SOG'=SOG, 'PCT'=PCT, 'GWG'=GWG, 'G2'=G2, 'A2'=A2, 'G3'=G3,'A3'=A3} 

df = pd.DataFrame(columns) 

df 

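I'm currently getting an error on the line that assigns the columns (third from the end). Could you help me see what I'm doing wrong?

Cheers, Andreia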


Welcome to Stack Overflow. Is this really a [minimal](http://stackoverflow.com/help/mcve) example? – loki


This is my first question, and I'm still learning how best to use the site –

Answers

1

pandas can read tables directly from a URL; see the documentation for details.

import pandas as pd 

pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2') 

Output:

[  0      1  2 3 4 5 6 7 8  9 \ 
0 NaN      PP SH NaN NaN NaN NaN NaN NaN NaN 
1 RK     PLAYER TEAM GP G A PTS +/- PIM PTS/G 
2  1   Jamie Benn, LW DAL 82 35 52 87 1 64 1.06 
3  2   John Tavares, C NYI 82 38 48 86 5 46 1.05 
4  3  Sidney Crosby, C PIT 77 28 56 84 5 47 1.09 
5  4  Alex Ovechkin, LW WSH 81 53 28 81 10 58 1.00 
6 NaN  Jakub Voracek, RW PHI 82 22 59 81 1 78 0.99 
7  6 Nicklas Backstrom, C WSH 82 18 60 78 5 40 0.95 
8  7   Tyler Seguin, C DAL 71 37 40 77 -1 20 1.08 
9  8   Jiri Hudler, LW CGY 78 31 45 76 17 14 0.97 
10 NaN  Daniel Sedin, LW VAN 82 20 56 76 5 18 0.93 
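
As for the error in the question itself: the dictionary on the third-to-last line mixes `'RK': RK` with `'TEAM'=TEAM`, and `=` inside a dict literal is a syntax error. The script also imports `pandas` without the `pd` alias it uses later, never creates the lists it appends to, and reads `col[15]` twice. Below is a rough sketch of a corrected loop, not a definitive fix. It assumes Python 3 and the built-in `html.parser`, runs against a small inline HTML sample (my own stand-in, trimmed to four columns, so nothing is downloaded), and the helper name `table_to_frame` is hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for the ESPN page (assumption: trimmed to 4 of the real columns).
SAMPLE_HTML = """
<table class="tablehead">
  <tr><td>RK</td><td>PLAYER</td><td>TEAM</td><td>GP</td></tr>
  <tr><td>1</td><td><a href="#">Jamie Benn</a>, LW</td><td>DAL</td><td>82</td></tr>
  <tr><td>2</td><td><a href="#">John Tavares</a>, C</td><td>NYI</td><td>82</td></tr>
</table>
"""

def table_to_frame(html, names):
    """Collect the rows of the 'tablehead' table into a DataFrame (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="tablehead")
    records = []
    for row in table.find_all("tr"):
        # get_text() still works when a cell contains nested tags,
        # whereas .string is None there and .string.strip() would fail.
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Skip header rows and rows with an unexpected number of cells.
        if len(cells) != len(names) or cells[0] == names[0]:
            continue
        records.append(dict(zip(names, cells)))
    return pd.DataFrame(records, columns=names)

df = table_to_frame(SAMPLE_HTML, ["RK", "PLAYER", "TEAM", "GP"])
print(df)
```

For the live page, the same function could be given the HTML fetched with `urllib.request.urlopen` (Python 3's replacement for `urllib2`) together with the full list of column names from the question. Building one record per row avoids keeping a separate list per column in sync.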

Thanks. I get ImportError: No module named lxml, and I can't seem to install it. I'm using Anaconda. –


I had to install lxml and then html5lib with the conda install command, and after that I was able to see the table –


I have another problem: I need the table as a DataFrame, and I'm guessing this isn't one –
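
On that last point: `pd.read_html` returns a list of DataFrames, one per table it finds, so the printed result is a list whose first element is already a DataFrame. Here is a minimal sketch of picking it out and promoting the `RK` row to the header. The helper name `tidy_stats_table` is hypothetical, and the small `raw` frame is my own stand-in that imitates the first rows of the output in the answer, so the block runs without network access.

```python
import pandas as pd

def tidy_stats_table(raw):
    """Use the row whose first cell is 'RK' as the header (hypothetical helper)."""
    header_idx = raw.index[raw[0] == "RK"][0]
    header = raw.loc[header_idx].tolist()
    # Keep only the rows below the header row.
    body = raw.loc[raw.index > header_idx]
    # Drop any repeats of the header row further down, in case the page has them.
    body = body[body[0] != "RK"].reset_index(drop=True)
    body.columns = header
    return body

# Stand-in for tables[0], where tables = pd.read_html(url) (assumption: trimmed sample).
raw = pd.DataFrame([
    [None, "PP", "SH", None],
    ["RK", "PLAYER", "TEAM", "GP"],
    ["1", "Jamie Benn, LW", "DAL", "82"],
    ["2", "John Tavares, C", "NYI", "82"],
])

df = tidy_stats_table(raw)
print(type(df).__name__)
print(df)
```

With the real page this would be `tables = pd.read_html(url)` followed by `df = tidy_stats_table(tables[0])`. The values arrive as strings, so numeric columns may still need converting, for example with `pd.to_numeric`.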