2014-12-25

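Scraping Wikipedia with Python

I want to retrieve three columns (NFL team, player name, college team) from the following Wikipedia page. I'm new to Python and have been trying to use BeautifulSoup to do this. I only need the rows for QBs, but I can't even get all of the columns, regardless of position. Here is what I've done so far. It outputs nothing and I'm not entirely sure why. I believe it's due to a tag, but I don't know what to change. Any help would be greatly appreciated.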

import urllib2
from bs4 import BeautifulSoup

wiki = "http://en.wikipedia.org/wiki/2008_NFL_draft"
header = {'User-Agent': 'Mozilla/5.0'} #Needed to prevent 403 error on Wikipedia 
req = urllib2.Request(wiki,headers=header) 
page = urllib2.urlopen(req) 
soup = BeautifulSoup(page) 

rnd = "" 
pick = "" 
NFL = "" 
player = "" 
pos = "" 
college = "" 
conf = "" 
notes = "" 

table = soup.find("table", { "class" : "wikitable sortable" }) 

#print table 

#output = open('output.csv','w') 

for row in table.findAll("tr"): 
    cells = row.findAll("href") 
    print "---" 
    print cells.text 
    print "---" 
    #For each "tr", assign each "td" to a variable. 
    #if len(cells) > 1: 
     #NFL = cells[1].find(text=True) 
     #player = cells[2].find(text = True) 
     #pos = cells[3].find(text=True) 
     #college = cells[4].find(text=True) 
     #write_to_file = player + " " + NFL + " " + college + " " + pos 
     #print write_to_file 

    #output.write(write_to_file) 

#output.close() 

I know there is a lot of commented-out code in it. That's because I was trying to find where the fault is.
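The fetch code in the question uses urllib2, which exists only in Python 2. A minimal Python 3 sketch of the same request, assuming the standard-library urllib.request module and keeping the browser-like User-Agent header the question found necessary to avoid a 403, might look like this. The helper names build_request and fetch_html are hypothetical.

```python
import urllib.request

WIKI_URL = "http://en.wikipedia.org/wiki/2008_NFL_draft"


def build_request(url):
    """Build a request carrying the same User-Agent header as the question."""
    # Hypothetical helper: the header value is copied from the question's code.
    return urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})


def fetch_html(url):
    """Download `url` and decode it with the charset the server reports."""
    # Hypothetical helper: needs network access when called.
    with urllib.request.urlopen(build_request(url)) as page:
        charset = page.headers.get_content_charset() or "utf-8"
        return page.read().decode(charset)


# Usage (requires network access and bs4):
# from bs4 import BeautifulSoup
# soup = BeautifulSoup(fetch_html(WIKI_URL), "html.parser")
```

Keeping the request construction separate from the download makes the header easy to inspect without touching the network.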

Answer


Here is what I would do:

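  • find the Player Selections section heading
  • get the next wikitable using find_next_sibling()
  • find all tr tags inside it
  • for every row, find the td and th tags and get the desired cells by index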
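The last step searches for the td and th tag names, and that is the key difference from the question's row.findAll("href"). find_all() (spelled findAll() in older code) matches tag names, and href is an attribute of a tags rather than a tag, so that call returns an empty list; print cells.text on a list would then raise AttributeError. The sketch below shows this on a hand-written row, not the real Wikipedia markup, assuming bs4 is installed and Python 3.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for one draft-table row (not the real Wikipedia markup).
html = """
<table class="wikitable sortable">
  <tr><td>1</td><td><a href="/wiki/Atlanta_Falcons">Atlanta Falcons</a></td><td>QB</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# find_all() matches tag *names*; there is no <href> tag, so this is empty.
print(row.find_all("href"))  # []

# Look for the <a> tag instead and read href as an attribute.
link = row.find("a")
print(link["href"])  # /wiki/Atlanta_Falcons

# Searching for cell tags returns one element per column.
print([cell.get_text() for cell in row.find_all(["td", "th"])])
# ['1', 'Atlanta Falcons', 'QB']
```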

Here is the code:

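# soup is the BeautifulSoup object built in the question
filter_position = 'QB'
# The "Player selections" heading contains <span id="Player_selections">; take its parent heading
player_selections = soup.find('span', id='Player_selections').parent
# The draft table is the next sibling table with class "wikitable"; [1:] skips its header row
for row in player_selections.find_next_sibling('table', class_='wikitable').find_all('tr')[1:]:
    cells = row.find_all(['td', 'th'])

    try:
        nfl_team, name, position, college = cells[3].text, cells[4].text, cells[5].text, cells[6].text
    except IndexError:
        # rows with too few cells are skipped
        continue

    if position != filter_position:
        continue

    print nfl_team, name, position, college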

Here is the output (filtered to quarterbacks only):

Atlanta Falcons Ryan, MattMatt Ryan† QB Boston College 
Baltimore Ravens Flacco, JoeJoe Flacco QB Delaware 
Green Bay Packers Brohm, BrianBrian Brohm QB Louisville 
Miami Dolphins Henne, ChadChad Henne QB Michigan 
New England Patriots O'Connell, KevinKevin O'Connell QB San Diego State 
Minnesota Vikings Booty, John DavidJohn David Booty QB USC 
Pittsburgh Steelers Dixon, DennisDennis Dixon QB Oregon 
Tampa Bay Buccaneers Johnson, JoshJosh Johnson QB San Diego 
New York Jets Ainge, ErikErik Ainge QB Tennessee 
Washington Redskins Brennan, ColtColt Brennan QB Hawaiʻi 
New York Giants Woodson, Andre'Andre' Woodson QB Kentucky 
Green Bay Packers Flynn, MattMatt Flynn QB LSU 
Houston Texans Brink, AlexAlex Brink QB Washington State
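In the output above each player's name appears twice, as in "Ryan, MattMatt Ryan". One plausible explanation, assumed here and not checked against the 2008 page, is that the player cell holds a hidden sort key (an element styled display:none) in front of the visible name, and .text returns both. The Python 3 sketch below applies the same steps as the answer to a hand-made fixture and drops such hidden elements before reading the text. The fixture markup, the hypothetical function names visible_text and players_at, and the column indexes (2 to 5 here, versus 3 to 6 in the answer) are illustrative assumptions to check against the live page.

```python
from bs4 import BeautifulSoup

# Hand-made fixture imitating the layout the answer relies on. The hidden
# sort-key <span> is an ASSUMPTION about the real page, not verified markup.
HTML = """
<h2><span id="Player_selections">Player selections</span></h2>
<table class="wikitable sortable">
  <tr><th>Rnd.</th><th>Pick</th><th>NFL team</th><th>Player</th><th>Pos.</th><th>College</th></tr>
  <tr><th>1</th><td>3</td><td>Atlanta Falcons</td>
      <td><span style="display:none">Ryan, Matt</span><a href="#">Matt Ryan</a></td>
      <td>QB</td><td>Boston College</td></tr>
  <tr><th>1</th><td>4</td><td>Oakland Raiders</td>
      <td><span style="display:none">McFadden, Darren</span><a href="#">Darren McFadden</a></td>
      <td>RB</td><td>Arkansas</td></tr>
</table>
"""


def visible_text(cell):
    """Hypothetical helper: a cell's text without elements styled display:none.

    Note: hidden elements are removed from the parsed tree via extract().
    """
    for tag in cell.find_all(True):
        if "display:none" in tag.get("style", "").replace(" ", ""):
            tag.extract()
    return cell.get_text(" ", strip=True)


def players_at(soup, position="QB"):
    """Hypothetical helper: yield (team, player, position, college) tuples."""
    heading = soup.find("span", id="Player_selections").parent
    table = heading.find_next_sibling("table", class_="wikitable")
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = row.find_all(["td", "th"])
        if len(cells) < 6:
            continue  # rows with too few cells are skipped, as in the answer
        # Indexes 2-5 match this fixture only; the answer used 3-6 on the live page.
        team, name, pos, college = (visible_text(c) for c in cells[2:6])
        if pos == position:
            yield team, name, pos, college


soup = BeautifulSoup(HTML, "html.parser")
for team, name, pos, college in players_at(soup):
    print(team, name, pos, college)  # Atlanta Falcons Matt Ryan QB Boston College
```

Because visible_text() calls extract(), it edits the parsed tree in place; parse a fresh copy if the original markup is needed afterwards.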