2013-12-11 67 views
3

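How do I get the tbody from a table with Python Beautiful Soup?

I am trying to scrape the year and winners (the first and second columns) from the "List of finals matches" table (the second table) at http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals. I am using the code below: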

import urllib2 
from BeautifulSoup import BeautifulSoup 

url = "http://www.samhsa.gov/data/NSDUH/2k10State/NSDUHsae2010/NSDUHsaeAppC2010.htm" 
soup = BeautifulSoup(urllib2.urlopen(url).read()) 
soup.findAll('table')[0].tbody.findAll('tr') 
for row in soup.findAll('table')[0].tbody.findAll('tr'): 
    first_column = row.findAll('th')[0].contents 
    third_column = row.findAll('td')[2].contents 
    print first_column, third_column 

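With the code above, I can get the first and third columns just fine. But when I use the same code with http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals, it cannot find tbody as an element of the table, even though I can see tbody when I inspect the element in the browser.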

url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals" 
soup = BeautifulSoup(urllib2.urlopen(url).read()) 

print soup.findAll('table')[2] 

soup.findAll('table')[2].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column

This is the error I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-150-fedd08c6da16> in <module>()
      7 # print soup.findAll('table')[2]
      8 
----> 9 soup.findAll('table')[2].tbody.findAll('tr')
     10 for row in soup.findAll('table')[0].tbody.findAll('tr'):
     11     first_column = row.findAll('th')[0].contents

AttributeError: 'NoneType' object has no attribute 'findAll'

Answers

4

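If you look at the table through the browser's inspect tool, the browser inserts the tbody tags for you.

The actual page source may or may not contain them. If you really want to know, I suggest looking at the "view source" view instead.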
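If you would rather check the raw source from code than from the browser, something like the following should do. It is only a minimal sketch, assuming Python 3 and the standard library's html.parser (the question itself uses Python 2); `TagCollector`, `source_has_tag`, and the two sample strings are made-up names for illustration, and in practice you would pass in the text you downloaded.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every start tag found in the raw markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        self.tags.append(tag)


def source_has_tag(html, name):
    """Return True if the raw HTML text contains a <name> start tag."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return name in collector.tags


# Hypothetical sample markup standing in for a downloaded page.
WITHOUT_TBODY = "<table><tr><th>Year</th><td>Winners</td></tr></table>"
WITH_TBODY = "<table><tbody><tr><th>Year</th><td>Winners</td></tr></tbody></table>"

print(source_has_tag(WITHOUT_TBODY, "tbody"))
print(source_has_tag(WITH_TBODY, "tbody"))
```

Unlike a browser, this parser only reports the tags that are literally in the text, so it tells you what the scraper will actually see.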

Either way, you do not need to traverse down to tbody. Simply:

soup.findAll('table')[0].findAll('tr') should work.
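To see why skipping tbody is the safer route, here is a small self-contained sketch. It assumes Python 3 and the newer `bs4` package with its `find_all` spelling, not the `BeautifulSoup` 3 import used in the question, and the `HTML` string is sample markup I made up in place of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup without a <tbody>, standing in for the page source.
HTML = """
<table>
  <tr><th>1930</th><td>Uruguay</td><td>Argentina</td></tr>
  <tr><th>1934</th><td>Italy</td><td>Czechoslovakia</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find_all("table")[0]

# html.parser does not add a <tbody>, so this attribute lookup yields None.
# Calling a method on it is what raises the AttributeError in the question.
print(table.tbody)

# Searching for rows from the table itself works with or without a <tbody>,
# because find_all looks through all descendants.
rows = []
for tr in table.find_all("tr"):
    year = tr.find("th").get_text(strip=True)
    winner = tr.find_all("td")[0].get_text(strip=True)
    rows.append((year, winner))

print(rows)
```

Note that some parsers (html5lib, for example) do insert tbody the way a browser does, so code that depends on `.tbody` can behave differently depending on which parser is installed.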

0
url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals" 
soup = BeautifulSoup(urllib2.urlopen(url).read()) 
# Iterate over the rows directly instead of going through tbody.
for tr in soup.findAll('table')[2].findAll('tr'): 
    pass  # get data from each row here

Then search for the table you need :)
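One possible way to do that search, instead of hard-coding an index like `[2]`, is to pick the table by its header text. This is just a sketch under assumptions: Python 3 with `bs4`, a helper name `find_table_with_header` that I made up, and sample markup in place of the real page, whose header cells may be worded differently.

```python
from bs4 import BeautifulSoup


def find_table_with_header(soup, header_text):
    """Return the first <table> with a <th> whose text equals header_text, or None."""
    for table in soup.find_all("table"):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        if header_text in headers:
            return table
    return None


# Hypothetical sample markup: an unrelated table followed by the one we want.
HTML = """
<table><tr><th>Key</th><td>Legend</td></tr></table>
<table>
  <tr><th>Year</th><th>Winners</th></tr>
  <tr><td>1930</td><td>Uruguay</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = find_table_with_header(soup, "Winners")

# Skip the header row, then collect the cell text of each data row.
data = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")[1:]
]
print(data)
```

Matching on a header is less brittle than a positional index, since the page can gain or lose tables over time.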