Parsing a table in HTML with Python

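I have been trying to parse information from a web page. Basically, I want to extract some values from a table in the HTML so that I can work with them. The part I am stuck on is parsing the table itself.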

The web page is http://weather.unbc.ca/wx/data-table.html

I tried using:

import urllib2 
from bs4 import BeautifulSoup 


contenturl = "http://weather.unbc.ca/wx/data-table.html" 


soup = BeautifulSoup(urllib2.urlopen(contenturl).read()) 

table = soup.find('tr', attrs={'class': 'content'}) 

rows = table.findAll('tr') 
for tr in rows:
    cols = tr.findAll('td')
    if 'cell_c' in cols[0]['class']:
        # currency row
        Date_time, Record, Tair, Tdew, RH, pstn, pmsl, wspd_avg, wspd_vec, wdir, wstd, wgust, precip, solarq, solarq_un, kdown, kdown_dif, Sun, Ldown = [c.text for c in cols]
        print Date_time, Record, Tair, Tdew, RH, pstn, pmsl, wspd_avg, wspd_vec, wdir, wstd, wgust, precip, solarq, solarq_un, kdown, kdown_dif, Sun, Ldown

I seem to get this error:

Traceback (most recent call last):
  File "\data.py", line 14, in <module>
    rows = table.findAll('tr')
AttributeError: 'NoneType' object has no attribute 'findAll'

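Forgive my ignorance of Beautiful Soup. I am completely open to other methods. My goal is to get the last row of the table into variables so that I can trend the values.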

2016-11-02 huntthedead

NoneType basically means that soup.find returned None.

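I am no expert in BeautifulSoup or urllib, but my guess is that it cannot find any tr element with the class content, so table is None and calling findAll on it fails.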
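To illustrate the point, here is a minimal sketch in Python 3 with bs4. It shows find() returning None when nothing matches, and one way to guard against that while pulling the last data row out of a table. The SAMPLE_HTML string and the last_row helper are made up for illustration; I have not checked the real page's markup, so the selector will likely need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may be laid out differently.
SAMPLE_HTML = """
<table id="weather">
  <tr><th>Date_time</th><th>Tair</th><th>RH</th></tr>
  <tr><td>2016-11-02 23:00</td><td>4.1</td><td>87</td></tr>
  <tr><td>2016-11-02 23:15</td><td>3.9</td><td>88</td></tr>
</table>
"""


def last_row(html):
    """Return the cell texts of the last data row of the first table, or None."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:  # find() returns None when nothing matches
        return None
    # Keep only rows that contain at least one <td>, which skips header rows.
    data_rows = [tr for tr in table.find_all("tr") if tr.find("td")]
    if not data_rows:
        return None
    return [td.get_text(strip=True) for td in data_rows[-1].find_all("td")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Same lookup as in the question: no <tr> in this sample has class "content".
missing = soup.find("tr", attrs={"class": "content"})
print(missing)

print(last_row(SAMPLE_HTML))
```

Checking for None before calling find_all avoids the AttributeError and makes it obvious when the selector does not match anything.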

Hope that helps.

2016-11-02 23:49:04

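I think I need more help with Beautiful Soup. I can print 'soup' and it shows a lot of tr elements. I will do some more research on how to narrow it down to the table; I must be doing something wrong there. – huntthedead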

Try inspecting the HTML of the page you are trying to extract data from. Maybe you missed something there. –
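One rough way to do that inspection from Python rather than by eye is to count which class values actually appear on the tr and td elements. The sketch below is Python 3 with bs4; SAMPLE_HTML and the class_counts helper are hypothetical stand-ins, so feed it the real page source to see what is there.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; substitute the downloaded page source.
SAMPLE_HTML = """
<table>
  <tr class="header"><th>Date_time</th></tr>
  <tr><td class="cell_c">2016-11-02 23:00</td></tr>
  <tr><td class="cell_c">2016-11-02 23:15</td></tr>
</table>
"""


def class_counts(html, tag_name):
    """Count the class values found on every tag_name element in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for tag in soup.find_all(tag_name):
        # bs4 returns "class" as a list of values; it is missing when no class is set.
        counts.update(tag.get("class") or ["(no class)"])
    return counts


print(class_counts(SAMPLE_HTML, "tr"))
print(class_counts(SAMPLE_HTML, "td"))
```

If the class you are searching for never shows up in the counts, that would explain why find returns None.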
