BeautifulSoup: Get the contents of a specific table

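My local airport disgracefully blocks users without IE, and its site looks awful besides. I want to write a Python script that fetches the contents of the Arrivals and Departures pages every few minutes and shows them in a more readable way.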

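My tools of choice are mechanize, to trick the site into believing I use IE, and BeautifulSoup, to parse the page and get the flight data table. I got lost in the BeautifulSoup documentation and can't work out how to get the table (whose title I know) out of the whole document, or how to get a list of rows from that table.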

Any ideas?


2010-05-29 Adam Matan

This is not the specific code you need, just a demo of how to work with BeautifulSoup. It finds the table whose id is "Table1" and gets all of its tr elements.

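from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

html = urlopen(url).read()  # url: address of the page to scrape
bs = BeautifulSoup(html, "html.parser")
# Find the table whose id is "Table1", then collect all of its tr elements.
table = bs.find(lambda tag: tag.name == 'table' and tag.has_attr('id') and tag['id'] == "Table1")
rows = table.find_all(lambda tag: tag.name == 'tr')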
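
The snippet above fetches the page without the IE disguise the question asks about. As a rough sketch of how the two pieces might fit together, the following uses only `urllib.request` and BeautifulSoup 4. The User-Agent string, the helper names (`build_request`, `fetch_html`, `find_table_rows`) and the sample HTML are all assumptions made for illustration, not anything taken from the airport site.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Hypothetical IE-style User-Agent string, used only for illustration.
IE_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Trident/7.0; rv:11.0) like Gecko"


def build_request(url):
    """Build a request that announces itself with the IE-style User-Agent."""
    return Request(url, headers={"User-Agent": IE_USER_AGENT})


def fetch_html(url):
    """Download the page body as bytes (performs a real network request)."""
    with urlopen(build_request(url)) as response:
        return response.read()


def find_table_rows(html, table_id):
    """Return the tr elements of the table with the given id, or [] if absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)  # same match as the lambda version
    return table.find_all("tr") if table is not None else []


# Inline sample so the sketch runs without network access.
SAMPLE_HTML = """
<table id="Other"><tr><td>ignored</td></tr></table>
<table id="Table1">
  <tr><th>Flight</th><th>Time</th></tr>
  <tr><td>LY001</td><td>10:15</td></tr>
</table>
"""
sample_rows = find_table_rows(SAMPLE_HTML, "Table1")
print(len(sample_rows))
```

On a live page you would presumably call `find_table_rows(fetch_html(url), "Table1")`. Whether a spoofed User-Agent is enough to get past the airport's browser check is untested here.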


2010-05-29 16:05:25

This is really cool, I didn't know you could search with lambdas. – goggin13 2010-05-29 16:09:37

Great indeed! Check your Facebook inbox, I sent you a message. – 2010-05-29 16:28:13

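Any idea how to get a specific table when there is no id or title to tell it apart? For example, I want the third table in the HTML file (there are no other identifying markers). – ihightower 2012-06-08 12:11:28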

from bs4 import BeautifulSoup

soup = BeautifulSoup(HTML, "html.parser")

# The first argument to find() is the tag name to search for.
# The second is a dict of attr -> value pairs used to filter
# the tags that match the first argument.
table = soup.find("table", {"title": "TheTitle"})

# find_all() already returns a list-like result, so no explicit loop is needed
rows = table.find_all("tr")

# now rows contains each tr in the table (as a BeautifulSoup object)
# and you can search them to pull out the times
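
To make the last step concrete, here is a minimal sketch of pulling the cell text out of each row. The sample markup, the column names and the helper `table_to_records` are made up for illustration; the real airport page will differ, and the sketch assumes the first row of the table holds the column headers.

```python
from bs4 import BeautifulSoup

# Made-up sample page; the real airport markup will differ.
SAMPLE_HTML = """
<table title="TheTitle">
  <tr><th>Flight</th><th>From</th><th>Time</th></tr>
  <tr><td>LY001</td><td>New York</td><td> 10:15 </td></tr>
  <tr><td>BA163</td><td>London</td><td>11:40</td></tr>
</table>
"""


def table_to_records(html, title):
    """Return one dict per data row, keyed by the header-row cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"title": title})
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # Assumes the first row holds the column headers.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all("td")]
        records.append(dict(zip(headers, values)))
    return records


flights = table_to_records(SAMPLE_HTML, "TheTitle")
for flight in flights:
    print(flight["Time"], flight["Flight"], flight["From"])
```

Returning plain dicts keeps the parsing separate from however you choose to display the flights.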


2010-05-29 16:05:11 goggin13

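Any idea how to get a specific table when there is no id or title to tell it apart? For example, I want the third table in the HTML file (there are no other identifying markers). – ihightower 2012-06-08 12:11:03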

@ihightower: `soup.find_all('table')[2]` will get you the third `table`. (You should check the length first, though, to be safe.) – hamstu 2013-09-13 17:27:12
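
Selecting a table by position, with the length check this comment recommends, might look like the sketch below. The helper name `nth_table` and the sample HTML are assumptions for illustration only.

```python
from bs4 import BeautifulSoup


def nth_table(html, index):
    """Return the table at zero-based position `index`, or None if out of range."""
    tables = BeautifulSoup(html, "html.parser").find_all("table")
    if 0 <= index < len(tables):
        return tables[index]
    return None


SAMPLE_HTML = """
<table><tr><td>first</td></tr></table>
<table><tr><td>second</td></tr></table>
<table><tr><td>third</td></tr></table>
"""
third = nth_table(SAMPLE_HTML, 2)
print(third.get_text(strip=True))
```

Note that `find_all` searches recursively, so tables nested inside other tables are counted too, in document order.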


Just so you know, BeautifulSoup is no longer maintained, and the original maintainer suggests a transition to lxml. XPath should do the trick nicely.
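
For anyone who wants to try the XPath route mentioned here, a minimal sketch with `lxml.html` could look like this. It assumes lxml is installed and that the table carries an id of "Table1" (borrowed from the other answer); the sample document is made up.

```python
from lxml import html as lxml_html

# Made-up sample document; the id "Table1" is an assumption.
SAMPLE_HTML = """
<html><body>
<table id="Other"><tr><td>ignored</td></tr></table>
<table id="Table1">
  <tr><th>Flight</th><th>Time</th></tr>
  <tr><td>LY001</td><td>10:15</td></tr>
</table>
</body></html>
"""

doc = lxml_html.fromstring(SAMPLE_HTML)

# Every tr anywhere inside the table whose id attribute is "Table1".
rows = doc.xpath('//table[@id="Table1"]//tr')

# Text of each header or data cell, row by row.
cells = [
    [cell.text_content().strip() for cell in row.xpath("./th|./td")]
    for row in rows
]
print(cells)
```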


2010-05-29 23:38:01 user338971

Thanks, that's very useful information. I'll check out lxml. – 2010-05-30 07:54:20

This is no longer true. BeautifulSoup 4 is the current version, and it is more than two years younger than this answer. – 2013-02-27 06:10:49

I'm using BeautifulSoup right now, so it certainly still exists and is fully functional. – 2014-02-03 21:09:19
