BeautifulSoup HTML table data extraction: looping over <tr> and <th>

Here is an example of the HTML table I need to extract data from. The table repeatedly uses <tr>, <th> and <td>:

<table class="tablename"> 
<tr> 
    <th> Animal </th> 
    <td> Dog </td> 
</tr> 
<tr> 
    <th> Fish </th> 
    <td> Salmon </td> 
</tr> 
<tr> 
    <th> Colour </th> 
    <td> Red </td> 
</tr> 
</table>

My code looks like this:

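import bs4

# readHtml holds the page's HTML as a string
soup = bs4.BeautifulSoup(readHtml, 'html.parser')
tableClassResults = soup.find("table", {"class": "tablename"})

# find() returns only the first match, so this handles the first row only
tr = tableClassResults.find('tr')
th = tr.find('th')
print("th =", th)
td = tr.find('td')
print("td =", td)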

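This works fine for the first <tr>, giving th = Animal and td = Dog. My problem is that I want to loop over all the <tr> elements and extract each <th> and its corresponding <td>. I found some similar questions, but I can't figure out how to do the findNext and loop part.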
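For the findNext-style loop the question mentions, a minimal sketch could start at the first row and step to the next row with find_next_sibling (findNextSibling is the older camelCase alias), stopping when it returns None. The inline html string reproduces the sample table in condensed form; the rows list and the use of get_text(strip=True) are illustrative choices, not part of the original code.

```python
from bs4 import BeautifulSoup

# Same data as the sample table above, condensed onto fewer lines.
html = """
<table class="tablename">
<tr><th> Animal </th><td> Dog </td></tr>
<tr><th> Fish </th><td> Salmon </td></tr>
<tr><th> Colour </th><td> Red </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "tablename"})

# Walk the rows one by one: begin at the first <tr>, then move to the
# next <tr> sibling until there are no more rows.
rows = []
tr = table.find("tr")
while tr is not None:
    th = tr.find("th")
    td = tr.find("td")
    rows.append((th.get_text(strip=True), td.get_text(strip=True)))
    tr = tr.find_next_sibling("tr")

for header, value in rows:
    print("th =", header, "| td =", value)
```

Using find_next_sibling rather than find_next keeps the search at the row level, so it will not descend into any nested markup inside a row.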


2016-01-20 gobrandal

Use find_all to get all matching elements.

Then iterate over the method's return value to get the th and td inside each tr element:

# find_all() returns every matching <tr>, so each row is visited in turn
for tr in tableClassResults.find_all('tr'):
    th = tr.find('th')
    print("th =", th)
    td = tr.find('td')
    print("td =", td)

Output for the given HTML:

th = <th> Animal </th> 
td = <td> Dog </td> 
th = <th> Fish </th> 
td = <td> Salmon </td> 
th = <th> Colour </th> 
td = <td> Red </td>
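If you want the cell text rather than the tags, a small variation of the loop above (a sketch, not part of the original answer) calls get_text(strip=True) on each cell and collects the pairs into a dict. The html string, the data name, and the check that skips incomplete rows are illustrative additions.

```python
from bs4 import BeautifulSoup

# Same data as the sample table in the question, condensed.
html = """
<table class="tablename">
<tr><th> Animal </th><td> Dog </td></tr>
<tr><th> Fish </th><td> Salmon </td></tr>
<tr><th> Colour </th><td> Red </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="tablename")

# Map each row's <th> text to its <td> text, trimming surrounding whitespace.
data = {}
for tr in table.find_all("tr"):
    th = tr.find("th")
    td = tr.find("td")
    if th is None or td is None:
        continue  # skip rows that lack either cell
    data[th.get_text(strip=True)] = td.get_text(strip=True)

print(data)  # {'Animal': 'Dog', 'Fish': 'Salmon', 'Colour': 'Red'}
```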


2016-01-20 14:09:44 falsetru
