Selecting a table without attributes with BeautifulSoup

Can BeautifulSoup select a table that has no attributes? The HTML contains many tables, but the data I want is in the one table that has no attributes at all.

Here is my example. The HTML contains 2 tables: one holds English letters, the other holds numbers.

from bs4 import BeautifulSoup 

HTML2 = """ 
<table> 
    <tr> 
     <td class>a</td> 
     <td class>b</td> 
     <td class>c</td> 
     <td class>d</td> 
    </tr> 
    <tr> 
     <td class>e</td> 
     <td class>f</td> 
     <td class>g</td> 
     <td class>h</td> 
    </tr> 
</table> 

<table cellpadding="0"> 
    <tr> 
     <td class>111</td> 
     <td class>222</td> 
     <td class>333</td> 
     <td class>444</td> 
    </tr> 
    <tr> 
     <td class>555</td> 
     <td class>666</td> 
     <td class>777</td> 
     <td class>888</td> 
    </tr> 
""" 
soup2 = BeautifulSoup(HTML2, 'html.parser') 
f2 = soup2.select('table[cellpadding!="0"]') #<---I think the key point is here. 
for div in f2: 
    row = '' 
    rows = div.findAll('tr') 
    for row in rows: 
     if(row.text.find('td') != False): 
      print(row.text)

I only want the data from the "English" table, formatted like this:

a b c d 
e f g h

and then saved to Excel.

But I can only get at the "numbers" table. Any hints? Thanks!

Source

2017-07-23 okeyla

You can use the has_attr method to test whether a table has the cellpadding attribute:

soup2 = BeautifulSoup(HTML2, 'html.parser')
f2 = soup2.find_all('table')
for div in f2:
    if not div.has_attr('cellpadding'):
        row = ''
        rows = div.findAll('tr')
        for row in rows:
            if(row.text.find('td') != False):
                print(row.text)
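For reference, here is a minimal self-contained sketch of the same has_attr idea that also produces the space-separated rows asked for in the question. The shortened HTML string and the names `HTML`, `soup` and `lines` are made up for illustration; they are not part of the original post.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the HTML from the question.
HTML = """
<table>
  <tr><td>a</td><td>b</td><td>c</td><td>d</td></tr>
  <tr><td>e</td><td>f</td><td>g</td><td>h</td></tr>
</table>
<table cellpadding="0">
  <tr><td>111</td><td>222</td><td>333</td><td>444</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

lines = []
for table in soup.find_all("table"):
    # Skip every table that carries a cellpadding attribute.
    if table.has_attr("cellpadding"):
        continue
    for tr in table.find_all("tr"):
        # Collect the text of each cell and join the cells with spaces.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        lines.append(" ".join(cells))

print("\n".join(lines))
```

Joining the cell texts explicitly avoids relying on `row.text`, which returns the cell contents together with whatever whitespace sits between the tags.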

Source

2017-07-23 07:24:56 htn

You can use find_all and select only the tables that do not have a specific attribute.

f2 = soup2.find_all('table', {'cellpadding':None})

Or, if you want to select only tables that have no attributes at all:

f2 = [tbl for tbl in soup2.find_all('table') if not tbl.attrs]
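The difference between "lacks one attribute" and "has no attributes" can be seen in a small sketch. It assumes a BeautifulSoup release whose select() supports the CSS `:not()` pseudo-class (4.7 or later, where selectors are handled by soupsieve); the three-table HTML string and the variable names are invented for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical document: one bare table, one with cellpadding, one with only an id.
HTML = """
<table><tr><td>a</td></tr></table>
<table cellpadding="0"><tr><td>111</td></tr></table>
<table id="other"><tr><td>x</td></tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Tables that merely lack the cellpadding attribute (CSS form).
no_cellpadding = soup.select("table:not([cellpadding])")

# Tables whose attribute dictionary is completely empty.
no_attrs = [tbl for tbl in soup.find_all("table") if not tbl.attrs]

print([tbl.td.get_text() for tbl in no_cellpadding])
print([tbl.td.get_text() for tbl in no_attrs])
```

The first selection keeps the table with the id, while the second keeps only the bare table, so which one fits depends on how the other tables in the real page are marked up.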

Then you can build a list of rows from f2 and pass it to a DataFrame.

data = [ 
    [td.text for td in tr.find_all('td')] 
    for table in f2 for tr in table.find_all('tr') 
]
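Put together, the steps might look like the sketch below, which goes from the HTML to a pandas DataFrame. The helper `save_to_excel` and the file name in the usage note are hypothetical, and writing .xlsx files with `DataFrame.to_excel` assumes an Excel engine such as openpyxl is installed.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Shortened, hypothetical version of the HTML from the question.
HTML = """
<table>
  <tr><td>a</td><td>b</td><td>c</td><td>d</td></tr>
  <tr><td>e</td><td>f</td><td>g</td><td>h</td></tr>
</table>
<table cellpadding="0">
  <tr><td>111</td><td>222</td><td>333</td><td>444</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Keep only tables that carry no attributes at all.
tables = [tbl for tbl in soup.find_all("table") if not tbl.attrs]

# One inner list per table row, one string per cell.
data = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for table in tables
    for tr in table.find_all("tr")
]

df = pd.DataFrame(data)


def save_to_excel(frame, path):
    """Write the frame to an Excel file without index or header rows."""
    frame.to_excel(path, index=False, header=False)


print(df.to_string(index=False, header=False))
```

Calling `save_to_excel(df, "letters.xlsx")` would then write the two rows to a spreadsheet; the call is left out of the sketch so that it runs without an Excel engine.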

Source

2017-07-23 07:25:39
