Regex / BeautifulSoup: how do I extract all the values of a column from an HTML table?

Given this markup:

<tr><td>PC1</td><td>zz:zz:zz:zz:zz:ce</td><td>10.0.0.244</td><td>23 hours, 55 minutes, 25 seconds</td></tr> 
<tr><td>PC2</td><td>zz:zz:zz:zz:zz:cf</td><td>10.0.0.245</td><td>23 hours, 23 minutes, 27 seconds</td></tr>

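I want to get one array of the MAC addresses and another array of the IPs.

I came up with something like this regex for the MACs: <\/td><td>(.*?){17}<\/td> but it matches the uptime too.

Any suggestions?

Thanks!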

Source

2016-03-07 azDev

Since you already know the MAC address is in the second column, use an XPath query with lxml (faster than Beautiful Soup). You don't need a regex.
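A minimal sketch of the column-by-position idea from the comment above. It swaps lxml for the standard library's xml.etree.ElementTree, which supports a limited XPath subset including positional predicates; this only works here because the sample rows happen to be well-formed XML, which is an assumption that often fails for real HTML. The `<table>` wrapper and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

rows = """<tr><td>PC1</td><td>zz:zz:zz:zz:zz:ce</td><td>10.0.0.244</td><td>23 hours, 55 minutes, 25 seconds</td></tr>
<tr><td>PC2</td><td>zz:zz:zz:zz:zz:cf</td><td>10.0.0.245</td><td>23 hours, 23 minutes, 27 seconds</td></tr>"""

# Wrap the fragment in a single root element so it parses as XML.
# Assumption: the markup is well-formed; real-world HTML often is not.
root = ET.fromstring("<table>" + rows + "</table>")

# td[2] and td[3] are 1-based positions among the <td> children of each <tr>.
macs = [td.text for td in root.findall("./tr/td[2]")]
ips = [td.text for td in root.findall("./tr/td[3]")]

print(macs)
print(ips)
```

With lxml the path expressions would look much the same, and its HTML parser is more forgiving of markup that is not well-formed.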

Possible duplicate of http://stackoverflow.com/questions/13074586/extracting-selected-columns-from-a-table-using-beautifulsoup – maazza
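On the regex from the question: `(.*?){17}` repeats a group that can match any number of characters, so it does not limit the cell to 17 characters, which is why the uptime cell matches as well (`.{17}` would express "exactly 17 characters"). If a regex is still wanted, a tighter pattern can describe the shape of each value instead. The sketch below assumes every MAC is six two-character groups joined by colons and every IP is a dotted quad; the pattern names are illustrative.

```python
import re

html = """<tr><td>PC1</td><td>zz:zz:zz:zz:zz:ce</td><td>10.0.0.244</td><td>23 hours, 55 minutes, 25 seconds</td></tr>
<tr><td>PC2</td><td>zz:zz:zz:zz:zz:cf</td><td>10.0.0.245</td><td>23 hours, 23 minutes, 27 seconds</td></tr>"""

# Assumption: a MAC is six two-character groups joined by ':'.
# [0-9A-Za-z] is deliberately loose so the anonymised 'zz' groups match;
# for real addresses [0-9A-Fa-f] would be stricter.
MAC_CELL = re.compile(r"<td>((?:[0-9A-Za-z]{2}:){5}[0-9A-Za-z]{2})</td>")

# Assumption: an IPv4 address is four groups of 1-3 digits joined by '.'.
# This does not check that each group is <= 255.
IP_CELL = re.compile(r"<td>(\d{1,3}(?:\.\d{1,3}){3})</td>")

# With a single capturing group, findall returns the captured strings.
macs = MAC_CELL.findall(html)
ips = IP_CELL.findall(html)

print(macs)
print(ips)
```

This relies on each value filling its cell exactly, so it breaks if the cells gain attributes or whitespace; parsing the HTML is usually the more robust route.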

With the HTML you have given, you can do the following:

from bs4 import BeautifulSoup

html = """<tr><td>PC1</td><td>zz:zz:zz:zz:zz:ce</td><td>10.0.0.244</td><td>23 hours, 55 minutes, 25 seconds</td></tr>
<tr><td>PC2</td><td>zz:zz:zz:zz:zz:cf</td><td>10.0.0.245</td><td>23 hours, 23 minutes, 27 seconds</td></tr>"""

# Name the parser explicitly; otherwise bs4 picks one and emits a warning
soup = BeautifulSoup(html, "html.parser")
mac_ips = []

for tr in soup.find_all('tr'):
    # Text of every cell in this row, in column order
    cols = [td.text for td in tr.find_all('td')]
    # Column 2 is the MAC address, column 3 the IP
    mac_ips.append((cols[1], cols[2]))

for mac, ip in mac_ips:
    print('{} {}'.format(mac, ip))

This gives you:

zz:zz:zz:zz:zz:ce 10.0.0.244 
zz:zz:zz:zz:zz:cf 10.0.0.245

That is, mac_ips holds each row as a matching pair (the u prefix on each string is how Python 2 displays unicode strings):

[(u'zz:zz:zz:zz:zz:ce', u'10.0.0.244'), (u'zz:zz:zz:zz:zz:cf', u'10.0.0.245')]

If you want separate lists, you can do the following:

from bs4 import BeautifulSoup

html = """<tr><td>PC1</td><td>zz:zz:zz:zz:zz:ce</td><td>10.0.0.244</td><td>23 hours, 55 minutes, 25 seconds</td></tr>
<tr><td>PC2</td><td>zz:zz:zz:zz:zz:cf</td><td>10.0.0.245</td><td>23 hours, 23 minutes, 27 seconds</td></tr>"""

# Name the parser explicitly; otherwise bs4 picks one and emits a warning
soup = BeautifulSoup(html, "html.parser")
mac = []
ip = []

for tr in soup.find_all('tr'):
    # Text of every cell in this row, in column order
    cols = [td.text for td in tr.find_all('td')]
    mac.append(cols[1])  # second column: MAC address
    ip.append(cols[2])   # third column: IP address

print(mac)
print(ip)

This gives you (Python 3 shows the same lists without the u prefixes):

[u'zz:zz:zz:zz:zz:ce', u'zz:zz:zz:zz:zz:cf'] 
[u'10.0.0.244', u'10.0.0.245']

Note: if you are parsing a larger HTML document, you may also need to find the enclosing <table> first.
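To illustrate the note about the enclosing table, here is a rough standard-library sketch that only collects cells found inside a `<table>` element. It uses html.parser.HTMLParser rather than BeautifulSoup, and the class name `TableColumns` and the surrounding page markup are made up for the example. It assumes a single, non-nested table.

```python
from html.parser import HTMLParser


class TableColumns(HTMLParser):
    """Collect cell text row by row, but only for cells inside a <table>."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0   # greater than 0 while inside a <table>
        self.in_cell = False   # True while inside a <td> of that table
        self.rows = []         # one list of cell strings per <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif self.table_depth and tag == "tr":
            self.rows.append([])
        elif self.table_depth and tag == "td" and self.rows:
            self.rows[-1].append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Append text to the cell currently being read
        if self.in_cell:
            self.rows[-1][-1] += data


# Hypothetical page: the same two rows, surrounded by unrelated markup
page = """<p>Leases</p><div><span>not a cell</span></div>
<table>
<tr><td>PC1</td><td>zz:zz:zz:zz:zz:ce</td><td>10.0.0.244</td><td>23 hours, 55 minutes, 25 seconds</td></tr>
<tr><td>PC2</td><td>zz:zz:zz:zz:zz:cf</td><td>10.0.0.245</td><td>23 hours, 23 minutes, 27 seconds</td></tr>
</table>"""

parser = TableColumns()
parser.feed(page)

macs = [row[1] for row in parser.rows]
ips = [row[2] for row in parser.rows]
print(macs)
print(ips)
```

With BeautifulSoup the equivalent scoping would be to call find('table') first and then find_all('tr') on the result.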

Source

2016-03-07 08:52:48

-2

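import sys

# Assumes `soup` is an existing BeautifulSoup object.
# find() returns None when nothing matches; it does not raise an exception.
table = soup.find('table')
if table is None:
    print('No tables found, exiting')
    sys.exit(1)

# Get rows; find_all() returns an empty list when there are none
rows = table.find_all('tr')
if not rows:
    print('No table rows found, exiting')
    sys.exit(1)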

Source

2016-03-07 08:53:25

Please add some explanation of your solution: why and how does it solve the problem?
