2012-07-23 96 views
5

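BeautifulSoup: how to extract the data after a specific HTML tag

I have the following HTML, and I'm trying to figure out how to tell BeautifulSoup to extract the td that comes after a specific HTML element. In this case, I want the data in the <td> after the one containing Color Digest.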

<tr> 
<td> Color Digest </td> 
<td> 2,36,156,38,25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, </td> 
</tr> 

This is the entire HTML:

<html> 
<head> 
<body> 
<div align="center"> 
<table cellspacing="0" cellpadding="0" style="clear:both; width:100%;margin:0px; font-size:1pt;"> 
<br> 
<br> 
<table> 
<table> 
<tbody> 
<tr bgcolor="#AAAAAA"> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<td> Color Digest </td> 
<td> 2,36,156,38,25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, </td> 
</tr> 
</tbody> 
</table> 
+1

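Is this all of the HTML? Or is it part of a larger file with many other <tr>s and <td>s? And is it guaranteed that there is only one "Color Digest" element in the HTML you are parsing? – 2012-07-23 18:49:33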

+0

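No, this is only a fragment of the HTML, so what I'm really after is a mechanism for getting the element that follows a given element. As in XPath, where you can say "I need the first td after Color Digest". – 2012-07-23 20:22:21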

Answer

4

To get the data in the <td>, it sounds like you need to iterate over the list of <td> elements and stop once you've found your data.

Example:

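from bs4 import BeautifulSoup  # bs4 is the maintained successor to the BeautifulSoup 3 package

soup = BeautifulSoup('<html><tr><td>X</td><td>Color Digest</td><td>THE DIGEST</td></tr></html>', 'html.parser')
for cell in soup.find_all('td'):
    # Compare stripped text so a padded cell such as "<td> Color Digest </td>" still matches
    if cell.get_text(strip=True) == 'Color Digest':
        # find_next_sibling('td') skips whitespace text nodes between cells
        print(cell.find_next_sibling('td').get_text(strip=True))
        break  # stop once the data has been found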
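
Applied to the row from the question, where the cells are separated by newlines and padded with spaces, the same idea could be wrapped in a small helper. This is only a sketch: it assumes bs4 with the built-in html.parser, the name value_after_label is made up for illustration, and the row is a shortened stand-in for the real data.

```python
from bs4 import BeautifulSoup


def value_after_label(html, label):
    """Return the text of the first <td> sibling after the <td> whose
    stripped text equals `label`, or None if there is no such cell.
    (Hypothetical helper, not part of BeautifulSoup.)"""
    soup = BeautifulSoup(html, "html.parser")
    for cell in soup.find_all("td"):
        if cell.get_text(strip=True) == label:
            nxt = cell.find_next_sibling("td")
            return nxt.get_text(strip=True) if nxt is not None else None
    return None


# Shortened stand-in for the "Color Digest" row in the question
row = """
<tr>
<td> Color Digest </td>
<td> 2,36,156,38,25,0,0, </td>
</tr>
"""

digest = value_after_label(row, "Color Digest")
# The digest ends with a trailing comma, so drop empty pieces before converting
values = [int(v) for v in digest.split(",") if v]
print(values)
```

Returning None when the label or its neighbour is missing lets the caller decide how to handle a row that isn't there, instead of failing with an AttributeError.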