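BeautifulSoup: how to extract the data after a specific HTML tag

I have the following HTML, and I am trying to figure out how to tell BeautifulSoup to extract the td that comes after a specific HTML element. In this case, I want the data in the <td> that follows the "Color Digest" <td>.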

<tr> 
<td> Color Digest </td> 
<td> 2,36,156,38,25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, </td> 
</tr>

This is the entire HTML:

<html> 
<head> 
<body> 
<div align="center"> 
<table cellspacing="0" cellpadding="0" style="clear:both; width:100%;margin:0px; font-size:1pt;"> 
<br> 
<br> 
<table> 
<table> 
<tbody> 
<tr bgcolor="#AAAAAA"> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<tr> 
<td> Color Digest </td> 
<td> 2,36,156,38,25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, </td> 
</tr> 
</tbody> 
</table>

Asked 2012-07-23 by Null-Hypothesis

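Is this all of the HTML? Or is it part of a larger document with many other elements? And is it guaranteed that the HTML you are parsing contains only one "Color Digest" element? – 2012-07-23 18:49:33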

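No, this is just a fragment of the HTML, so what I really want is the mechanism for getting the element that comes after a certain element. It is like XPath, where you can say that you need the first TD after "Color Digest". – 2012-07-23 20:22:21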

It sounds like you need to iterate over the list of <td> elements and stop once you have found your data.

Example:

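from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2

soup = BeautifulSoup('<html><tr><td>X</td><td>Color Digest</td><td>THE DIGEST</td></tr></html>')
# Walk the cells in order; the value is the cell right after the label cell.
for cell in soup.html.tr.findAll('td'):
    if cell.text == 'Color Digest':
        print cell.nextSibling.text
        break  # stop once the data has been found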
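
For newer setups, here is a minimal sketch of the same idea using the bs4 package (Beautiful Soup 4) on Python 3. It is not the code from the answer above, only an illustration under a few assumptions: the helper name `value_after_label` is made up, the digest is shortened, and the built-in "html.parser" backend is used. Because the cells in the question are separated by whitespace, it calls `find_next_sibling("td")` rather than reading `next_sibling` directly, which would return the whitespace text node first.

```python
from bs4 import BeautifulSoup

# Fragment shaped like the one in the question (digest shortened for illustration).
HTML = """
<table>
<tr>
<td> Color Digest </td>
<td> 2,36,156,38,25,0,0, </td>
</tr>
</table>
"""


def value_after_label(html, label):
    """Return the stripped text of the first <td> following the <td> whose text is `label`.

    Hypothetical helper for illustration; returns None when nothing matches.
    """
    soup = BeautifulSoup(html, "html.parser")
    for cell in soup.find_all("td"):
        if cell.get_text(strip=True) == label:
            # find_next_sibling skips the whitespace text nodes between tags.
            value_cell = cell.find_next_sibling("td")
            return value_cell.get_text(strip=True) if value_cell else None
    return None


digest = value_after_label(HTML, "Color Digest")
print(digest)

# Turn the comma-separated text into integers, ignoring the trailing comma.
numbers = [int(n) for n in digest.split(",") if n]
print(numbers)
```

The loop returns at the first matching label, so if a page contains more than one "Color Digest" cell, only the first one is used.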

Answered 2012-07-23 18:51:21 by grammar31
