Getting specific data with BeautifulSoup

I previously used page.prettify() to tidy up the HTML, and this is what I now want to extract text from:

 <div class="item"> 
     <b> 
      name 
     </b> 
     <br/> 
     stuff here 
     </div>

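My goal is to extract stuff here from it, but I'm stumped because it isn't wrapped in any tag other than the div, which already contains other content. The extra whitespace in front of each line makes it even harder.

What is the way to do this?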

2012-05-26 Markum

A combination of find and nextSibling works for the example you posted.

from bs4 import BeautifulSoup

soup = BeautifulSoup("""<div class="item"> <b> name </b> <br/> stuff here </div>""", "html.parser")
soup.find("div", "item").find("br").next_sibling  # next_sibling is the bs4 spelling of nextSibling
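The sibling returned above still carries the surrounding whitespace that the question mentions. Here is a minimal sketch of one way to deal with that, assuming bs4 with the built-in html.parser. The helper name text_after_br is made up for illustration and is not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

HTML = """
<div class="item">
    <b>
     name
    </b>
    <br/>
    stuff here
</div>
"""

def text_after_br(div):
    """Return the stripped text node directly following the first <br> in div, or None."""
    br = div.find("br")
    if br is None:
        return None  # no <br> inside this div
    sibling = br.next_sibling
    if isinstance(sibling, NavigableString):
        return sibling.strip()  # drop the leading/trailing whitespace and newlines
    return None  # the <br> is followed by a tag or by nothing

soup = BeautifulSoup(HTML, "html.parser")
result = text_after_br(soup.find("div", "item"))
print(result)
```

Returning None instead of raising keeps the helper usable in a loop over many div elements where some may lack a br.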

2012-05-26 17:07:01 ditkin

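You can use the div element's .contents attribute to get all of its direct children, and then pick out the strings.

Edit:

Here is the approach I was alluding to: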

from bs4 import BeautifulSoup 
from bs4.element import NavigableString 

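soup = BeautifulSoup("""<div class='item'> <b> name </b> <br/> stuff here </div>""", "html.parser")
div = soup.find('div')
# Keep only the bare text nodes that are direct children of the div, stripped of whitespace
print(''.join(el.strip() for el in div.contents if type(el) is NavigableString))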
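If there are several such div elements, the same idea can be applied to each one. The sketch below is only an illustration: it assumes a bs4 version that accepts the string= argument to find_all (older releases call it text=), the two-item HTML is invented sample data, and own_text is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Invented sample data with two items, for illustration only
HTML = """
<div class="item"><b> first </b><br/> stuff here </div>
<div class="item"><b> second </b><br/> more stuff </div>
"""

def own_text(tag):
    """Join the strings that are direct children of tag, ignoring nested tags."""
    # recursive=False limits the search to direct children;
    # comments, if present, would also be matched by string=True
    pieces = tag.find_all(string=True, recursive=False)
    return " ".join(p.strip() for p in pieces if p.strip())

soup = BeautifulSoup(HTML, "html.parser")
texts = [own_text(div) for div in soup.find_all("div", class_="item")]
print(texts)
```

This skips the text inside b because that string is a child of b, not of the div itself.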

2012-05-26 16:11:45 Acorn

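This doesn't help much; it simply returns a list whose first item is the entire content. – subiet

I'm not sure what you mean; it works for me. – Acorn

If you are really sure that the content you want sits right after a specific opening tag and just before the final closing tag, you can apply a regular expression after this. It is not the most elegant approach, but if your requirements are that specific, it may work.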
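To make the regular-expression idea concrete, here is a rough sketch using Python's re module. The pattern is my own assumption about what "a specific opening tag" means for the posted example (the br tag), and it only holds when no other tags appear between the br and the closing div.

```python
import re

HTML = """<div class="item"> <b> name </b> <br/> stuff here </div>"""

# Hypothetical pattern: capture whatever lies between a <br> tag and the next </div>.
# It assumes no nested tags in that span and would break on more complex markup.
AFTER_BR = re.compile(r"<br\s*/?>\s*(.*?)\s*</div>", re.DOTALL | re.IGNORECASE)

match = AFTER_BR.search(HTML)
extracted = match.group(1) if match else None
print(extracted)
```

For anything beyond markup this regular, the parser-based answers are likely the safer choice.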

2012-05-26 16:54:59 subiet
