How to remove the content inside nested tags with BeautifulSoup?

How do I remove the content inside nested tags with BeautifulSoup? These posts show the opposite, retrieving the content of nested tags: How to get contents of nested tag using BeautifulSoup, and BeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s?

I tried .text, but it only strips the tags and keeps the nested text:

>>> from bs4 import BeautifulSoup as bs 
>>> html = "<foo>Something something <bar> blah blah</bar> something</foo>" 
>>> bs(html).find_all('foo')[0] 
<foo>Something something <bar> blah blah</bar> something else</foo> 
>>> bs(html).find_all('foo')[0].text 
u'Something something blah blah something else'

Desired output:

Something something something else

2014-02-13 alvas

So... in this example, you want to remove the contents of 'bar'?

Should there be an "else" in the second line of code?

You can keep only the children that are bs4.element.NavigableString instances, which skips every nested tag:

from bs4 import BeautifulSoup as bs
import bs4

html = "<foo>Something something <bar> blah blah</bar> something <bar2>GONE!</bar2> else</foo>"

def get_only_text(elem):
    # Yield only the direct text children of elem; nested tags are skipped
    for item in elem.children:
        if isinstance(item, bs4.element.NavigableString):
            yield item

print(''.join(get_only_text(bs(html, "html.parser").find_all('foo')[0])))

Output:

Something something something else
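A shorter variant of the same idea, assuming bs4 4.4 or later (where find_all accepts a string argument): find_all(string=True, recursive=False) returns only the text nodes that are direct children of a tag, so it can stand in for the hand-written generator. The helper name direct_text is hypothetical, and the whitespace collapsing at the end is an extra step not in the answer above; it removes the double spaces left where the nested tags used to be.

```python
from bs4 import BeautifulSoup

html = "<foo>Something something <bar> blah blah</bar> something <bar2>GONE!</bar2> else</foo>"

def direct_text(elem):
    """Hypothetical helper: return only the text sitting directly inside elem."""
    # recursive=False limits the search to elem's immediate children;
    # string=True matches text nodes (NavigableString) rather than tags.
    pieces = elem.find_all(string=True, recursive=False)
    # Collapse the runs of whitespace left behind where nested tags were.
    return " ".join("".join(pieces).split())

soup = BeautifulSoup(html, "html.parser")
print(direct_text(soup.find("foo")))
```

Because recursive=False never descends into <bar> or <bar2>, their text is ignored without modifying the tree.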

2014-02-13 15:39:29

For example, replace each nested <bar> tag with an empty string:

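from bs4 import BeautifulSoup as bs

body = bs(html, "html.parser")
# Replace every <bar> tag (and everything inside it) with an empty string
for tag in body.find_all('bar'):
    tag.replace_with('')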
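A related sketch that edits the tree in place like the answer above, but uses Tag.decompose() (which removes a tag together with all of its contents) and targets every tag nested directly inside <foo> rather than only <bar>. The helper name strip_nested_tags and its outer parameter are hypothetical; after the removal, get_text() sees only the remaining direct text, and the whitespace is collapsed as an extra step.

```python
from bs4 import BeautifulSoup

def strip_nested_tags(html, outer="foo"):
    """Hypothetical helper: delete every tag nested inside the first <outer>
    element (with its contents) and return the text that remains."""
    soup = BeautifulSoup(html, "html.parser")
    elem = soup.find(outer)
    # find_all(True, recursive=False) returns the tags that are direct children
    # of elem; decompose() removes each one together with everything below it.
    for tag in elem.find_all(True, recursive=False):
        tag.decompose()
    # get_text() now sees only the direct text; collapse leftover whitespace.
    return " ".join(elem.get_text().split())

html = "<foo>Something something <bar> blah blah</bar> something <bar2>GONE!</bar2> else</foo>"
print(strip_nested_tags(html))
```

find_all returns a list that is built before the loop starts, so decomposing tags while iterating over it is safe.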

2014-02-13 14:53:11
