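I'm working with HTML elements that contain child tags, and I want to "ignore" or strip those tags so that only the text remains. Right now, if I read .string on any element that contains a tag, all I get is None. How can I ignore the tags when getting the .string of a Beautiful Soup element?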
import bs4

soup = bs4.BeautifulSoup("""
<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph <span class="test">with a tag</span>.</p>
<p>This is another paragraph.</p>
</div>
""", "html.parser")

main = soup.find(id='main')
# Print the .string of each child of the div
for child in main.children:
    print(child.string)
Output:
This is a paragraph.
None
This is another paragraph.
I would like the second line to be This is a paragraph with a tag. How can I do that?
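
One possible approach, offered as a minimal sketch rather than the definitive answer: in Beautiful Soup 4, .string is defined to be None when a tag has more than one child, whereas get_text() joins the text of all descendants. The sketch below assumes that behaviour. The variable names html and texts, the explicit "html.parser" argument, and the use of find_all("p") in place of main.children (to skip the whitespace-only text nodes between paragraphs) are illustrative choices, not part of the original code.

```python
import bs4

html = """
<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph <span class="test">with a tag</span>.</p>
<p>This is another paragraph.</p>
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")
main = soup.find(id="main")

# get_text() concatenates every text node under the tag, so nested
# tags such as <span> are dropped while their text is kept.
texts = [p.get_text() for p in main.find_all("p")]

for text in texts:
    print(text)
```

With this, the second paragraph should come out as plain text with the span's contents inline instead of None.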