So, for testing purposes, let's assume this chunk of HTML is inside a span tag:
x = """<span><br />
Important Text 1
<br />
<br />
Not Important Text
<br />
Important Text 2
<br />
Important Text 3
<br />
<br />
Non Important Text
<br />
Important Text 4
<br /></span>"""
Now I'll parse it and find my span tag:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(x)  # parse the test string defined above
y = soup.find('span')
If you iterate over the y.childGenerator() generator, you get both the br tags and the text:
In [4]: for a in y.childGenerator(): print type(a), str(a)
....:
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
Important Text 1
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
Not Important Text
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
Important Text 2
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
Important Text 3
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
Non Important Text
<type 'instance'> <br />
<class 'BeautifulSoup.NavigableString'>
Important Text 4
<type 'instance'> <br />
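The session above uses Python 2 and BeautifulSoup 3. For readers on Python 3, here is a rough sketch of the same idea, assuming the bs4 package is installed (that package, the "html.parser" backend, and the variable names are my assumptions, not part of the original answer). It also shows why sibling navigation can surprise you: next_sibling returns the very next node, text included, while find_next_sibling() only matches tags.

```python
# Sketch for Python 3 with bs4 (assumed installed: pip install beautifulsoup4).
from bs4 import BeautifulSoup, NavigableString

# Shortened version of the test HTML from the answer.
html = """<span><br />
Important Text 1
<br />
<br />
Not Important Text
<br /></span>"""

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span")

# .children yields both Tag nodes (<br/>) and NavigableString nodes (text).
# Keep only the text nodes that are not pure whitespace.
texts = [
    str(node).strip()
    for node in span.children
    if isinstance(node, NavigableString) and node.strip()
]
print(texts)

first_br = span.find("br")

# next_sibling returns the very next node, which here is a text node.
after_br = first_br.next_sibling
print(type(after_br).__name__, repr(str(after_br).strip()))

# find_next_sibling() searches among tags only, so the text is skipped.
next_tag = first_br.find_next_sibling("br")
print(next_tag.name)
```

If you need the text that follows a br, next_sibling (or filtering .children as above) is probably the safer choice; find_next_sibling() is for when you want the next tag.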
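Ah, the problem was that I was using findNextSibling(), which just skipped the text and went to the next br. Using nextSibling works. Thanks for your help! – maltman 2011-03-14 15:22:29
Great answer, this was giving me a real headache! – Nick 2013-07-24 01:58:41
Isn't 'next' a reserved word in Python? Maybe a different variable name would be better? (It's a minor point, but things like this add up!) – duhaime 2013-10-18 02:20:50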