Comments are nodes, just like anything else:
from BeautifulSoup import BeautifulSoup
from BeautifulSoup import Comment
from BeautifulSoup import NavigableString

text = BeautifulSoup("""<!--comment--><div>a</div><div>b</div><div>c</div>
<!--comment--><div>a</div><div>b</div><div>c</div>""")

comments = text.findAll(text=lambda elm: isinstance(elm, Comment))
for comment in comments:
    next_sib = comment.nextSibling
    while not isinstance(next_sib, Comment) and \
            not isinstance(next_sib, NavigableString) and next_sib:
        # This prints each sibling while it isn't whitespace or another comment
        # Append next_sib to a list, dictionary, etc, etc and
        # do what you want with it
        print next_sib
        next_sib = next_sib.nextSibling
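The snippet above targets BeautifulSoup 3 on Python 2. As a rough sketch only, the same idea could be written for the newer bs4 package on Python 3 as follows. The helper name tags_after_each_comment is hypothetical, and the choice of the "html.parser" builder is an assumption made so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup, Comment, Tag

html = """<!--comment--><div>a</div><div>b</div><div>c</div>
<!--comment--><div>a</div><div>b</div><div>c</div>"""

soup = BeautifulSoup(html, "html.parser")


def tags_after_each_comment(soup):
    """Hypothetical helper: pair each comment with the tags that follow it.

    Collection stops at the first sibling that is not a Tag, i.e. at a
    text node (including whitespace), another comment, or the end of the
    parent's children.
    """
    groups = []
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        following = []
        sib = comment.next_sibling
        while isinstance(sib, Tag):
            following.append(sib)
            sib = sib.next_sibling
        groups.append((str(comment), following))
    return groups


groups = tags_after_each_comment(soup)
for comment_text, tags in groups:
    print(comment_text, [t.get_text() for t in tags])
```

Collecting the siblings into a list per comment, instead of printing them, makes it easier to process each block afterwards.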
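Edit (regarding the comment text):
This does not detect comments with identical text, but you can work around that by checking whether a comment's text is the same as that of the previous comment block.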
Thanks. It turns out that if you know the comment text in advance, you can do this:
def right_comment(e):
    return isinstance(e, Comment) and e == 'comment text'
deal_comments = page.findAll(text=right_comment)
– raven001 2011-04-06 23:23:30
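For reference, a minimal sketch of that known-text filter using bs4 on Python 3 might look like the following. The sample markup is made up for illustration, and find_all(string=...) is the bs4 spelling of the older findAll(text=...) call.

```python
from bs4 import BeautifulSoup, Comment

# Made-up sample page: only the first comment has the text we look for.
page = BeautifulSoup(
    "<!--comment text--><div>a</div><!--other--><div>b</div>",
    "html.parser",
)


def right_comment(e):
    # Comment is a string subclass, so it compares directly against str.
    return isinstance(e, Comment) and e == "comment text"


deal_comments = page.find_all(string=right_comment)
for comment in deal_comments:
    print(repr(comment), "->", comment.next_sibling)
```

Because the filter checks both the node type and the text, ordinary text nodes that happen to contain the same string are not matched.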