Python/BeautifulSoup: how do I look directly below code comments?

I'm parsing some web pages with BeautifulSoup and trying to work within the library (rather than trying to brute-force everything with regular expressions...).

The structure of the pages I'm looking at is like this:

<!--comment--> 
<div>a</div> 
<div>b</div> 
<div>c</div> 
<!--comment--> 
<div>a</div> 
<div>b</div>
<div>c</div>

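I want to parse each section separately. Is there a way to tell Beautiful Soup to split out the regions between identical comments?

Thanks!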

Source

2011-04-03 raven001

Comments are nodes, like any other:

from BeautifulSoup import BeautifulSoup
from BeautifulSoup import Comment
from BeautifulSoup import NavigableString

text = BeautifulSoup("""<!--comment--><div>a</div><div>b</div><div>c</div>
         <!--comment--><div>a</div><div>b</div><div>c</div>""")

# Find every comment node in the document
comments = text.findAll(text=lambda elm: isinstance(elm, Comment))
for comment in comments:
    next_sib = comment.nextSibling
    # Stop at the last sibling, at the next comment, or at any text node
    while (next_sib is not None
           and not isinstance(next_sib, Comment)
           and not isinstance(next_sib, NavigableString)):
        # This prints each sibling while it isn't whitespace or another comment
        # Append next_sib to a list, dictionary, etc, etc and
        # do what you want with it
        print next_sib
        next_sib = next_sib.nextSibling

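Edit: this doesn't check whether the comments are identical (that is, whether they have the same comment text), but you can work around that by checking whether each comment's text matches that of the previous comment block.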
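
For readers on the newer bs4 package and Python 3, the same idea might be sketched roughly as below. This is only a sketch under assumptions that are not in the answer above: the `bs4` package with the built-in `html.parser`, a hypothetical helper named `sections_after_comments`, a marker text of `comment`, and different div contents in the second section so the two groups can be told apart. Unlike the loop above, it skips whitespace between tags instead of stopping at it.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, Tag

HTML = """<!--comment--><div>a</div><div>b</div><div>c</div>
<!--comment--><div>d</div><div>e</div><div>f</div>"""


def sections_after_comments(soup, marker="comment"):
    """Return one list of tags per comment whose text equals `marker`.

    Hypothetical helper for illustration, not part of BeautifulSoup.
    """
    def is_marker(node):
        return isinstance(node, Comment) and node.strip() == marker

    sections = []
    for comment in soup.find_all(string=is_marker):
        section = []
        for sibling in comment.next_siblings:
            if isinstance(sibling, Comment):
                break  # the next comment starts a new section
            if isinstance(sibling, Tag):
                section.append(sibling)  # plain text such as "\n" is skipped
        sections.append(section)
    return sections


soup = BeautifulSoup(HTML, "html.parser")
for section in sections_after_comments(soup):
    print([tag.get_text() for tag in section])
```

Each inner list holds the tags that follow one matching comment, so every section can be processed on its own.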

Source

2011-04-03 08:03:00

I don't see a high-level API in BeautifulSoup for getting at comment nodes directly. Instead, you have to walk the parse tree yourself.

See 1 for an example showing that you can check whether a node is an instance of the Comment class... and that's all you need.
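
Walking the tree and checking each node's class could look something like the sketch below. It assumes the newer `bs4` package with the built-in `html.parser` rather than the BeautifulSoup 3 release current when this was written, and the sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Made-up markup with comments at the top level and inside a tag.
HTML = "<!--first--><div>a<!--nested--></div><!--last-->"

soup = BeautifulSoup(HTML, "html.parser")

# Visit every node in document order and keep the Comment instances.
found = [node.strip() for node in soup.descendants if isinstance(node, Comment)]
print(found)
```

Because `descendants` yields nodes at every depth, this also picks up comments nested inside other tags.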

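Another scary idea:

You could use soup.prettify() to reproduce the document line by line, then parse the resulting output line by line, check for comments, and manually feed the lines that follow back into BeautifulSoup.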
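
A rough sketch of that line-based approach, again assuming the newer `bs4` package with `html.parser`. The `MARKER` constant and the sample markup are made up for illustration, and the approach is fragile: it relies on each comment landing on its own line in the prettified output, which would not hold for multi-line comments.

```python
from bs4 import BeautifulSoup

HTML = "<!--comment--><div>a</div><div>b</div><!--comment--><div>c</div>"
MARKER = "<!--comment-->"  # assumed exact text of the delimiter line

pretty = BeautifulSoup(HTML, "html.parser").prettify()

chunks = []  # one list of output lines per marker comment
for line in pretty.splitlines():
    if line.strip() == MARKER:
        chunks.append([])  # a marker line starts a new chunk
    elif chunks:
        chunks[-1].append(line)  # lines before the first marker are ignored

# Feed each chunk back into BeautifulSoup as its own small document.
sections = [BeautifulSoup("\n".join(chunk), "html.parser") for chunk in chunks]
for section in sections:
    print([div.get_text(strip=True) for div in section.find_all("div")])
```

Navigating siblings in the parsed tree is usually simpler than this, since it avoids serializing and re-parsing the document.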

Source

2011-04-03 06:32:17

Thanks. It turns out that if you know the comment's text in advance, you can do this:

def right_comment(e):
    return isinstance(e, Comment) and e == 'comment text'

deal_comments = page.findAll(text=right_comment)

– raven001 2011-04-06 23:23:30
