Getting multiple blocks of text between h4 tags

This is my HTML:

<div class="left_panel"> 
    <h4>Header1</h4> 
     block of text that I want.    
    <br /> 
    <br /> 
     another block of text that I want. 
    <br /> 
    <br /> 
     still more text that I want. 
    <br /> 
    <br /> 
     <p>&nbsp;</p> 
    <h4>Header2</h4>

The number of text blocks is variable. Header1 is consistent; Header2 is not.

I successfully extract the first block of text with the following code:

def get_summary(soup):
    raw = soup.find('div', {"class": "left_panel"})
    for h4 in raw.findAllNext('h4'):
        following = h4.nextSibling
        return following

However, I need all of the items sitting between the two h4 tags. I hoped h4.nextSiblings would solve this, but for some reason it returns the following error:

TypeError: 'NoneType' object is not callable

I have tried variations on this answer: "Find next siblings until a certain one using beautifulsoup", but the lack of an enclosing tag around the text confuses me.

2015-01-01 woodbine

Find the first header and iterate over .next_siblings until you hit the other header:

from bs4 import BeautifulSoup 

data = """ 
<div class="left_panel"> 
    <h4>Header1</h4> 
     block of text that I want. 
    <br /> 
    <br /> 
     another block of text that I want. 
    <br /> 
    <br /> 
     still more text that I want. 
    <br /> 
    <br /> 
     <p>&nbsp;</p> 
    <h4>Header2</h4> 
</div> 
""" 

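# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(data, 'html.parser')
header1 = soup.find('h4', text='Header1')
for item in header1.next_siblings:
    # Text nodes may have no tag name, hence the None default.
    if getattr(item, 'name', None) == 'h4' and item.text == 'Header2':
        break

    print(item)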

Update (collecting the text between the 2 h4 tags):

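texts = []
for item in header1.next_siblings:
    # Text nodes may have no tag name, hence the None default.
    if getattr(item, 'name', None) == 'h4' and item.text == 'Header2':
        break

    try:
        texts.append(item.text)
    except AttributeError:
        # Plain strings may lack .text; append them as they are.
        texts.append(item)

print(''.join(texts))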
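
Since the question says the text of the second header is not consistent, a hedged variation on the loop above is to stop at the next h4 of any kind and to keep only the non-empty blocks. This is a minimal sketch, not the answerer's code: the function name blocks_after_header and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical sample mirroring the question's markup; the second
# header text is deliberately different from "Header2".
HTML = """
<div class="left_panel">
    <h4>Header1</h4>
     block of text that I want.
    <br />
    <br />
     another block of text that I want.
    <br />
    <br />
     still more text that I want.
    <br />
    <br />
     <p>&nbsp;</p>
    <h4>Any other header</h4>
</div>
"""


def blocks_after_header(soup, header_text):
    """Return the non-empty text blocks between the named h4 and the next h4."""
    header = soup.find(
        lambda tag: tag.name == "h4" and tag.get_text(strip=True) == header_text
    )
    if header is None:
        return []

    blocks = []
    for item in header.next_siblings:
        if isinstance(item, Tag):
            # Any following h4 ends the section, whatever its text is.
            if item.name == "h4":
                break
            text = item.get_text()
        else:
            # Plain text node between the tags.
            text = str(item)
        text = text.strip()
        if text:  # skips <br /> tags and whitespace-only nodes
            blocks.append(text)
    return blocks


soup = BeautifulSoup(HTML, "html.parser")
summary = blocks_after_header(soup, "Header1")
print(" ".join(summary))
```

Returning a list leaves the choice of separator to the caller, for example " ".join(summary) for one line or "\n".join(summary) for one block per line.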

2015-01-01 11:08:06 alecxe

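Thanks alecxe, this works very well. What is the best way to combine these strings into one sensible item? I am currently using append to add them to a list, which seems a bit silly. – woodbine

@woodbine See the update. Basically, it is the same as what you described: keep a list of texts. Hope that helps. – alecxe

Thanks alecxe, much appreciated. – woodbine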

I don't understand why you pass soup as a parameter but don't use it.

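If you use the correct soup instance, you should not get that error. findAllNext('h4') returns <h4>Header1</h4> and <h4>Header2</h4>, and nextSibling on each of them returns its text sibling, which is

block of text that I want.

and

')

in your case.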
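
The behaviour described above can be checked with a short sketch. It also shows one likely source of the TypeError from the question: as far as I can tell, a Tag has no attribute named nextSiblings, so Beautiful Soup treats the name as a child-tag lookup and returns None, and calling that None raises "'NoneType' object is not callable". The compact markup below is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, compact version of the question's markup.
html = (
    '<div class="left_panel">'
    "<h4>Header1</h4>first block<br/>"
    "<h4>Header2</h4>tail"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")
panel = soup.find("div", {"class": "left_panel"})

# find_all_next is the current spelling of findAllNext.
headers = panel.find_all_next("h4")

# One next sibling per header: the text node right after each h4.
next_nodes = [h4.next_sibling for h4 in headers]
print(next_nodes)

first_h4 = headers[0]

# Unknown attribute names fall back to first_h4.find("nextSiblings"),
# which matches no tag, so the result is None rather than a generator.
fallback = first_h4.nextSiblings
print(fallback)  # None

# The supported attribute is next_siblings, a generator of all
# following siblings (text nodes and tags alike).
siblings = list(first_h4.next_siblings)
print(len(siblings))
```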

2015-01-01 10:07:48 Maroun

Sorry, my bad. In my original code I called it tender_soup, but I used 'soup' here to simplify it, and I forgot to remove 'tender_' from the subsequent call. Adjusted. – woodbine
