BeautifulSoup not extracting specific tag text

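I'm having trouble collecting information from a specific tag with BeautifulSoup. I want to extract the text for 'Item 4' that sits inside the nested html tags, but the code below returns the text belonging to 'Item 1' instead. What am I doing wrong (slicing, for example)?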

Code:

primary_detail = page_section.findAll('div', {'class': 'detail-item'})
for item_4 in page_section.find('h3', string='Item 4'):
    if item_4:
        for item_4_content in page_section.find('html'):
            print(item_4_content)

HTML:

<div class="detail-item"> 
    <h3>Item 1</h3> 
    <html><body><p>Item 1 text here</p></body></html> 
</div> 

<div class="detail-item"> 
    <h3>Item 2</h3> 
    <html><body><p>Item 2 text here</p></body></html> 
</div> 

<div class="detail-item"> 
    <h3>Item 3</h3> 
    <html><body><p>Item 3 text here</p></body></html> 
</div> 

<div class="detail-item"> 
    <h3>Item 4</h3> 
    <html><body><p>Item 4 text here</p></body></html> 
</div>


2017-04-24 Life is complex

It looks like you want to print the content of the <p> tag depending on the text of the <h3> tag, correct?

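Your code needs to:

load html_source
find all 'div' tags whose 'class' is 'detail-item'
for each of them, check whether the .text value of its <h3> tag contains the string 'Item 4'
if it does, print the .text value of the corresponding <p> tag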

You can do what you want with the following code.

Code:

s = '''<div class="detail-item"> 
    <h3>Item 1</h3> 
    <html><body><p>Item 1 text here</p></body></html> 
</div> 

<div class="detail-item"> 
    <h3>Item 2</h3> 
    <html><body><p>Item 2 text here</p></body></html> 
</div> 

<div class="detail-item"> 
    <h3>Item 3</h3> 
    <html><body><p>Item 3 text here</p></body></html> 
</div> 

<div class="detail-item"> 
    <h3>Item 4</h3> 
    <html><body><p>Item 4 text here</p></body></html> 
</div>''' 

from bs4 import BeautifulSoup 

soup = BeautifulSoup(s, 'lxml') 

primary_detail = soup.find_all('div', {'class': 'detail-item'}) 

for tag in primary_detail:
    if 'Item 4' in tag.h3.text:
        print(tag.p.text)

Output:

Item 4 text here
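The original attempt printed Item 1 because page_section.find('html') always searches from the top of page_section, so it returns the first <html> block no matter which <h3> matched. A minimal sketch that stays closer to the asker's find('h3', string='Item 4') idea is to locate the heading and then search forward from it with find_next. It assumes the same markup as the question (trimmed to two items) and uses 'html.parser' so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Same structure as the HTML in the question, trimmed to two items
html = '''<div class="detail-item">
    <h3>Item 1</h3>
    <html><body><p>Item 1 text here</p></body></html>
</div>
<div class="detail-item">
    <h3>Item 4</h3>
    <html><body><p>Item 4 text here</p></body></html>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# Find the heading itself, then walk forward in document order from it.
# soup.find('html') would instead start at the top of the document and
# return the first <html> block (Item 1).
heading = soup.find('h3', string='Item 4')
if heading is not None:
    item_4_text = heading.find_next('p').get_text()
    print(item_4_text)
```

Note that find_next searches everything after the heading, not just the same div, so it relies on each <h3> being followed by its own <p>.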

Edit: on the website you provided, the first div in the loop has no <h3> tag, only an <h2>, so tag.h3 is None and has no .text value, right?

You can get around this error with a try/except clause, as in the code below.

Code:

from bs4 import BeautifulSoup 
import requests 


url = 'https://fortiguard.com/psirt/FG-IR-17-097' 
html_source = requests.get(url).text 

soup = BeautifulSoup(html_source, 'lxml') 

primary_detail = soup.find_all('div', {'class': 'detail-item'}) 

for tag in primary_detail:
    try:
        if 'Solutions' in tag.h3.text:
            print(tag.p.text)
    except AttributeError:
        # tag.h3 or tag.p is None for this div, so move on to the next one
        continue

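If the code raises an exception, the loop simply continues with the next element. The first div has no <h3>, so tag.h3 is None and accessing .text on it raises an AttributeError; that item is therefore skipped.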
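Catching the exception works, but an explicit None check makes the skip condition visible and cannot accidentally hide unrelated errors. The sketch below assumes a hypothetical fragment in which the first detail-item has an <h2> instead of an <h3>, mirroring the situation on the real page; the heading and paragraph texts are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the first detail-item has an <h2>, not an <h3>
html = '''<div class="detail-item"><h2>Summary</h2><p>Intro text</p></div>
<div class="detail-item"><h3>Solutions</h3><p>Upgrade text</p></div>'''

soup = BeautifulSoup(html, 'html.parser')

results = []
for tag in soup.find_all('div', {'class': 'detail-item'}):
    heading = tag.h3  # None when this div has no <h3>
    if heading is None or tag.p is None:
        continue  # skip blocks that would otherwise raise AttributeError
    if 'Solutions' in heading.get_text():
        results.append(tag.p.get_text())

print(results)  # ['Upgrade text']
```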

Output:

Upgrade to FortiWLC-SD version 8.3.0


2017-04-24 16:40:10

I'm getting this error: AttributeError: 'NoneType' object has no attribute 'text', and it points to tag.h3.text.

How are you loading html_source? In my example I used the source you provided, but for a real page you can load the HTML with something like s = requests.get(url).text.

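Yes, I'm scraping a real page. I can extract the text from the h2 tag inside the div class="detail-item" tag, but not the text under the h3 tag. This is the line I use to parse the page content: itemSoupParser = BeautifulSoup(raw_html, 'html.parser'). I can get everything from the page except the h3 text content.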
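If the detail-item divs on the real page mix <h2> and <h3> headings, one option is to ask for whichever heading level is present: find() accepts a list of tag names and returns the first match of any of them. The markup below is a hypothetical stand-in for the real page, parsed with 'html.parser' as in the comment above; the dictionary of sections is just one convenient way to collect the results.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mixing <h2> and <h3> headings inside detail-item divs
raw_html = '''<div class="detail-item"><h2>Summary</h2><p>Summary text</p></div>
<div class="detail-item"><h3>Solutions</h3><p>Solutions text</p></div>'''

item_soup = BeautifulSoup(raw_html, 'html.parser')

sections = {}
for tag in item_soup.find_all('div', class_='detail-item'):
    # First <h2> or <h3> inside this div, whichever comes first
    heading = tag.find(['h2', 'h3'])
    body = tag.find('p')
    if heading is not None and body is not None:
        sections[heading.get_text(strip=True)] = body.get_text(strip=True)

print(sections)  # {'Summary': 'Summary text', 'Solutions': 'Solutions text'}
```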
