How do I scrape all the text inside every p tag, including the text inside spans?

table = soup.findAll('div', attrs={"class":"five columns"}) 
for data in table: 
    para = data.findAll('p') 
    print para

This is what I am left with. How do I scrape all the text inside the p tags, including the text inside the spans?

<p><span class="four">Location: </span> <span id="wt-loc" title="New Delhi/Safdarjung">New Delhi/Safdarjung</span></p>, <p><span class="four">Current Time: </span> <span id="wtct">Feb 12, 2017 at 10:29:52 am</span></p>, <p><span class="four">Latest Report: </span> Feb 12, 2017 at 8:30 am</p>, <p><span class="four">Visibility: </span> 1 km</p>, <p><span class="four">Pressure: </span> 102.12 kPa</p>, <p><span class="four">Humidity: </span> 95%</p>, <p><span class="four">Dew Point: </span> 10 °C</p>

2017-02-12 DeeJay

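You can try the .text attribute of a BeautifulSoup object, for example para.text on each p element. I went a step further and used re.split() to break the text into one "key: value" string per paragraph; if you don't want the split, just use para.text.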

from bs4 import BeautifulSoup 
import re 

a = """<p><span class="four">Location: </span> <span id="wt-loc" title="New Delhi/Safdarjung">New Delhi/Safdarjung</span></p>, <p><span class="four">Current Time: </span> <span id="wtct">Feb 12, 2017 at 10:29:52 am</span></p>, <p><span class="four">Latest Report: </span> Feb 12, 2017 at 8:30 am</p>, <p><span class="four">Visibility: </span> 1 km</p>, <p><span class="four">Pressure: </span> 102.12 kPa</p>, <p><span class="four">Humidity: </span> 95%</p>, <p><span class="four">Dew Point: </span> 10 °C</p>""" 

soup = BeautifulSoup(a, 'html.parser') 
re.split(r', (?=\s*[A-Z])', soup.text)

Output:

[u'Location: New Delhi/Safdarjung', 
u'Current Time: Feb 12, 2017 at 10:29:52 am', 
u'Latest Report: Feb 12, 2017 at 8:30 am', 
u'Visibility: 1 km', 
u'Pressure: 102.12 kPa', 
u'Humidity: 95%', 
u'Dew Point: 10 \xb0C']
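
If the goal is real key/value pairs rather than a list of strings, one possible sketch is to split each paragraph at its first ": " and collect the results in a dict. The name `report` and the shortened two-paragraph `html` sample are assumptions made for illustration; `html.parser` is used only so the example has no extra dependency.

```python
from bs4 import BeautifulSoup

# Shortened sample based on the markup in the question (assumption: same structure).
html = (
    '<p><span class="four">Current Time: </span> '
    '<span id="wtct">Feb 12, 2017 at 10:29:52 am</span></p>'
    '<p><span class="four">Humidity: </span> 95%</p>'
)

soup = BeautifulSoup(html, "html.parser")

report = {}  # hypothetical name for the collected label -> value pairs
for p in soup.find_all("p"):
    # Trim every text fragment and join the non-empty ones with one space.
    line = p.get_text(separator=" ", strip=True)
    # Split only at the first ": " so the colons inside the time stay intact.
    label, _, value = line.partition(": ")
    report[label] = value

print(report)
```

Splitting with `partition` instead of `split(":")` keeps values such as `10:29:52` in one piece.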

2017-02-12 05:27:52 MYGz

Use .text to get all the text inside a p tag. All you need to do is iterate over the result of findAll('p'):

from bs4 import BeautifulSoup 
html = '''<p><span class="four">Location: </span> <span id="wt-loc" title="New Delhi/Safdarjung">New Delhi/Safdarjung</span></p>, <p><span class="four">Current Time: </span> <span id="wtct">Feb 12, 2017 at 10:29:52 am</span></p>, <p><span class="four">Latest Report: </span> Feb 12, 2017 at 8:30 am</p>, <p><span class="four">Visibility: </span> 1 km</p>, <p><span class="four">Pressure: </span> 102.12 kPa</p>, <p><span class="four">Humidity: </span> 95%</p>, <p><span class="four">Dew Point: </span> 10 °C</p>''' 

soup = BeautifulSoup(html, 'lxml') 

for p in soup.find_all('p'): 
    print(p.text)

Output:

Location: New Delhi/Safdarjung 
Current Time: Feb 12, 2017 at 10:29:52 am 
Latest Report: Feb 12, 2017 at 8:30 am 
Visibility: 1 km 
Pressure: 102.12 kPa 
Humidity: 95% 
Dew Point: 10 °C
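
Applied to the loop from the question, the same idea might look like the sketch below. The wrapping div and the two sample paragraphs are assumptions, since the question does not show the full page; `texts` is a hypothetical name. The point is that findAll returns a list of tags, so .text has to be read from each p rather than from the list.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the question only shows the p tags, not the div around them.
html = """
<div class="five columns">
<p><span class="four">Humidity: </span> 95%</p>
<p><span class="four">Pressure: </span> 102.12 kPa</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

texts = []
for div in soup.find_all("div", attrs={"class": "five columns"}):
    for p in div.find_all("p"):
        # p.text includes the span text; split/join collapses repeated whitespace.
        texts.append(" ".join(p.text.split()))

print(texts)
```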

2017-02-12 07:02:32

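Beautiful Soup has a method called get_text() that returns all the text inside a tag, ignoring any nested tags. Call p.get_text(). If you also want to strip whitespace, call p.get_text(strip=True).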
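
A minimal sketch of that call, assuming the same markup as in the question (shortened to two paragraphs here). One thing worth knowing is that strip=True trims each text fragment and get_text() then joins the fragments with an empty separator by default, so the label and value would run together; passing separator=" " as well keeps one space between them. The name `lines` is only illustrative.

```python
from bs4 import BeautifulSoup

html = (
    '<p><span class="four">Location: </span> '
    '<span id="wt-loc" title="New Delhi/Safdarjung">New Delhi/Safdarjung</span></p>'
    '<p><span class="four">Visibility: </span> 1 km</p>'
)

soup = BeautifulSoup(html, "html.parser")

# strip=True trims each text fragment and drops the empty ones;
# separator=" " puts a single space between the fragments that remain.
lines = [p.get_text(separator=" ", strip=True) for p in soup.find_all("p")]

for line in lines:
    print(line)
# Location: New Delhi/Safdarjung
# Visibility: 1 km
```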

2017-02-13 16:42:56
