How to print plain text with BeautifulSoup

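I'm trying to learn BeautifulSoup so that I can build an application.

I can find and print all the elements with .find_all(), but the HTML tags are printed along with them. How can I print only the text inside those tags?

This is what I have: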

from bs4 import BeautifulSoup 

"""<html> 
<p>1</p> 
<p>2</p> 
<p>3</p> 
""" 

soup = BeautifulSoup(open('index.html'), "html.parser") 
i = soup.find_all('p') 
print(i)


2017-02-02 snovosel

Possible duplicate of "Using BeautifulSoup Extract Text without Tags" (http://stackoverflow.com/questions/23380171/using-beautifulsoup-extract-text-without-tags) – franklinsijo

@franklinsijo Yes. I also linked another identical question in my answer. – Steampunkery

-1

from bs4 import BeautifulSoup

soup = BeautifulSoup(open('index.html'), "html.parser")
i = soup.find_all('p')
for p in i:
    print(p.text)  # .text is the tag's text without the markup

find_all() returns a list of tags; you should iterate over it and use tag.text to get the text under each tag.

A better way:

for p in soup.find_all('p'): 
    print(p.text)
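
The loop above can also be tried without a file on disk. This is a minimal sketch, assuming an inline string stands in for index.html (the variable names `html` and `texts` are made up for illustration). It uses get_text(strip=True), which returns the same text as .text with surrounding whitespace removed.

```python
from bs4 import BeautifulSoup

# Inline stand-in for the contents of index.html (an assumption for this sketch).
html = "<html><p>1</p><p>2</p><p>3</p></html>"

soup = BeautifulSoup(html, "html.parser")

# find_all() yields Tag objects; get_text() returns only the text inside each one.
texts = [p.get_text(strip=True) for p in soup.find_all("p")]

for text in texts:
    print(text)
```

Collecting the strings into a list first makes it easy to reuse them later instead of only printing them.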


2017-02-02 17:40:43

-1

I think you can do what they do in this stackoverflow question. Use findAll(text=True). So, in your code:

from bs4 import BeautifulSoup 

"""<html> 
<p>1</p> 
<p>2</p> 
<p>3</p> 
""" 

soup = BeautifulSoup(open('index.html'), "html.parser") 
i = soup.findAll(text=True) 
print(i)


2017-02-02 17:44:52 Steampunkery

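This returns all the text in the HTML code, including Comment objects; it is definitely not the solution. –

Including Comment? Do you mean it includes comments? – Steampunkery

A 'Comment' object is just a special type of 'NavigableString'. –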
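
The point raised in these comments can be checked with a small experiment. This is only a sketch, assuming a made-up HTML string that contains one comment; it uses string=True, the newer spelling of text=True. Because Comment is a subclass of NavigableString, it shows up in the results and has to be filtered out explicitly if only the visible text is wanted.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical markup with an HTML comment, used only for illustration.
html = "<html><!-- a note --><p>1</p><p>2</p></html>"
soup = BeautifulSoup(html, "html.parser")

# string=True matches every NavigableString, and Comment is one of them.
all_strings = soup.find_all(string=True)

# Drop the Comment objects to keep only the text a reader would see.
visible_text = [str(s) for s in all_strings if not isinstance(s, Comment)]

print(all_strings)
print(visible_text)
```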

This might help you:

from bs4 import BeautifulSoup 
source_code = """<html> 
<p>1</p> 
<p>2</p> 
<p>3</p> 
""" 
soup = BeautifulSoup(source_code, "html.parser")  # name the parser explicitly
print(soup.text)

Output:

1 
2 
3
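
As a variation on this answer, get_text() accepts separator and strip arguments, which give more control over whitespace than .text. The following is a sketch under the same sample markup; the variable name `plain` is an assumption for illustration.

```python
from bs4 import BeautifulSoup

source_code = """<html>
<p>1</p>
<p>2</p>
<p>3</p>
"""
soup = BeautifulSoup(source_code, "html.parser")

# strip=True trims each piece of text and skips whitespace-only pieces;
# separator controls what is placed between the remaining pieces.
plain = soup.get_text(separator="\n", strip=True)
print(plain)
```

This avoids the stray blank lines and trailing spaces that come from the newlines between tags in the source.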


2017-02-03 05:58:45
