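BeautifulSoup: extracting the strings inside HTML tags from a ResultSet object
I am confused about how to use the ResultSet object in BeautifulSoup, i.e. bs4.element.ResultSet. After calling find_all(), how does one extract the text?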
Example:
In the bs4 documentation, the HTML document html_doc looks like:
<p class="story">
Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">
Elsie
</a>
,
<a class="sister" href="http://example.com/lacie" id="link2">
Lacie
</a>
and
<a class="sister" href="http://example.com/tillie" id="link3">
Tillie
</a>
; and they lived at the bottom of a well.
</p>
One starts by creating the soup and finding all the links (the a tags):
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
soup.find_all('a')
which outputs
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
We can also do
for link in soup.find_all('a'):
    print(link.get('href'))
which outputs
http://example.com/elsie
http://example.com/lacie
http://example.com/tillie
I would like to get only the text from the class_="sister" tags, i.e.
Elsie
Lacie
Tillie
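One can try
for link in soup.find_all('a'):
    print(link.get_text())
but calling get_text() on the result of find_all() itself, rather than on each tag inside the loop, results in an error: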
AttributeError: 'ResultSet' object has no attribute 'get_text'
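For reference, the error message suggests that get_text() was called on the ResultSet rather than on the tags it contains. A ResultSet behaves like a list of Tag objects, so the text has to be pulled from each element. The following is only a minimal sketch: it assumes html_doc holds a shortened copy of the snippet above, and the names sister_tags and names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the html_doc snippet from the question (assumption: the
# real html_doc contains at least these three links).
html_doc = """
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all() returns a ResultSet, which behaves like a list of Tag objects.
# sister_tags is a hypothetical variable name used only in this sketch.
sister_tags = soup.find_all("a", class_="sister")

# get_text() exists on each Tag, not on the ResultSet, so call it per element.
# strip=True trims the whitespace surrounding each name.
names = [tag.get_text(strip=True) for tag in sister_tags]

for name in names:
    print(name)
```

Something like sister_tags.get_text() or sister_tags.text would fail because those attributes belong to individual tags; indexing first, as in sister_tags[0].get_text(), should work for a single element.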
Works perfectly, thank you. I was confused because "sistertags.text" was throwing an error – ShanZhengYang