BeautifulSoup: extracting the strings inside HTML tags from a ResultSet object

I am confused about how to work with the ResultSet object in BeautifulSoup, i.e. bs4.element.ResultSet.

After using find_all(), how do I extract the text?

Example:

In the bs4 documentation, the HTML document html_doc looks like:

<p class="story"> 
    Once upon a time there were three little sisters; and their names were 
    <a class="sister" href="http://example.com/elsie" id="link1"> 
    Elsie 
    </a> 
    , 
    <a class="sister" href="http://example.com/lacie" id="link2"> 
    Lacie 
    </a> 
    and 
    <a class="sister" href="http://example.com/tillie" id="link3"> 
    Tillie 
    </a> 
    ; and they lived at the bottom of a well. 
    </p>

One starts by creating the soup and finding all the <a> tags:

from bs4 import BeautifulSoup 
soup = BeautifulSoup(html_doc, 'html.parser') 
soup.find_all('a')

which outputs

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

We can also do

for link in soup.find_all('a'): 
    print(link.get('href'))

which outputs

http://example.com/elsie 
http://example.com/lacie 
http://example.com/tillie

I want to get only the text of the tags with class_="sister", i.e.

Elsie 
Lacie 
Tillie

One might try

for link in soup.find_all('a'): 
    print(link.get_text())

but this results in an error:

AttributeError: 'ResultSet' object has no attribute 'get_text'

2015-11-03 ShanZhengYang

Filter find_all() on class_='sister'.

Note: notice the underscore after class. This is a special case, because class is a reserved word in Python.

It’s very useful to search for a tag that has a certain CSS class, but the name of the CSS attribute, "class", is a reserved word in Python. Using class as a keyword argument will give you a syntax error. As of Beautiful Soup 4.1.2, you can search by CSS class using the keyword argument class_:

Source: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class
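As a minimal sketch of that keyword in use, the snippet below runs three searches that should match the same tags: the class_ keyword, an attrs dictionary, and a CSS selector passed to select(). The shortened html_doc and the variable names are assumptions made for illustration, not part of the original question.

```python
from bs4 import BeautifulSoup

# Shortened version of the question's markup (an assumption for illustration).
html_doc = '''<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
</p>'''

soup = BeautifulSoup(html_doc, 'html.parser')

# class_ (with a trailing underscore) avoids the reserved word "class".
by_keyword = soup.find_all('a', class_='sister')
# The same filter written as an attribute dictionary.
by_attrs = soup.find_all('a', attrs={'class': 'sister'})
# The same filter written as a CSS selector.
by_selector = soup.select('a.sister')

ids_keyword = [tag['id'] for tag in by_keyword]
ids_attrs = [tag['id'] for tag in by_attrs]
ids_selector = [tag['id'] for tag in by_selector]
print(ids_keyword)
```

Which form to use is mostly a matter of taste; class_ is the shortest when filtering on a single class.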

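Once you have all the tags with class sister, call .text on each of them to get the text. Be sure to strip the surrounding whitespace from the text.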

For example:

from bs4 import BeautifulSoup 

html_doc = '''<p class="story"> 
    Once upon a time there were three little sisters; and their names were 
    <a class="sister" href="http://example.com/elsie" id="link1"> 
    Elsie 
    </a> 
    , 
    <a class="sister" href="http://example.com/lacie" id="link2"> 
    Lacie 
    </a> 
    and 
    <a class="sister" href="http://example.com/tillie" id="link3"> 
    Tillie 
    </a> 
    ; and they lived at the bottom of a well. 
    </p>''' 

soup = BeautifulSoup(html_doc, 'html.parser') 
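# find_all() returns a ResultSet, which behaves like a list of tags.
sistertags = soup.find_all(class_='sister')
for tag in sistertags:
    # .text belongs to each tag, not to the ResultSet; strip() drops the surrounding whitespace.
    print(tag.text.strip())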

Output:

(bs4)macbook:bs4 joeyoung$ python bs4demo.py 
Elsie 
Lacie 
Tillie
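If a list of names is more convenient than printed lines, the same idea can be sketched with a list comprehension. This variant uses get_text(strip=True) instead of .text.strip(); the variable name names is an assumption for illustration.

```python
from bs4 import BeautifulSoup

html_doc = '''<p class="story">
    Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1">
    Elsie
    </a>
    ,
    <a class="sister" href="http://example.com/lacie" id="link2">
    Lacie
    </a>
    and
    <a class="sister" href="http://example.com/tillie" id="link3">
    Tillie
    </a>
    ; and they lived at the bottom of a well.
    </p>'''

soup = BeautifulSoup(html_doc, 'html.parser')

# Call get_text() on each tag in the ResultSet; strip=True trims the whitespace.
names = [tag.get_text(strip=True) for tag in soup.find_all(class_='sister')]
print(names)  # ['Elsie', 'Lacie', 'Tillie']
```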

2015-11-03 23:55:11

Works perfectly, thank you. I was confused because "sistertags.text" was throwing an error. – ShanZhengYang
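The error mentioned in the comment can be reproduced with a short sketch. It assumes a two-link html_doc made up for illustration. The ResultSet returned by find_all() is a list-like container, so .text and get_text() are only available on the individual tags inside it.

```python
from bs4 import BeautifulSoup

# Two-link document, made up for illustration.
html_doc = '<a class="sister" id="link1">Elsie</a><a class="sister" id="link2">Lacie</a>'
soup = BeautifulSoup(html_doc, 'html.parser')

sistertags = soup.find_all(class_='sister')
container_type = type(sistertags).__name__  # 'ResultSet'

# Asking the whole ResultSet for .text raises AttributeError.
try:
    sistertags.text
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Asking each tag inside the ResultSet works.
texts = [tag.text for tag in sistertags]
print(container_type, texts)
```

When only a single tag is needed, find() returns that tag directly, so .text can be called on its result without a loop.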
