Getting text without tags using BeautifulSoup?

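I have been using BeautifulSoup to parse an HTML document and seem to have run into a problem. I found some text I need to extract, but it is plain text: there are no tags or anything around it. I am not sure whether I need a regular expression for this, because I do not know whether BeautifulSoup can grab text that is not wrapped in any tag.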

<strike style="color: #777777">975</strike> 487 RP<div class="gs-container default-2-col">

I am trying to extract "487".

Thanks!

2015-06-14 codsane

You can use the previous or next tag as an anchor to locate the text. For example, find the <strike> element and then get the text node next to it:

from bs4 import BeautifulSoup

html = """<strike style="color: #777777">975</strike> 487 RP<div class="gs-container default-2-col">"""
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')

# Find the <strike> element first, then get the text node next to it
result = soup.find('strike', {'style': 'color: #777777'}).find_next_sibling(string=True)

print(repr(result))
# output: ' 487 RP'
# You can then do simple text manipulation/regex to clean up the result

Note that the code above is for demonstration purposes only and does not accomplish the entire task.
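
One possible way to finish the cleanup step mentioned in the code comment is sketched below. It is not part of the original answer. It assumes the wanted value is the first run of digits in the text node after <strike>, and the variable names (`strike`, `text_node`, `value`) are illustrative only.

```python
import re
from bs4 import BeautifulSoup

html = '<strike style="color: #777777">975</strike> 487 RP<div class="gs-container default-2-col">'
soup = BeautifulSoup(html, 'html.parser')

# Anchor on the <strike> element; guard against it being absent
strike = soup.find('strike')
text_node = strike.find_next_sibling(string=True) if strike else None

# Assumption: the value we want is the first run of digits in that text node
match = re.search(r'\d+', text_node) if text_node else None
value = match.group() if match else None

print(value)
# output: 487
```

If the text around the number always has the same shape, `text_node.split()[0]` would work as well; the regular expression is just more tolerant of extra whitespace or a changed suffix.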

2015-06-14 02:14:42 har07
