Python-HTML: How to extract the content between tags using BeautifulSoup

What I am doing: I am writing a web scraper to collect weather data. This is what I have so far:

import urllib.request 
from bs4 import BeautifulSoup 

# Open the page and read the raw HTML as bytes
base = urllib.request.urlopen('http://www.weather.com/weather/today/Beijing+CHXX0008:1:CH')
html = base.read()

# Parse the HTML, naming the parser explicitly instead of letting bs4 guess
soup = BeautifulSoup(html, 'html.parser')

rn_base = soup.find_all(itemprop="temperature-fahrenheit")

If you print the variable rn_base, you get: [<span class="wx-value" itemprop="temperature-fahrenheit">75</span>], which I take to be a list with a single element. The number 75 is what I am after.

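Problem: I tried several ways to get the number, and all of them failed: 1) using str.join() to convert rn_base to a string, which failed because rn_base is a ResultSet object; 2) using index slicing, which failed because rn_base is not a string; 3) using get_text() as described in the BeautifulSoup documentation, which raised AttributeError: 'ResultSet' object has no attribute 'get_text'.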

Any help is greatly appreciated!

2013-07-19 hakuna121

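rn_base is a ResultSet object: find_all() always returns a collection, because it assumes there can be many results even when only one element matches. So iterate over it:

for rn in rn_base:
    print(rn.string)

This for loop prints the text of every matching element (those with itemprop="temperature-fahrenheit").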
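To make the difference between the ResultSet and the tags it contains concrete, here is a minimal sketch. It assumes the page contains the span shown in the question; the `html` string is a hypothetical stand-in for the downloaded page so that the example runs without network access, and `temperatures` and `first_temperature` are names chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
html = '<span class="wx-value" itemprop="temperature-fahrenheit">75</span>'
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, which behaves like a list of Tag objects.
rn_base = soup.find_all(itemprop="temperature-fahrenheit")

# get_text() is a method of each Tag, not of the ResultSet itself.
temperatures = [rn.get_text() for rn in rn_base]
print(temperatures)

# Indexing works too, because the ResultSet behaves like a list.
first_temperature = int(rn_base[0].string)
print(first_temperature)
```

Calling get_text() on each element, or on rn_base[0], avoids the AttributeError from the question, since the method exists on the individual tags rather than on the collection.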

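Since you want a single weather reading, I think it is better to use find() than find_all(): find() returns only the first matching tag (or None if nothing matches), much like calling find_all() with limit=1 and taking the first element.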
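The find() suggestion could look roughly like the sketch below. As before, the `html` string is a hypothetical stand-in for the real page, and the `temperature-celsius` lookup is an invented example of an attribute value that is absent, included only to show the None case.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
html = '<span class="wx-value" itemprop="temperature-fahrenheit">75</span>'
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag directly, or None when nothing matches.
tag = soup.find(itemprop="temperature-fahrenheit")

# Guard against None before reading the text, in case the markup changes.
temperature = int(tag.get_text()) if tag is not None else None
print(temperature)

# An attribute value that does not occur in the page yields None, not an empty list.
missing = soup.find(itemprop="temperature-celsius")
print(missing)
```

Checking for None matters in a scraper, because the site can change its markup at any time and tag.get_text() would otherwise raise an AttributeError on a missing element.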

2013-07-19 02:42:23

Thanks for pointing out that there can be multiple occurrences! I was confused by the 'ResultSet' class. – hakuna121
