2014-06-26

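XML parsing with ElementTree and requests

I'm trying to use the Yahoo Weather API, but I'm having trouble parsing the XML in the API response. I'm using Python 3.4. Here is the code I'm working with: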

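import requests
from xml.etree.ElementTree import parse

weather_url = 'http://weather.yahooapis.com/forecastrss?w=%s&u=%s'


def yahoo_conditions(zip_code, units):
    # weather_ns (the namespace URI of the yweather prefix) is defined elsewhere in the module.
    url = weather_url % (zip_code, units)

    try:
        # This is the call that raises the ParseError.
        rss = parse(requests.get(url, stream=True).raw).getroot()

        conditions = rss.find('channel/item/{%s}condition' % weather_ns)

        return {
            'current_condition': conditions.get('text'),
            'current_temp': conditions.get('temp'),
            'title': rss.findtext('channel/title')
        }
    except:
        raise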

Here is the stack trace I get:

Traceback (most recent call last): 
    File "<input>", line 1, in <module> 
    File "/home/jonathan/PycharmProjects/pyweather/pyweather/pyweather.py", line 42, in yahoo_conditions 
    rss = parse(requests.get(url, stream=True).raw).getroot() 
    File "/usr/lib/python3.4/xml/etree/ElementTree.py", line 1187, in parse 
    tree.parse(source, parser) 
    File "/usr/lib/python3.4/xml/etree/ElementTree.py", line 598, in parse 
    self._root = parser._parse_whole(source) 
    File "<string>", line None 
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0 

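The xml.etree.ElementTree parse function doesn't like the raw object that the requests library returns. Looking a bit deeper, the raw object turns out to be: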

>>> r = requests.get('http://weather.yahooapis.com/forecastrss?w=2502265', stream=True) 
>>> r.raw 
<requests.packages.urllib3.response.HTTPResponse object at 0x7f32c24f9e48> 

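I followed this solution, but it still results in the same problem. Why doesn't the approach above work? Does the ElementTree.parse function not support urllib3 response objects? I've read all of the documentation, but it didn't enlighten me at all.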

The documentation I read is listed here:

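Edit: After more experimenting, I still haven't found a solution to the problem described above. However, I did find a workaround. If you call ElementTree's fromstring method on the XML content, everything works fine.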

import xml.etree.ElementTree as ET

import requests


def fetch_xml(url):
    """
    Fetch a URL and parse the document's XML.

    :param url: the URL that the XML is located at.
    :return: the root element of the XML.
    :raises:
        :requests.exceptions.RequestException: Requests could not open the URL.
        :xml.etree.ElementTree.ParseError: xml.etree.ElementTree failed to parse the XML document.
    """
    # .content is the fully downloaded response body as bytes.
    return ET.fromstring(requests.get(url).content)

I guess the downside of this approach is that it uses more memory. What do you think? I'd like to hear the community's opinion.

Answers

0

If you call ElementTree's fromstring method on the XML content, everything works fine.

import xml.etree.ElementTree as ET

import requests


def fetch_xml(url):
    """
    Fetch a URL and parse the document's XML.

    :param url: the URL that the XML is located at.
    :return: the root element of the XML.
    :raises:
        :requests.exceptions.RequestException: Requests could not open the URL.
        :xml.etree.ElementTree.ParseError: xml.etree.ElementTree failed to parse the XML document.
    """
    # .content is the fully downloaded response body as bytes.
    return ET.fromstring(requests.get(url).content)

I guess the downside of this approach is that it uses more memory.
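To make the fromstring workaround concrete, here is a minimal sketch that combines it with the namespaced lookup from the question. It runs on a hand-written sample document rather than a live response, and both the sample and the namespace URI in `WEATHER_NS` are assumptions; check the `xmlns:yweather` declaration in the actual feed.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the yweather prefix; verify it against the feed.
WEATHER_NS = 'http://xml.weather.yahoo.com/ns/rss/1.0'

# Hand-written sample shaped like the feed (not real API output).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <title>Sample Weather - Somewhere</title>
    <item>
      <yweather:condition text="Cloudy" temp="28"/>
    </item>
  </channel>
</rss>"""


def conditions_from_xml(xml_bytes):
    """Parse feed bytes that are already in memory and extract the conditions."""
    rss = ET.fromstring(xml_bytes)
    # ElementTree addresses namespaced tags as {namespace-uri}localname.
    condition = rss.find('channel/item/{%s}condition' % WEATHER_NS)
    return {
        'current_condition': condition.get('text'),
        'current_temp': condition.get('temp'),
        'title': rss.findtext('channel/title'),
    }


result = conditions_from_xml(SAMPLE)
print(result)
```

With a live feed, the bytes passed to `conditions_from_xml` would be the `.content` of the response, as in `fetch_xml` above.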

1

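Why are you using requests to download some RSS XML data as a stream? Do you want to keep the connection open the whole time? The weather barely changes, so why not just poll the service every 5 minutes?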

Here is complete code that fetches and parses the feed using BeautifulSoup and requests. Short and sweet.

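import requests
from bs4 import BeautifulSoup

r = requests.get('http://weather.yahooapis.com/forecastrss?w=%s&u=%s' % (2459115, "c"))
if r.status_code == 200:
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(r.text, "html.parser")
    # Note: the first <description> is the channel description, not the condition text.
    print("Current condition: ", soup.find("description").string)
    print("Temperature: ", soup.find('yweather:condition')['temp'])
    print("Title: ", soup.find("title").string)
else:
    # Raises requests.exceptions.HTTPError for 4xx and 5xx responses.
    r.raise_for_status()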

Output:

Current condition: Yahoo! Weather for New York, NY 
Temperature: 28 
Title: Yahoo! Weather - New York, NY 

There is a lot more you can do with BeautifulSoup. Check out its excellent documentation.
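The suggestion to poll every 5 minutes could be sketched as a small time-based cache around whatever function fetches the feed. The names `make_polling_cache`, `fake_fetch`, and `get_weather` are hypothetical, and the demo uses a fake fetcher and a fake clock so it runs without network access.

```python
import time


def make_polling_cache(fetch, ttl_seconds=300, clock=time.monotonic):
    """Wrap fetch() so that it is called at most once per ttl_seconds."""
    state = {'value': None, 'fetched_at': None}

    def get():
        now = clock()
        stale = (state['fetched_at'] is None
                 or now - state['fetched_at'] >= ttl_seconds)
        if stale:
            state['value'] = fetch()
            state['fetched_at'] = now
        return state['value']

    return get


# Demo with a fake fetcher and a controllable clock.
calls = []
fake_now = [0.0]


def fake_fetch():
    calls.append(fake_now[0])
    return 'report #%d' % len(calls)


get_weather = make_polling_cache(fake_fetch, ttl_seconds=300,
                                 clock=lambda: fake_now[0])

first = get_weather()    # nothing cached yet, so this fetches
fake_now[0] = 120.0
second = get_weather()   # only 2 minutes old, served from the cache
fake_now[0] = 300.0
third = get_weather()    # 5 minutes elapsed, fetches again
print(first, second, third, len(calls))
```

In real use, `fetch` would be a function that performs the HTTP request and parsing, and the default `time.monotonic` clock would be left in place.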

+0

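I have `stream=True` because the API requires it in order to get the raw data. See the documentation [here](http://docs.python-requests.org/en/latest/api/#requests.Response.raw). My second solution is more elegant and more powerful. Thanks for the library suggestion, though; I'll have to take a look! – Jonathan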
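The thread never establishes why parse fails on `r.raw`. One plausible explanation, an assumption that is not confirmed here, is that `r.raw` exposes the body before content decoding, so a gzip-compressed response reaches the parser as compressed bytes. The sketch below reproduces that situation offline with a stand-in document; it does not contact the API.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for the RSS feed (hypothetical content, not real API output).
xml_bytes = b'<rss><channel><title>Example feed</title></channel></rss>'

# What an undecoded HTTP body looks like when the server gzips it.
compressed = gzip.compress(xml_bytes)


def parse_stream(fileobj):
    """Parse XML from a file-like object and return the root element."""
    return ET.parse(fileobj).getroot()


# Feeding the compressed bytes straight to the parser fails.
try:
    parse_stream(io.BytesIO(compressed))
    raw_error = None
except ET.ParseError as exc:
    raw_error = exc

# Decompressing on the fly lets the parser read the same stream.
root = parse_stream(gzip.GzipFile(fileobj=io.BytesIO(compressed)))
title = root.findtext('channel/title')
print(raw_error)
print(title)
```

If this is indeed the cause, setting `r.raw.decode_content = True` before passing `r.raw` to parse is the usual way to ask the underlying urllib3 response to decode the body as it is read. That has not been tested against this API.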