2013-01-12 110 views
4

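Here is my project: I'm using RRDTool to graph weather data from WeatherBug, and I need a simple, efficient way to download that data. I started with a very inefficient bash-script scraper, then moved to BeautifulSoup. That is still too slow (it runs on a Raspberry Pi), so I need to use LXML. How do I parse XML in Python with LXML?

What I have so far: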

from lxml import etree 
doc=etree.parse('weather.xml') 
print doc.xpath("//aws:weather/aws:ob/aws:temp") 

But I get an error message. weather.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?> 

<aws:weather xmlns:aws="http://www.aws.com/aws"> 
    <aws:api version="2.0"/> 
    <aws:WebURL>http://weather.weatherbug.com/PA/Tunkhannock-weather.html?ZCode=Z5546&amp;Units=0&amp;stat=TNKCN</aws:WebURL> 
    <aws:InputLocationURL>http://weather.weatherbug.com/PA/Tunkhannock-weather.html?ZCode=Z5546&amp;Units=0</aws:InputLocationURL> 
    <aws:ob> 
    <aws:ob-date> 
     <aws:year number="2013"/> 
     <aws:month number="1" text="January" abbrv="Jan"/> 
     <aws:day number="11" text="Friday" abbrv="Fri"/> 
     <aws:hour number="10" hour-24="22"/> 
     <aws:minute number="26"/> 
     <aws:second number="00"/> 
     <aws:am-pm abbrv="PM"/> 
     <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/> 
    </aws:ob-date> 
    <aws:requested-station-id/> 
    <aws:station-id>TNKCN</aws:station-id> 
    <aws:station>Tunkhannock HS</aws:station> 
    <aws:city-state zipcode="18657">Tunkhannock, PA</aws:city-state> 
    <aws:country>USA</aws:country> 
    <aws:latitude>41.5663871765137</aws:latitude> 
    <aws:longitude>-75.9794464111328</aws:longitude> 
    <aws:site-url>http://www.tasd.net/highschool/index.cfm</aws:site-url> 
    <aws:aux-temp units="&amp;deg;F">-100</aws:aux-temp> 
    <aws:aux-temp-rate units="&amp;deg;F">0</aws:aux-temp-rate> 
    <aws:current-condition icon="http://deskwx.weatherbug.com/images/Forecast/icons/cond013.gif">Cloudy</aws:current-condition> 
    <aws:dew-point units="&amp;deg;F">40</aws:dew-point> 
    <aws:elevation units="ft">886</aws:elevation> 
    <aws:feels-like units="&amp;deg;F">41</aws:feels-like> 
    <aws:gust-time> 
     <aws:year number="2013"/> 
     <aws:month number="1" text="January" abbrv="Jan"/> 
     <aws:day number="11" text="Friday" abbrv="Fri"/> 
     <aws:hour number="12" hour-24="12"/> 
     <aws:minute number="18"/> 
     <aws:second number="00"/> 
     <aws:am-pm abbrv="PM"/> 
     <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/> 
    </aws:gust-time> 
    <aws:gust-direction>NNW</aws:gust-direction> 
    <aws:gust-direction-degrees>323</aws:gust-direction-degrees> 
    <aws:gust-speed units="mph">17</aws:gust-speed> 
    <aws:humidity units="%">98</aws:humidity> 
    <aws:humidity-high units="%">100</aws:humidity-high> 
    <aws:humidity-low units="%">61</aws:humidity-low> 
    <aws:humidity-rate>3</aws:humidity-rate> 
    <aws:indoor-temp units="&amp;deg;F">77</aws:indoor-temp> 
    <aws:indoor-temp-rate units="&amp;deg;F">-1.1</aws:indoor-temp-rate> 
    <aws:light>0</aws:light> 
    <aws:light-rate>0</aws:light-rate> 
    <aws:moon-phase moon-phase-img="http://api.wxbug.net/images/moonphase/mphase01.gif">0</aws:moon-phase> 
    <aws:pressure units="&quot;">30.09</aws:pressure> 
    <aws:pressure-high units="&quot;">30.5</aws:pressure-high> 
    <aws:pressure-low units="&quot;">30.08</aws:pressure-low> 
    <aws:pressure-rate units="&quot;/h">-0.01</aws:pressure-rate> 
    <aws:rain-month units="&quot;">0.11</aws:rain-month> 
    <aws:rain-rate units="&quot;/h">0</aws:rain-rate> 
    <aws:rain-rate-max units="&quot;/h">0.12</aws:rain-rate-max> 
    <aws:rain-today units="&quot;">0.09</aws:rain-today> 
    <aws:rain-year units="&quot;">0.11</aws:rain-year> 
    <aws:temp units="&amp;deg;F">41</aws:temp> 
    <aws:temp-high units="&amp;deg;F">42</aws:temp-high> 
    <aws:temp-low units="&amp;deg;F">29</aws:temp-low> 
    <aws:temp-rate units="&amp;deg;F/h">-0.9</aws:temp-rate> 
    <aws:sunrise> 
     <aws:year number="2013"/> 
     <aws:month number="1" text="January" abbrv="Jan"/> 
     <aws:day number="11" text="Friday" abbrv="Fri"/> 
     <aws:hour number="7" hour-24="07"/> 
     <aws:minute number="29"/> 
     <aws:second number="53"/> 
     <aws:am-pm abbrv="AM"/> 
     <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/> 
    </aws:sunrise> 
    <aws:sunset> 
     <aws:year number="2013"/> 
     <aws:month number="1" text="January" abbrv="Jan"/> 
     <aws:day number="11" text="Friday" abbrv="Fri"/> 
     <aws:hour number="4" hour-24="16"/> 
     <aws:minute number="54"/> 
     <aws:second number="19"/> 
     <aws:am-pm abbrv="PM"/> 
     <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/> 
    </aws:sunset> 
    <aws:wet-bulb units="&amp;deg;F">40.802</aws:wet-bulb> 
    <aws:wind-speed units="mph">3</aws:wind-speed> 
    <aws:wind-speed-avg units="mph">1</aws:wind-speed-avg> 
    <aws:wind-direction>S</aws:wind-direction> 
    <aws:wind-direction-degrees>163</aws:wind-direction-degrees> 
    <aws:wind-direction-avg>SE</aws:wind-direction-avg> 
    </aws:ob> 
</aws:weather> 

I tested my XPath at http://www.xpathtester.com/test and it works there. But in Python I get this error message:

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "lxml.etree.pyx", line 2043, in lxml.etree._ElementTree.xpath (src/lxml/lxml.etree.c:47570) 
    File "xpath.pxi", line 376, in lxml.etree.XPathDocumentEvaluator.__call__ (src/lxml/lxml.etree.c:118247) 
    File "xpath.pxi", line 239, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:116911) 
    File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:116728) 
lxml.etree.XPathEvalError: Undefined namespace prefix 

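This is all very new to me: Python, XML, and LXML. All I want is the observation time and the temperature.

Does my problem have something to do with the aws: prefix in front of everything? What does it mean?

Any help you can offer is greatly appreciated!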

Answers

7

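The problem has everything to do with "that aws: prefix in front of everything": it is a namespace prefix, and XPath cannot resolve it until you map it to its namespace URI. That is easy to do, for example: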

print(doc.xpath('//aws:weather/aws:ob/aws:temp',
                namespaces={'aws': 'http://www.aws.com/aws'})[0].text)

The required mapping from namespace prefixes to namespace URIs is documented at http://lxml.de/xpathxslt.html.
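Since the goal in the question is just the observation time and the temperature, here is a minimal sketch of pulling both out with an explicit prefix map. It rests on a few assumptions: the XML is a trimmed copy of the weather.xml above, and `read_observation`, `SAMPLE`, and `NS` are names made up for illustration. It uses the ElementPath-style `find`/`findtext` methods rather than `xpath()` so that it also runs on the standard library's `xml.etree.ElementTree` when lxml is not installed; with lxml, `xpath(..., namespaces=NS)` accepts the same mapping.

```python
from datetime import datetime

try:
    from lxml import etree  # preferred, as in the question
except ImportError:
    # Fallback so the sketch still runs where lxml is missing.
    import xml.etree.ElementTree as etree

# Trimmed-down sample based on the weather.xml shown in the question.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<aws:weather xmlns:aws="http://www.aws.com/aws">
  <aws:ob>
    <aws:ob-date>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="10" hour-24="22"/>
      <aws:minute number="26"/>
      <aws:second number="00"/>
    </aws:ob-date>
    <aws:temp units="&amp;deg;F">41</aws:temp>
  </aws:ob>
</aws:weather>
"""

# Prefix -> namespace URI. The prefix on the left is our own choice;
# only the URI has to match the xmlns:aws declaration in the document.
NS = {"aws": "http://www.aws.com/aws"}


def read_observation(xml_bytes):
    """Hypothetical helper: return (observation datetime, temperature)."""
    root = etree.fromstring(xml_bytes)  # root element is aws:weather
    ob = root.find("aws:ob", namespaces=NS)
    ob_date = ob.find("aws:ob-date", namespaces=NS)

    def field(name, attr="number"):
        # Each date part is stored as an attribute on an empty element.
        return int(ob_date.find("aws:" + name, namespaces=NS).get(attr))

    when = datetime(field("year"), field("month"), field("day"),
                    field("hour", "hour-24"), field("minute"), field("second"))
    temp = float(ob.findtext("aws:temp", namespaces=NS))
    return when, temp


when, temp = read_observation(SAMPLE)
print(when.isoformat(), temp)
```

The `hour-24` attribute is read instead of `number` so that the AM/PM element can be ignored. To read the real file, `etree.parse('weather.xml').getroot()` would take the place of `etree.fromstring(...)`.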

4

Try this:

from lxml import etree

# Register the aws prefix once; later xpath() calls need no namespaces= argument.
ns = etree.FunctionNamespace("http://www.aws.com/aws")
ns.prefix = "aws"

doc = etree.parse('weather.xml')
print(doc.xpath("//aws:weather/aws:ob/aws:temp")[0].text)

See this link: http://lxml.de/extensions.html
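One caveat, as far as I understand the lxml documentation: a prefix registered this way is a global assignment, so it applies to every XPath evaluation in the process. If that is a concern, a small wrapper that remembers the prefix map gives similar convenience without global state. The sketch below is only an illustration: `AwsDoc`, `AWS_NS`, and the one-line sample XML are hypothetical, and it relies on `find`/`findtext` so that it also runs on the standard library's `xml.etree.ElementTree`.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch still runs where lxml is missing.
    import xml.etree.ElementTree as etree

AWS_NS = {"aws": "http://www.aws.com/aws"}


class AwsDoc(object):
    """Hypothetical wrapper that passes the namespace map on every lookup."""

    def __init__(self, root, namespaces=AWS_NS):
        self.root = root
        self.namespaces = namespaces

    def text(self, path):
        # Text of the first match, or None if nothing matches.
        return self.root.findtext(path, namespaces=self.namespaces)

    def attr(self, path, name):
        # Attribute value of the first match, or None if nothing matches.
        node = self.root.find(path, namespaces=self.namespaces)
        return None if node is None else node.get(name)


# Made-up miniature document in the same shape as weather.xml.
xml = (b'<aws:weather xmlns:aws="http://www.aws.com/aws"><aws:ob>'
       b'<aws:temp units="F">41</aws:temp>'
       b'<aws:humidity units="%">98</aws:humidity>'
       b'</aws:ob></aws:weather>')

doc = AwsDoc(etree.fromstring(xml))
temp = doc.text("aws:ob/aws:temp")
humidity_units = doc.attr("aws:ob/aws:humidity", "units")
```

Paths are relative to the root element (aws:weather), so they start at aws:ob rather than with `//`.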

+0

I like this solution because I don't have to pass a namespace map every time I call the doc.xpath() method. – jdhildeb