2012-11-02
1

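How do I get the time from this XML tag?

I am trying to get some attributes from a long XML document. Specifically, I want the stage levels in cfs and ft, which the code below does reliably. The difficulty is that I cannot figure out how to extract the timestamp from the dateTime attribute of tags like this one: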

<ns1:value qualifiers="P" dateTime="2012-11-01T18:45:00.000-05:00">54800</ns1:value> 

Any help and suggestions for improvement are greatly appreciated.

import urllib2
from xml.dom.minidom import parseString


def getLevels(gaugeId):

    # create url string 00060=cfs and 00065=ft
    urlRoot = "http://waterservices.usgs.gov/nwis/iv/?format=waterml,1.1&sites="
    urlTail = "&parameterCd=00060,00065"
    url = urlRoot + str(gaugeId) + urlTail

    # open connection to url, read the response as a string, then close it
    urlFile = urllib2.urlopen(url)
    urlData = urlFile.read()
    urlFile.close()

    # parse downloaded xml
    domData = parseString(urlData)

    # extract xml element values for stage cfs and ft
    # (stripXmlTags is the asker's own helper; it is not shown in the question)
    output = []
    for element in domData.getElementsByTagName("ns1:value"):
        output.append(stripXmlTags(element.toxml()))

    # return the extracted values
    return output

Answers

0

ElementTree's iter() method is also a handy way to get some of the data you need, as shown below. Some sample output follows the program.

#!/usr/bin/env python
from xml.etree import cElementTree as ET
from datetime import datetime
import re

# read the saved response from a local file
with open('waterservices.usgs.gov.xml', 'r') as fi:
    waterData = fi.read()
# strip the ns1:/ns2: prefixes so tags can be matched by their bare names
waterData = re.sub('ns[12]:', '', waterData)
root = ET.fromstring(waterData)
dates = [v.get('dateTime') for v in root.iter('value')]
valus = [float(v.text) for v in root.iter('value')]
units = [v.text for v in root.iter('variableName')]
print 'valus', valus
print 'units', units
print 'dates', dates
# drop the trailing UTC offset (e.g. "-05:00") before parsing
dates = [datetime.strptime(t[:-6], '%Y-%m-%dT%H:%M:%S.%f') for t in dates]
print 'dates', dates
a = zip(valus, units, dates)
for v in a:
    print v

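(Note: I don't know how to handle the ns1: and ns2: prefixes properly, so the code above suppresses them with re.sub. To keep the demo short, the data is read from a file rather than fetched from the URL. The sample output below is based on the XML data from the link in the question, saved as the local file waterservices.usgs.gov.xml.)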

valus [53200.0, 6.86] 
units ['Streamflow, ft&#179;/s', 'Gage height, ft'] 
dates ['2012-11-01T19:45:00.000-05:00', '2012-11-01T19:45:00.000-05:00'] 
dates [datetime.datetime(2012, 11, 1, 19, 45), datetime.datetime(2012, 11, 1, 19, 45)] 
(53200.0, 'Streamflow, ft&#179;/s', datetime.datetime(2012, 11, 1, 19, 45)) 
(6.86, 'Gage height, ft', datetime.datetime(2012, 11, 1, 19, 45)) 
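
On the ns1:/ns2: prefixes: ElementTree can match namespaced tags directly when findall() is given a prefix-to-URI mapping, so the re.sub step may not be needed. The following is only a minimal sketch. It assumes Python 3 (the code in this thread is Python 2) and the namespace URI quoted in the other answer. The prefix wml, the function name read_values and the trimmed SAMPLE document are made up for illustration and do not reproduce the real response layout.

```python
from xml.etree import ElementTree as ET

# Namespace URI as quoted in the other answer; "wml" is an arbitrary
# local alias chosen for this sketch (an assumption, not a USGS name).
NS = {"wml": "http://www.cuahsi.org/waterML/1.1/"}

# Hypothetical, heavily simplified wrapper around the tag from the question.
SAMPLE = """<ns1:timeSeriesResponse xmlns:ns1="http://www.cuahsi.org/waterML/1.1/">
  <ns1:values>
    <ns1:value qualifiers="P" dateTime="2012-11-01T18:45:00.000-05:00">54800</ns1:value>
  </ns1:values>
</ns1:timeSeriesResponse>"""


def read_values(xml_text):
    """Return (value, dateTime string) pairs for every namespaced value tag."""
    root = ET.fromstring(xml_text)
    # The mapping lets the path use "wml:" regardless of the prefix
    # (ns1, ns2, ...) that the document itself happens to use.
    return [(float(v.text), v.get("dateTime"))
            for v in root.findall(".//wml:value", NS)]


print(read_values(SAMPLE))
```

Because the lookup goes through the URI, it keeps working if the service changes the prefix it emits, which the regular-expression approach would not.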
0

You can use this as a starting point. Note that the time zone is currently ignored...

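from datetime import datetime
from xml.etree import ElementTree as ET

# urlData is the response string downloaded in the question's code
tree = ET.fromstring(urlData)
for elem in tree.findall('.//{http://www.cuahsi.org/waterML/1.1/}value'):
    # [:-10] cuts off the milliseconds and UTC offset (".000-05:00")
    print datetime.strptime(elem.attrib['dateTime'][:-10], '%Y-%m-%dT%H:%M:%S')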
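
If the offset should be kept rather than sliced off, one possible approach is to let strptime parse it. This is a sketch under the assumption of Python 3.7 or later, where %z accepts offsets written with a colon such as -05:00; it will not run on the Python 2 used elsewhere in this thread. The function name parse_usgs_timestamp is hypothetical.

```python
from datetime import datetime, timezone


def parse_usgs_timestamp(text):
    """Parse e.g. '2012-11-01T18:45:00.000-05:00' into an aware datetime."""
    # %f consumes the fractional seconds, %z the UTC offset.
    # Offsets containing a colon are accepted by %z from Python 3.7 on.
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f%z")


stamp = parse_usgs_timestamp("2012-11-01T18:45:00.000-05:00")
print(stamp.isoformat())                           # 2012-11-01T18:45:00-05:00
print(stamp.astimezone(timezone.utc).isoformat())  # 2012-11-01T23:45:00+00:00
```

An aware datetime can be converted to UTC or compared with readings from gauges in other time zones, which the naive values produced by slicing cannot do safely.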