2012-01-18 184 views
1

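Python XML extraction loop

I have a small script that I think is almost there. I have written it out in a rough way, but I can't figure out how to make it work as a for loop. The data looks like this: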

<Trackpoint> 
    <Time>2012-01-17T11:44:35Z</Time> 
    <Position> 
     <LatitudeDegrees>51.920211518183351</LatitudeDegrees> 
     <LongitudeDegrees>26.706042898818851</LongitudeDegrees> 
    </Position> 
    <AltitudeMeters>-43.6026611328125</AltitudeMeters> 
</Trackpoint> 
<Trackpoint> 
    <Time>2012-01-17T11:45:21Z</Time> 
    <Position> 
     <LatitudeDegrees>51.920243117958307</LatitudeDegrees> 
     <LongitudeDegrees>26.706140967085958</LongitudeDegrees> 
    </Position> 
    <AltitudeMeters>-43.6026611328125</AltitudeMeters> 
</Trackpoint> 

I can use the following to get, say, LatitudeDegrees:

from xml.dom.minidom import parse 
doc = parse('/Users/name/Documents/GPS/gps.tcx') 
lat = doc.getElementsByTagName("LatitudeDegrees") 
time = doc.getElementsByTagName("Time") 
trackpoint = doc.getElementsByTagName("Trackpoint") 

for x in lat: 
    print(x.firstChild.data) 

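but I would like to get the latitude, longitude, and time together for each trackpoint. The XML file I am extracting the data from uses the format shown above.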

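I'm guessing I need to use

for x in trackpoint:

but the approach above is the only way I have managed to get working.

Does anyone have any ideas? I think I'm just missing something really simple!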

Answers

5

First find all the Trackpoint elements and loop over them. Then, inside the loop, look up the child elements you want in each Trackpoint element:

from xml.dom.minidom import parse 

doc = parse('in.tcx') 

trackpoints = doc.getElementsByTagName("Trackpoint") 
result = [] 
elements = ('Time', 'LatitudeDegrees', 'LongitudeDegrees') 
for tp in trackpoints:
    obj = {}
    for el in elements:
        obj[el] = tp.getElementsByTagName(el)[0].firstChild.data
    result.append(obj)

print(result)

Is result a list and obj a dictionary? – beoliver


Yes, it is '[{'Time':,'LatitudeDegrees':,'LongitudeDegrees':}]' –


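@user969617, the end result is a list of dictionaries. You could print the values directly by changing the 'obj[el] =' line, but keeping this format is more flexible: you can then write a separate function to output it. –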
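As a minimal sketch of the "separate function" idea from the comment above: the extraction keeps the list of dictionaries, and the formatting lives in one place. The names `extract_trackpoints` and `format_trackpoints`, and the inline sample document, are hypothetical and chosen only for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical sample standing in for the .tcx file from the question.
SAMPLE = """<root>
<Trackpoint>
  <Time>2012-01-17T11:44:35Z</Time>
  <Position>
    <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
    <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
  </Position>
</Trackpoint>
</root>"""

ELEMENTS = ('Time', 'LatitudeDegrees', 'LongitudeDegrees')


def extract_trackpoints(doc):
    """Return one dictionary per Trackpoint, as in the answer above."""
    result = []
    for tp in doc.getElementsByTagName('Trackpoint'):
        obj = {}
        for el in ELEMENTS:
            obj[el] = tp.getElementsByTagName(el)[0].firstChild.data
        result.append(obj)
    return result


def format_trackpoints(points):
    """Build one comma-separated line per trackpoint (hypothetical helper)."""
    return [', '.join(p[el] for el in ELEMENTS) for p in points]


points = extract_trackpoints(parseString(SAMPLE))
lines = format_trackpoints(points)
for line in lines:
    print(line)
```

Keeping extraction and output apart means the same list of dictionaries can later be written to a file or merged with other data without touching the parsing code.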

0

Maybe you are looking for zip:

import xml.dom.minidom as minidom 
import os 

doc = minidom.parse(os.path.expanduser('~/test/gps.tcx')) 
latitudes = doc.getElementsByTagName("LatitudeDegrees") 
longitudes = doc.getElementsByTagName("LongitudeDegrees") 
time = doc.getElementsByTagName("Time") 
trackpoint = doc.getElementsByTagName("Trackpoint") 

for t,lat,lon in zip(time,latitudes,longitudes): 
    print(t.firstChild.data, lat.firstChild.data, lon.firstChild.data) 

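To be honest, I'm not sure what I need. I'd like to be able to save the output and then compare and merge it with different data from a .plist. I took a look at zip, as it looks interesting. I would use 's = "/Users/name/Documents/GPS/gps.tcx"' to get the file in – beoliver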
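One caveat with the zip approach, sketched here under the assumption that some Trackpoint elements might lack a Position (the sample document below is made up for illustration): zip pairs items purely by index and stops at the shortest list, so a missing element shifts the pairing.

```python
from xml.dom.minidom import parseString

# Made-up sample: the first Trackpoint has no Position element.
SAMPLE = """<root>
<Trackpoint>
  <Time>2012-01-17T11:44:35Z</Time>
</Trackpoint>
<Trackpoint>
  <Time>2012-01-17T11:45:21Z</Time>
  <Position>
    <LatitudeDegrees>51.920243117958307</LatitudeDegrees>
    <LongitudeDegrees>26.706140967085958</LongitudeDegrees>
  </Position>
</Trackpoint>
</root>"""

doc = parseString(SAMPLE)
times = doc.getElementsByTagName('Time')
latitudes = doc.getElementsByTagName('LatitudeDegrees')

# zip pairs by index: the only latitude is matched with the FIRST time,
# although it belongs to the second Trackpoint.
zipped = [(t.firstChild.data, lat.firstChild.data)
          for t, lat in zip(times, latitudes)]

# Looping per Trackpoint keeps values from the same element together.
per_point = []
for tp in doc.getElementsByTagName('Trackpoint'):
    lat_nodes = tp.getElementsByTagName('LatitudeDegrees')
    if lat_nodes:  # skip trackpoints without a position
        time_text = tp.getElementsByTagName('Time')[0].firstChild.data
        per_point.append((time_text, lat_nodes[0].firstChild.data))

print(zipped)
print(per_point)
```

If every Trackpoint is known to contain all three elements, zip gives the same pairs as the per-element loop.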

2

I usually find XML parsing with ElementTree easier to read. For example, you can read the latitudes in three lines:

import xml.etree.ElementTree as etree 

s="""<root> 
<Trackpoint> 
    <Time>2012-01-17T11:44:35Z</Time> 
    <Position> 
     <LatitudeDegrees>51.920211518183351</LatitudeDegrees> 
     <LongitudeDegrees>26.706042898818851</LongitudeDegrees> 
    </Position> 
    <AltitudeMeters>-43.6026611328125</AltitudeMeters> 
</Trackpoint> 
<Trackpoint> 
    <Time>2012-01-17T11:45:21Z</Time> 
    <Position> 
     <LatitudeDegrees>51.920243117958307</LatitudeDegrees> 
     <LongitudeDegrees>26.706140967085958</LongitudeDegrees> 
    </Position> 
    <AltitudeMeters>-43.6026611328125</AltitudeMeters> 
</Trackpoint> 
</root> 
""" 

root = etree.fromstring(s)
for point in root:
    print(point.find('Position/LatitudeDegrees').text)

So, suppose you want to convert each point into a dictionary:

varnames = [ 
    ('Position/LatitudeDegrees', 'lat'), 
    ('Position/LongitudeDegrees', 'lon'), 
    ('Time', 'time'), 
    ('AltitudeMeters', 'alt') 
    ] 

points = [] 
for pointelem in etree.fromstring(s):
    point = {}
    for tag, varname in varnames:
        point[varname] = pointelem.find(tag).text
    points.append(point)

import pprint 
pprint.pprint(points) 

Output:

[{'alt': '-43.6026611328125', 
    'lat': '51.920211518183351', 
    'lon': '26.706042898818851', 
    'time': '2012-01-17T11:44:35Z'}, 
{'alt': '-43.6026611328125', 
    'lat': '51.920243117958307', 
    'lon': '26.706140967085958', 
    'time': '2012-01-17T11:45:21Z'}] 
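Note that every value in the output above is still a string. If the points are going to be compared or merged with other data, it may help to convert them first. The following is only a sketch: the helper name `convert_point` is hypothetical, and the time format string assumes every timestamp looks like the ones in the sample (UTC with a trailing Z and no fractional seconds).

```python
from datetime import datetime

# One entry copied from the output above; all values are strings.
point = {'alt': '-43.6026611328125',
         'lat': '51.920211518183351',
         'lon': '26.706042898818851',
         'time': '2012-01-17T11:44:35Z'}


def convert_point(raw):
    """Return a copy with numeric fields as floats and time as a datetime."""
    return {
        'alt': float(raw['alt']),
        'lat': float(raw['lat']),
        'lon': float(raw['lon']),
        # The trailing 'Z' is matched literally; the result is a naive datetime.
        'time': datetime.strptime(raw['time'], '%Y-%m-%dT%H:%M:%SZ'),
    }


converted = convert_point(point)
print(converted)
```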

? – beoliver


@user969617 If you have a file, you can use etree.parse directly: http://docs.python.org/library/xml.etree.elementtree.html#xml.etree.ElementTree.parse –
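A minimal sketch of that suggestion follows. It writes a made-up sample to a temporary file only so that it can run on its own; with a real file you would pass its path to etree.parse instead. It also assumes the file declares no XML namespace; if it does, the tag names in find() and iter() would need to include that namespace.

```python
import os
import tempfile
import xml.etree.ElementTree as etree

# Made-up sample written to a temporary file so the sketch is runnable;
# in practice you would pass the path of your own .tcx file instead.
SAMPLE = """<root>
<Trackpoint>
  <Time>2012-01-17T11:44:35Z</Time>
  <Position>
    <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
    <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
  </Position>
</Trackpoint>
</root>"""

with tempfile.NamedTemporaryFile('w', suffix='.tcx', delete=False) as f:
    f.write(SAMPLE)
    path = f.name

try:
    tree = etree.parse(path)   # parse() accepts a filename or a file object
    root = tree.getroot()      # parse() returns an ElementTree, not an Element
    # iter() also finds Trackpoint elements nested below the root's children.
    latitudes = [tp.find('Position/LatitudeDegrees').text
                 for tp in root.iter('Trackpoint')]
finally:
    os.remove(path)

print(latitudes)
```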