2017-08-02 50 views
1

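I'm very new to Python. I've searched a lot but couldn't find a solution. I want to convert the XML file below into a CSV file. How can I parse nested XML, where a child element has the same name as its parent, into CSV?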

<List> 
    <item> 
    <id>5939c5e20d82880efce93933</id> 
    <sensorEvents> 
     <sensorEvents> 
      <avgSped>48.55647532226298</avgSped> 
      <completed>true</completed> 
     </sensorEvents> 
     <sensorEvents> 
      <avgSped>39.53368357145088</avgSped> 
      <completed>true</completed> 
     </sensorEvents> 
     <sensorEvents> 
      <avgSped>41.41160105233052</avgSped> 
      <completed>true</completed> 
     </sensorEvents> 
    </sensorEvents> 
    </item> 

    . 
    . 
    . 
    . 

</List> 

The code I wrote is this:

import xml.etree.ElementTree as ET 
import csv 
tree = ET.parse("my_xml_file.xml") 
root = tree.getroot() 
f = open('my_csv_file.csv', 'w') 
csvwriter = csv.writer(f) 

head = ['ID','avgSped','completed'] 
csvwriter.writerow(head) 

for Item in root.findall('item'): 

    for Sensorevents in Item.findall('sensorEvents'): 


     row = [] 
     id_ = Item.find('id').text 
     row.append(id_) 

     avgSped_ = Sensorevents.find('sensorEvents').find('avgSped').text 
     row.append(avgSped_) 

     completed_ = Sensorevents.find('sensorEvents').find('completed').text 
     row.append(completed_) 

     csvwriter.writerow(row) 


f.close() 

The result is this:

[Image: screenshot of the resulting CSV file]

There are 3 inner sensorEvents elements, but my code only captures the first one. How can I modify the code to read all of them? Any help is really appreciated.

Answers

2

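Because you have one outer <sensorEvents> element that contains 3 <sensorEvents> children, the outer <sensorEvents> shadows the inner ones: findall('sensorEvents') on an item matches only the item's direct children, so it returns just the outer wrapper.

This means that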

for Sensorevents in Item.findall('sensorEvents'): 

will loop once per item, over the whole outer block:

<sensorEvents> 
    <sensorEvents> 
     <avgSped>48.55647532226298</avgSped> 
     <completed>true</completed> 
    </sensorEvents> 
    <sensorEvents> 
     <avgSped>39.53368357145088</avgSped> 
     <completed>true</completed> 
    </sensorEvents> 
    <sensorEvents> 
     <avgSped>41.41160105233052</avgSped> 
     <completed>true</completed> 
    </sensorEvents> 
</sensorEvents> 

and then

avgSped_ = Sensorevents.find('sensorEvents').find('avgSped').text
row.append(avgSped_)

completed_ = Sensorevents.find('sensorEvents').find('completed').text

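will fetch the data only once, from the first inner <sensorEvents> only, because find() returns just the first matching child.

You should try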
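To make the difference concrete, here is a small sketch that contrasts find() and findall() on the XML from the question. The `XML` string and the variable names are only for illustration; the data is a trimmed copy of the question's sample.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, held in a string for illustration.
XML = """<List>
  <item>
    <id>5939c5e20d82880efce93933</id>
    <sensorEvents>
      <sensorEvents><avgSped>48.55647532226298</avgSped><completed>true</completed></sensorEvents>
      <sensorEvents><avgSped>39.53368357145088</avgSped><completed>true</completed></sensorEvents>
      <sensorEvents><avgSped>41.41160105233052</avgSped><completed>true</completed></sensorEvents>
    </sensorEvents>
  </item>
</List>"""

item = ET.fromstring(XML).find('item')

outer = item.findall('sensorEvents')          # direct children only: just the wrapper
first_inner = outer[0].find('sensorEvents')   # find() stops at the first match
all_inner = outer[0].findall('sensorEvents')  # findall() returns every matching child

print(len(outer), len(all_inner))             # 1 3
print(first_inner.find('avgSped').text)       # 48.55647532226298
```

So the loop in the question runs once per item, and each run reads only the first inner element.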

for Item in root.findall('item'): 
    for root_Sensorevents in Item.findall('sensorEvents'): 
     for Sensorevents in root_Sensorevents.findall('sensorEvents'): 
... 
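Filling in the loop body, a complete version might look like the sketch below. The helper names `sensor_event_rows` and `xml_to_csv` are hypothetical, and the column names are taken from the question; treat this as one possible arrangement rather than the only way to do it.

```python
import csv
import xml.etree.ElementTree as ET


def sensor_event_rows(root):
    """Yield one [id, avgSped, completed] row per inner <sensorEvents>."""
    for item in root.findall('item'):
        item_id = item.findtext('id')
        # Outer loop: the single wrapper <sensorEvents> under each <item>.
        for wrapper in item.findall('sensorEvents'):
            # Inner loop: every <sensorEvents> child of that wrapper.
            for event in wrapper.findall('sensorEvents'):
                yield [item_id,
                       event.findtext('avgSped'),
                       event.findtext('completed')]


def xml_to_csv(xml_path, csv_path):
    """Parse xml_path and write one CSV row per inner sensor event."""
    root = ET.parse(xml_path).getroot()
    # newline='' keeps the csv module from emitting blank lines on Windows.
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['ID', 'avgSped', 'completed'])
        writer.writerows(sensor_event_rows(root))
```

With the file names from the question, this would be called as xml_to_csv('my_xml_file.xml', 'my_csv_file.csv'). The with block also closes the file if an error occurs partway through.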
+0

Works perfectly for me. Thank you. – Saeed

0

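You might also consider using the lxml library, because its XPath support often lets you simplify search code.

Here, the XPath expression .//sensorEvents/sensorEvents says: find sensorEvents elements anywhere in the document, then look for sensorEvents elements immediately beneath them.

Once you have those, writing expressions for the elements' values is usually simple, as shown.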

>>> from lxml import etree 
>>> tree = etree.parse('temp2.xml') 
>>> inner_sensorEvents = tree.xpath('.//sensorEvents/sensorEvents') 
>>> for inner_sensorEvent in inner_sensorEvents: 
...  inner_sensorEvent.find('avgSped').text, inner_sensorEvent.find('completed').text 
... 
('48.55647532226298', 'true') 
('39.53368357145088', 'true') 
('41.41160105233052', 'true')
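If installing lxml is not an option, the standard library's ElementTree accepts a limited subset of XPath in find() and findall(), so a similar path works there too. The sketch below is an illustration on a trimmed copy of the question's XML, not part of the answer above: it iterates per item so that each row keeps its id, and uses the relative path sensorEvents/sensorEvents to reach the inner elements in one step.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, held in a string for illustration.
XML = """<List>
  <item>
    <id>5939c5e20d82880efce93933</id>
    <sensorEvents>
      <sensorEvents><avgSped>48.55647532226298</avgSped><completed>true</completed></sensorEvents>
      <sensorEvents><avgSped>39.53368357145088</avgSped><completed>true</completed></sensorEvents>
      <sensorEvents><avgSped>41.41160105233052</avgSped><completed>true</completed></sensorEvents>
    </sensorEvents>
  </item>
</List>"""

root = ET.fromstring(XML)

rows = []
for item in root.findall('item'):
    item_id = item.findtext('id')
    # 'sensorEvents/sensorEvents' selects the inner elements directly,
    # replacing the two nested findall() loops.
    for event in item.findall('sensorEvents/sensorEvents'):
        rows.append((item_id,
                     event.findtext('avgSped'),
                     event.findtext('completed')))

for row in rows:
    print(row)
```

The rows list can then be passed to csv.writer(...).writerows() after writing the header row.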