2012-06-16

I have XML in the following format. How do I parse it using Python 3.2?

<event timestamp="0.447463" bustype="LIN" channel="LIN 1"> 
<col name="Time"/> 
<col name="Start of Frame">0.440708</col> 
<col name="Channel">LIN 1</col> 
<col name="Dir">Tx</col> 
<col name="Event Type">LIN Frame (Diagnostic Request)</col> 
<col name="Frame Name">MasterReq_DB</col> 
<col name="Id">3C</col> 
<col name="Data">81 06 04 04 FF FF 50 4C</col> 
<col name="Publisher">TestMaster (simulated)</col> 
<col name="Checksum">D3 &quot;Classic&quot;</col> 
<col name="Header Duration">2.090 ms (40.1 bits)</col> 
<col name="Resp. Duration">4.688 ms (90.0 bits)</col> 
<col name="Time difference">0.049987</col> 
<empty/> 
</event> 

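From the XML above I need to extract the data associated with the `name` attribute.
I can get all of the names, but I cannot extract the `MasterReq_DB` field.
Please help. Thanks in advance.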

My Python code:

from xml.dom import minidom

input_file = open("test_input.txt", 'r')
alines = input_file.read()
word_lst = alines.split("'")
filename = word_lst[1]
pathname = word_lst[3]
f = open(pathname, 'r')
doc = minidom.parse(f)
node = doc.documentElement
events = doc.getElementsByTagName('event')
for event in events:
    #print(event)
    columns = event.getElementsByTagName('col')
    for column in columns:
        #print(column)
        head = column.getAttribute('name')
        if head == 'Frame Name':
            print(head)
            request = head.firstChild.wholeText
            print(request)
print("Done")
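In the loop above, `head` is the plain string returned by `getAttribute('name')`, so `head.firstChild` raises `AttributeError`; the text node belongs to the `col` element itself. A minimal sketch of a corrected minidom loop, assuming the XML is parsed from an inline string (the hypothetical, trimmed `SAMPLE`) rather than from the path read out of `test_input.txt`; the helper name `frame_names` is also hypothetical:

```python
from xml.dom import minidom

# Hypothetical trimmed copy of the <event> element from the question.
SAMPLE = """<event timestamp="0.447463" bustype="LIN" channel="LIN 1">
<col name="Time"/>
<col name="Frame Name">MasterReq_DB</col>
<col name="Id">3C</col>
<empty/>
</event>"""


def frame_names(doc):
    """Return the text of every <col name="Frame Name"> element."""
    names = []
    for event in doc.getElementsByTagName('event'):
        for column in event.getElementsByTagName('col'):
            if column.getAttribute('name') == 'Frame Name':
                # Read the text from the element node, not the attribute string.
                # Empty elements such as <col name="Time"/> have no firstChild.
                if column.firstChild is not None:
                    names.append(column.firstChild.data)
    return names


doc = minidom.parseString(SAMPLE)
print(frame_names(doc))  # ['MasterReq_DB']
```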

What code have you tried? Have you looked at [elementtree](http://docs.python.org/py3k/library/xml.etree.elementtree.html) and [lxml](http://lxml.de/)? (The latter is more powerful, and its API largely overlaps with the former.)


Please see my Python code above. – Rohit


And what does `print(request)` output? Have you tried `print(repr(request))`? I strongly recommend switching to `elementtree`; it is a far better XML API for Python.
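Along the lines of the `elementtree` suggestion, a minimal sketch with the standard-library `xml.etree.ElementTree`, assuming the same trimmed sample held in a string (the name `SAMPLE` is hypothetical). ElementTree's `find` accepts a `[@attrib='value']` predicate, so no explicit loop over the `col` elements is needed:

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed copy of the <event> element from the question.
SAMPLE = """<event timestamp="0.447463" bustype="LIN" channel="LIN 1">
<col name="Time"/>
<col name="Frame Name">MasterReq_DB</col>
<col name="Id">3C</col>
<empty/>
</event>"""

root = ET.fromstring(SAMPLE)  # root is the <event> element itself
# Select the <col> child whose name attribute equals "Frame Name".
frame = root.find("col[@name='Frame Name']")
print(frame.text)  # MasterReq_DB
```

If the real file wraps many `event` elements in an enclosing root, `root.findall(".//col[@name='Frame Name']")` collects all of them.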

Answer

Here is a starting point with lxml, if you want it:

In [1]: x = '''<event timestamp="0.447463" bustype="LIN" channel="LIN 1"> 
    ...: <col name="Time"/> 
    ...: <col name="Start of Frame">0.440708</col> 
    ...: <col name="Channel">LIN 1</col> 
    ...: <col name="Dir">Tx</col> 
    ...: <col name="Event Type">LIN Frame (Diagnostic Request)</col> 
    ...: <col name="Frame Name">MasterReq_DB</col> 
    ...: <col name="Id">3C</col> 
    ...: <col name="Data">81 06 04 04 FF FF 50 4C</col> 
    ...: <col name="Publisher">TestMaster (simulated)</col> 
    ...: <col name="Checksum">D3 &quot;Classic&quot;</col> 
    ...: <col name="Header Duration">2.090 ms (40.1 bits)</col> 
    ...: <col name="Resp. Duration">4.688 ms (90.0 bits)</col> 
    ...: <col name="Time difference">0.049987</col> 
    ...: <empty/> 
    ...: </event> ''' 

In [2]: from lxml import etree 

In [3]: tree = etree.fromstring(x) 

In [4]: [elem.text for elem in tree.xpath('//*[@name]')] 
Out[4]: 
[None, 
'0.440708', 
'LIN 1', 
'Tx', 
'LIN Frame (Diagnostic Request)', 
'MasterReq_DB', 
'3C', 
'81 06 04 04 FF FF 50 4C', 
'TestMaster (simulated)', 
'D3 "Classic"', 
'2.090 ms (40.1 bits)', 
'4.688 ms (90.0 bits)', 
'0.049987'] 

In [5]: [name for name in tree.xpath('//@name')] 
Out[5]: 
['Time', 
'Start of Frame', 
'Channel', 
'Dir', 
'Event Type', 
'Frame Name', 
'Id', 
'Data', 
'Publisher', 
'Checksum', 
'Header Duration', 
'Resp. Duration', 
'Time difference'] 
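A short follow-up, assuming what you ultimately want is each `name` paired with its text: build a dictionary from the same elements. This sketch uses the standard-library ElementTree so it runs without lxml (the trimmed `SAMPLE` string is hypothetical); with lxml, the same comprehension works over `tree.xpath('//*[@name]')`.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed copy of the <event> element from the question.
SAMPLE = """<event timestamp="0.447463" bustype="LIN" channel="LIN 1">
<col name="Time"/>
<col name="Start of Frame">0.440708</col>
<col name="Frame Name">MasterReq_DB</col>
<col name="Checksum">D3 &quot;Classic&quot;</col>
<empty/>
</event>"""

root = ET.fromstring(SAMPLE)
# Map every name attribute to its element text (None for empty elements).
fields = {col.get('name'): col.text for col in root.iter('col')}
print(fields['Frame Name'])  # MasterReq_DB
print(fields['Checksum'])    # D3 "Classic"
```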

To read from a file instead of a string, use the `lxml.etree.parse` function.
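A minimal sketch of the file-based version. It writes the sample to a temporary file so it is self-contained, uses a hypothetical helper name `read_frame_names`, and falls back to the standard-library ElementTree when lxml is not installed; `parse`, `getroot`, `iter` and `get` exist in both libraries:

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:  # lxml not installed: use the standard library instead
    import xml.etree.ElementTree as etree

# Hypothetical file contents: the Frame Name column from the question.
SAMPLE = b"""<event timestamp="0.447463" bustype="LIN" channel="LIN 1">
<col name="Frame Name">MasterReq_DB</col>
<empty/>
</event>"""


def read_frame_names(path):
    """Parse the XML file at `path` and return every Frame Name value."""
    root = etree.parse(path).getroot()
    return [col.text for col in root.iter('col')
            if col.get('name') == 'Frame Name']


with tempfile.NamedTemporaryFile('wb', suffix='.xml', delete=False) as tmp:
    tmp.write(SAMPLE)
    path = tmp.name
try:
    names = read_frame_names(path)
    print(names)  # ['MasterReq_DB']
finally:
    os.remove(path)
```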

Here is a link to the lxml tutorial, and here is a reference for XPath syntax.


Hey, what would you suggest: lxml or DOM? This is just the start of my work, and I will need to parse XML files that are several MB in size. – Rohit


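To be honest, I have no experience with DOM. `lxml` is great for parsing. For files several GB in size I use `lxml`'s [`iterparse`](http://lxml.de/parsing.html#iterparse-and-iterwalk) method, and it works well. For smaller files, the example in my answer is what I usually do.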
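The incremental approach can be sketched with the standard-library `xml.etree.ElementTree.iterparse` (lxml's `iterparse` works similarly and additionally accepts a `tag` filter). Each `event` is handled once its end tag has been read and is then cleared, which keeps memory use modest on large logs. The generator name `iter_frame_names` and the two-event sample are hypothetical:

```python
import io
import xml.etree.ElementTree as ET


def iter_frame_names(source):
    """Yield the Frame Name text of each <event> as soon as it is fully parsed."""
    for _, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == 'event':
            for col in elem.iter('col'):
                if col.get('name') == 'Frame Name':
                    yield col.text
            # Discard the event's children now that we are done with them.
            elem.clear()


# Hypothetical two-event log held in memory; pass a file path for real data.
data = b"""<log>
<event><col name="Frame Name">MasterReq_DB</col></event>
<event><col name="Frame Name">MasterReq_DB</col></event>
</log>"""
print(list(iter_frame_names(io.BytesIO(data))))
```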


謝謝suggetion ...我如何寫輸出到excel2007 ...? – Rohit