2016-04-30 30 views
0

Python 3.4: Traversing an XML tree

I'm very new to Python and have never worked with XML, so please forgive my lack of expertise.

I have a fairly long text file with XML embedded in it:

- - - - Some Text Until This Point — — — - - - 


<?xml version="1.0" encoding="UTF-8"?>
<Record xmlns="http://www.south.org/">
    <Patient>
        <ID>4</ID>
        <Status>Good</Status>
        <Pain>8</Pain>
    </Patient>
    <Hospital>
        <Name>South Center.</Name>
        <Address>1234 Main Ave New York NY 4567 United States</Address>
        <Phone>+1 (123) 456 7890</Phone>
        <Email>[email protected]</Email>
    </Hospital>
    <Insurance>
        <Name>Health First</Name>
        <Phone>+1 (123) 456 7890</Phone>
    </Insurance>
    <Admitted>
        <Date>2000-11-8t7:24:02</Date>
        <Injury>Arm</Injury>
        <Location>7</Location>
    </Admitted>
    <Place>
        <Room>
            <Number>28</Number>
            <Wing>East</Wing>
            <Name>John Smith</Name>
        </Room>
    </Place>
</Record>

- - - - - - - - - - - Some more Text - - - - - — - - - - - — - - - - 

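I'm trying to get the name under the "Place" tag and the values under the "Admitted" tag, and save them to local variables. I know this question is very similar to the one listed below, but I can't seem to get it to work.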

Python version 2.7: XML ElementTree: How to iterate through certain elements of a child element in order to find a match

Here is what I have so far. It contains only the XML-related code and omits the code that opens and closes the text file:

This is the error: AttributeError: 'NoneType' object has no attribute 'text'

import xml.etree.ElementTree as et

# Slice the XML portion out of the text file contents (textfile is a str)
end_tag = "</Record>"
myxml = textfile[textfile.index("<?xml"):textfile.index(end_tag) + len(end_tag)]
root = et.fromstring(myxml)

# This is the loop that fails: find() returns None for the un-namespaced
# tag names, which triggers the AttributeError
for admitted in root:
    date = admitted.find('Admitted').find('Date').text
    injury = admitted.find('Admitted').find('Injury').text
    loc = admitted.find('Admitted').find('Location').text
    print(date)
    print(injury)
    print(loc)

I would appreciate any advice on this, and thank you in advance for your help.

Answers

0

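I use minidom to parse XML; it is straightforward. The code below strips the non-XML lines and then extracts the MANUFACTURER elements. See the example below.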

import xml.dom.minidom 
import re 

xmlstring=""" 
... and listening to slow jazz <---should not be here 
<?xml version="1.0"?> 
<!DOCTYPE PARTS SYSTEM "parts.dtd"> 
<?xml-stylesheet type="text/css" href="xmlpartsstyle.css"?> 
<PARTS> 
    <TITLE>Computer Parts</TITLE> 
    <PART> 
     <ITEM>Motherboard</ITEM> 
     <MANUFACTURER>ASUS</MANUFACTURER> 
     <MODEL>P3B-F</MODEL> 
     <COST> 123.00</COST> 
    </PART> 
foo <---should not be here 
    <PART> 
     <ITEM>Video Card</ITEM> 
     <MANUFACTURER>ATI</MANUFACTURER> 
bar <---should not be here 
     <MODEL>All-in-Wonder Pro</MODEL> 
     <COST> 160.00</COST> 
    </PART> 
</PARTS>""" 

# Keep only the lines that look like XML; minidom cannot parse the stray text
xml_lines = []

for line in xmlstring.split('\n'):
    # A line counts as XML if it ends with a tag (trailing whitespace allowed)
    if re.search(r'<.+>\s*$', line):
        xml_lines.append(line.strip())
newxml = '\n'.join(xml_lines)

# Parse the cleaned XML with minidom
dom = xml.dom.minidom.parseString(newxml)
for node in dom.getElementsByTagName('PARTS'):
    for manufacturer in node.getElementsByTagName('MANUFACTURER'):
        print(manufacturer.firstChild.data)

# Output would be ASUS and ATI
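
As an alternative to filtering line by line, the XML block can be sliced out by its start and end markers, as the question does, and then read with minidom. The following is only a sketch under stated assumptions: the `textfile` string is a shortened, hypothetical stand-in for the real file, and `first_text` is a helper name introduced here for illustration.

```python
import xml.dom.minidom

# Hypothetical, shortened stand-in for the text file described in the question
textfile = """- - - Some Text Until This Point - - -
<?xml version="1.0" encoding="UTF-8"?>
<Record xmlns="http://www.south.org/">
    <Admitted>
        <Date>2000-11-8t7:24:02</Date>
        <Injury>Arm</Injury>
        <Location>7</Location>
    </Admitted>
    <Place>
        <Room>
            <Number>28</Number>
            <Wing>East</Wing>
            <Name>John Smith</Name>
        </Room>
    </Place>
</Record>
- - - Some more Text - - -"""

# Slice from the XML declaration through the closing root tag
end_tag = "</Record>"
start = textfile.index("<?xml")
end = textfile.index(end_tag) + len(end_tag)
dom = xml.dom.minidom.parseString(textfile[start:end])


def first_text(parent, tag):
    """Return the text of the first <tag> element found under parent."""
    return parent.getElementsByTagName(tag)[0].firstChild.data


# Scope each lookup to its parent element so that, for example,
# a <Name> outside <Place> would not be picked up by accident
admitted = dom.getElementsByTagName("Admitted")[0]
date = first_text(admitted, "Date")
injury = first_text(admitted, "Injury")
loc = first_text(admitted, "Location")

place = dom.getElementsByTagName("Place")[0]
name = first_text(place, "Name")

print(date, injury, loc, name)
```

Slicing relies on the file containing exactly one `<?xml ... </Record>` block; if stray text can appear between the tags, as in the PARTS example, the line filter is still needed.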
0

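Consider etree's dom.findall(). Be sure to account for the default (unprefixed) namespace, which is written in curly braces in front of the tag name: {...}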

import lxml.etree as ET 

xmlfile = 'path/to/xml/file.xml' 
dom = ET.parse(xmlfile) 

admitted = dom.findall('{http://www.south.org/}Admitted/*') 

date = []; injury = []; loc = [] 
for i in admitted:  
    if 'Date' in i.tag: date.append(i.text) 
    if 'Injury' in i.tag: injury.append(i.text) 
    if 'Loc' in i.tag: loc.append(i.text) 

print(date) 
print(injury) 
print(loc) 

place = dom.findall('{http://www.south.org/}Place/*/*') 

number = []; wing = []; name = [] 
for i in place:  
    if 'Number' in i.tag: number.append(i.text) 
    if 'Wing' in i.tag: wing.append(i.text) 
    if 'Name' in i.tag: name.append(i.text) 

print(number) 
print(wing) 
print(name) 

Output

# ['2000-11-8t7:24:02'] 
# ['Arm'] 
# ['7'] 
# ['28'] 
# ['East'] 
# ['John Smith'] 
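
The same lookup can probably be done without lxml. The sketch below uses the standard library's `xml.etree.ElementTree` with a prefix-to-URI mapping, and also shows why an unqualified tag name finds nothing. The prefix `s` and the shortened `record` string are assumptions made for illustration; only the namespace URI comes from the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record from the question (illustrative only)
record = """<Record xmlns="http://www.south.org/">
    <Admitted>
        <Date>2000-11-8t7:24:02</Date>
        <Injury>Arm</Injury>
        <Location>7</Location>
    </Admitted>
    <Place>
        <Room>
            <Number>28</Number>
            <Wing>East</Wing>
            <Name>John Smith</Name>
        </Room>
    </Place>
</Record>"""

root = ET.fromstring(record)

# Every tag carries the namespace URI in curly braces
print(root.tag)  # {http://www.south.org/}Record

# An unqualified name therefore matches nothing and find() returns None
unqualified = root.find("Admitted")
print(unqualified)  # None

# Map an arbitrary prefix to the URI and use it in the search paths
ns = {"s": "http://www.south.org/"}
admitted = root.find("s:Admitted", ns)
date = admitted.find("s:Date", ns).text
injury = admitted.find("s:Injury", ns).text
loc = admitted.find("s:Location", ns).text
name = root.find("s:Place/s:Room/s:Name", ns).text

print(date, injury, loc, name)
```

Calling `.text` or `.find()` on that `None` result is what produces the `AttributeError` reported in the question.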
+0

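I was hoping to use the ElementTree API, since I'm a beginner and the XML is embedded in a text file. My only problem seems to be that I can't get root to move down to its children properly. – IronCode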

+0

You need to use [iter()](https://docs.python.org/2/library/xml.etree.elementtree.html), but you must account for the undeclared namespace in the XML. – Parfait

+0

Namespaces! Wow, thank you, that solved the problem :) – IronCode
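
For reference, the `iter()` approach suggested in the comments might look like the sketch below, using the standard library only. The `NS` constant, the `local_name` helper, and the shortened `record` string are hypothetical names chosen for illustration, not code from the thread.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the question, in the {uri} form ElementTree uses for tags
NS = "{http://www.south.org/}"

# Shortened copy of the record; Hospital is kept to show why scoping matters
record = """<Record xmlns="http://www.south.org/">
    <Hospital>
        <Name>South Center.</Name>
    </Hospital>
    <Admitted>
        <Date>2000-11-8t7:24:02</Date>
        <Injury>Arm</Injury>
        <Location>7</Location>
    </Admitted>
    <Place>
        <Room>
            <Number>28</Number>
            <Wing>East</Wing>
            <Name>John Smith</Name>
        </Room>
    </Place>
</Record>"""


def local_name(tag):
    """Strip the {namespace} part from a qualified tag name."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(record)

# Collect the children of every <Admitted> element, keyed by plain tag name
admitted = {}
for parent in root.iter(NS + "Admitted"):
    for child in parent:
        admitted[local_name(child.tag)] = child.text

# iter() walks the whole subtree, so an unscoped search finds every <Name>
all_names = [el.text for el in root.iter(NS + "Name")]

# Start from <Place> to get only the name stored in the room
room_names = [el.text
              for place in root.iter(NS + "Place")
              for el in place.iter(NS + "Name")]

print(admitted)
print(all_names)
print(room_names)
```

Because `iter()` searches all descendants, starting it from the `Place` element keeps the hospital and insurance names out of the result.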