2017-01-24 56 views

Python: getting the sub-elements of an element using a for loop with ElementTree

If I have the following XML:

<uima.cas.FSArray _id="7429" size="2"> 
<i>7409</i> 
<i>7419</i> 
</uima.cas.FSArray> 
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="7342" codingScheme="SNOMEDCT" code="269435009" oid="269435009#SNOMEDCT" score="0.0" disambiguated="false" cui="C0879626" tui="T046" preferredText="Adverse effects"/> 
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="7322" codingScheme="SNOMEDCT" code="157754004" oid="157754004#SNOMEDCT" score="0.0" disambiguated="false" cui="C0879626" tui="T046" preferredText="Adverse effects"/> 
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="7352" codingScheme="SNOMEDCT" code="269432007" oid="269432007#SNOMEDCT" score="0.0" disambiguated="false" cui="C0879626" tui="T046" preferredText="Adverse effects"/> 
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="7332" codingScheme="SNOMEDCT" code="213029005" oid="213029005#SNOMEDCT" score="0.0" disambiguated="false" cui="C0879626" tui="T046" preferredText="Adverse effects"/> 
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="7362" codingScheme="SNOMEDCT" code="157762007" oid="157762007#SNOMEDCT" score="0.0" disambiguated="false" cui="C0879626" tui="T046" preferredText="Adverse effects"/> 
<uima.cas.FSArray _id="7372" size="5"> 
<i>7362</i> 
<i>7332</i> 
<i>7352</i> 
<i>7322</i> 
<i>7342</i> 
</uima.cas.FSArray> 
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="7235" codingScheme="SNOMEDCT" code="274241003" oid="274241003#SNOMEDCT" score="0.0" disambiguated="false" cui="C0004134" tui="T184" preferredText="Ataxia"/> 
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="7265" codingScheme="SNOMEDCT" code="39384006" oid="39384006#SNOMEDCT" score="0.0" disambiguated="false" cui="C0004134" tui="T184" preferredText="Ataxia"/> 
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="7255" codingScheme="SNOMEDCT" code="206825002" oid="206825002#SNOMEDCT" score="0.0" disambiguated="false" cui="C0004134" tui="T184" preferredText="Ataxia"/> 
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="7275" codingScheme="SNOMEDCT" code="20262006" oid="20262006#SNOMEDCT" score="0.0" disambiguated="false" cui="C0004134" tui="T184" preferredText="Ataxia"/> 
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="7245" codingScheme="SNOMEDCT" code="158202006" oid="158202006#SNOMEDCT" score="0.0" disambiguated="false" cui="C0004134" tui="T184" preferredText="Ataxia"/> 
<uima.cas.FSArray _id="7285" size="5"> 
<i>7245</i> 
<i>7275</i> 
<i>7255</i> 
<i>7265</i> 
<i>7235</i> 
</uima.cas.FSArray> 

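From each uima.cas.FSArray node I want to retrieve the _id attribute and the <i> sub-elements.

That is, looking at the first node (the first four lines), I want to retrieve something like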

_id i 
7429 7409 
7429 7419 

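and likewise for the uima.cas.FSArray nodes that follow.

I realise there are nodes with the same name that have no attributes, so I am only interested in the nodes that have an _id attribute.

Here is my attempt: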

#!/usr/bin/env python 

import sys 
import os 
import xml.etree.cElementTree as ET 

tree = ET.ElementTree(file=sys.argv[-1]) 

UMLSarr = {} 
for x in tree.iterfind('uima.cas.FSArray'): 
    UMLSarr[x] = x.attrib 
    subArr[x] = SubElement(UMLSarr[x],"subArr",attrib='i') 

but I get:

Traceback (most recent call last): 
    File "<stdin>", line 3, in <module> 
NameError: name 'SubElement' is not defined 

I have tried various other iterations of this code, but I keep running into more errors and was hoping someone could give me a hand.

Thanks.

Answers

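from lxml import etree 

# `xml` is assumed to hold the whole document as a string
et = etree.fromstring(xml) 
# select only the FSArray elements that carry an _id attribute
for array in et.xpath('//uima.cas.FSArray[@_id]'): 
    # print the _id value and the text of every <i> child
    print(array.xpath('@_id'), array.xpath('./i/text()')) 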

Output:

['7429'] ['7409', '7419'] 
['7372'] ['7362', '7332', '7352', '7322', '7342'] 
['7285'] ['7245', '7275', '7255', '7265', '7235'] 

@(sorry, I can't spell your username) - this worked for me. Thanks very much. – brucezepplin