2017-01-23

ElementTree: get the contents of a specific tag in an XML document

I want to extract the contents of specific tags from an XML file.

Sample XML:

<facts>
  <fact>
    <name>crash</name>
    <full_name>Crash</full_name>
    <variables>
      <variable>
        <name>id</name>
        <proper_name>Crash Instance</proper_name>
        <type>INT</type>
        <interpretation>key</interpretation>
      </variable>
      <variable>
        <name>accident_key</name>
        <proper_name>Case Identifier</proper_name>
        <interpretation>string</interpretation>
        <type>CHAR(9)</type>
      </variable>
      <variable>
        <name>accident_year</name>
        <proper_name>Crash Year</proper_name>
        <interpretation>dim</interpretation>
        <type>INT</type>
      </variable>
    </variables>
  </fact>
  <fact>
    <name>vehicle</name>
    <full_name>Vehicle</full_name>
    <variables>
      <variable>
        <name>id</name>
        <proper_name>Vehicle Instance</proper_name>
        <type>INT</type>
      </variable>
      <variable>
        <name>crash_id</name>
        <proper_name>Crash Instance</proper_name>
        <type>INT</type>
      </variable>
    </variables>
  </fact>
</facts>

I want to pull the contents of all the <name> tags from the <variable> nodes, but only in the crash fact.

Here is my code so far.

import xml.etree.ElementTree as ET

def header(filename, fact):
    lst = []
    tree = ET.parse(filename)  # read in the XML
    for fact in tree.iter(tag='fact'):
        factname = fact.find('name').text
        if factname == fact:  # choose the fact to pull from
            for var in fact.iter(tag='variable'):
                name = var.find('name').text
                lst.append(name)
    return lst  # return a list of all the <name> tags from the Crash fact

newlst = header('schema.xml','crash') 

My output newlst should be a list of all the <name> tags in the Crash fact, but it keeps coming back empty.

Strangely, it returns the correct output if I hard-code everything (and remove the function):

import xml.etree.ElementTree as ET

lst = []
tree = ET.parse('schema.xml')
for fact in tree.iter(tag='fact'):
    factname = fact.find('name').text
    if factname == 'crash':
        for var in fact.iter(tag='variable'):
            name = var.find('name').text
            lst.append(name)
print(lst)


Output: ['id', 'accident_key', 'accident_year']
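As a side note, the hard-coded loop above can probably be collapsed into a single ElementTree path query, because ElementTree's limited XPath support includes the [tag='text'] predicate (keep an element only if it has a child named tag whose text equals 'text'). This is a minimal sketch, not the original poster's code: the function name variable_names and the trimmed SAMPLE_XML string are hypothetical, and io.StringIO stands in for schema.xml because ET.parse also accepts a file object.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed copy of the question's sample XML (stands in for schema.xml).
SAMPLE_XML = """<facts>
  <fact>
    <name>crash</name>
    <variables>
      <variable><name>id</name></variable>
      <variable><name>accident_key</name></variable>
      <variable><name>accident_year</name></variable>
    </variables>
  </fact>
  <fact>
    <name>vehicle</name>
    <variables>
      <variable><name>id</name></variable>
      <variable><name>crash_id</name></variable>
    </variables>
  </fact>
</facts>"""


def variable_names(source, target_factname):
    """Collect <variable>/<name> text under the <fact> whose <name> matches."""
    tree = ET.parse(source)  # accepts a filename or a file object
    # [name='...'] keeps only <fact> children of the root whose direct <name>
    # child has exactly that text; the rest of the path walks down to each
    # variable's <name>.
    path = "./fact[name='{}']/variables/variable/name".format(target_factname)
    return [el.text for el in tree.findall(path)]


print(variable_names(io.StringIO(SAMPLE_XML), 'crash'))
# ['id', 'accident_key', 'accident_year']
```

Because the fact name is formatted straight into the path, a name containing a single quote would break the query; for arbitrary input the explicit loop is the safer choice.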

Answer


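Inside the function you use the name fact both as the parameter and as the variable of the first for loop. The loop rebinds fact to each <fact> Element, so the check factname == fact compares a string with an Element and is never true, which is why the list stays empty. Try this version: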

import xml.etree.ElementTree as ET

def header(filename, target_factname):
    lst = []
    tree = ET.parse(filename)  # read in the XML
    for fact in tree.iter(tag='fact'):
        factname = fact.find('name').text
        if factname == target_factname:  # choose the fact to pull from
            for var in fact.iter(tag='variable'):
                name = var.find('name').text
                lst.append(name)
    return lst  # return a list of all the <name> tags from the Crash fact
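To check the fix without a schema.xml on disk, here is a small self-contained sketch. It assumes a hypothetical, trimmed SAMPLE_XML string and passes it through io.StringIO (ET.parse accepts a file object as well as a filename). It also shows directly why the original version returned an empty list: a str compared with an Element is simply False.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed copy of the question's sample XML.
SAMPLE_XML = """<facts>
  <fact>
    <name>crash</name>
    <variables>
      <variable><name>id</name></variable>
      <variable><name>accident_key</name></variable>
      <variable><name>accident_year</name></variable>
    </variables>
  </fact>
  <fact>
    <name>vehicle</name>
    <variables>
      <variable><name>id</name></variable>
      <variable><name>crash_id</name></variable>
    </variables>
  </fact>
</facts>"""


def header(filename, target_factname):
    """The answer's fixed function: the parameter no longer shadows the loop variable."""
    lst = []
    tree = ET.parse(filename)  # filename may also be a file object
    for fact in tree.iter(tag='fact'):
        factname = fact.find('name').text
        if factname == target_factname:
            for var in fact.iter(tag='variable'):
                lst.append(var.find('name').text)
    return lst


# What the buggy version effectively compared: the <name> text vs. the Element itself.
first_fact = ET.fromstring(SAMPLE_XML).find('fact')
print(first_fact.find('name').text == first_fact)  # False

print(header(io.StringIO(SAMPLE_XML), 'crash'))
# ['id', 'accident_key', 'accident_year']
```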

I knew I was making a silly mistake... Thanks! – ale19