2017-02-09 76 views

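XML parsing in Python with multi-level tags

I am trying to extract fields from an XML file that has multiple levels of nested tags, as in the following example: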

<compound kind="struct">
    <name>my-struct</name>
    <filename>struct____dt__args.html</filename>
    <member kind="variable">
        <type>int32_t</type>
        <name>count</name>
        <anchorfile>struct____dt__args.html</anchorfile>
        <anchor>a0fbe49d8b1189286bd817409658eb631</anchor>
        <arglist></arglist>
    </member>
    <member kind="variable">
        <type>int32_t</type>
        <name>create_type</name>
        <anchorfile>struct____dt__args.html</anchorfile>
        <anchor>a4e38c7f138891d020cce3c6d7e6bc31e</anchor>
        <arglist></arglist>
    </member>
    <member kind="variable">
        <type>size_t</type>
        <name>total_size</name>
        <anchorfile>struct____dt__args.html</anchorfile>
        <anchor>a41ca25bca63ad1fee790134901d8d1c0</anchor>
        <arglist></arglist>
    </member>
</compound>

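I need to parse this and extract the fields of the compound tags. There are several kinds of compound tags (struct, function, class, and so on); I only need those with kind="struct", each followed by the type and name of its child member tags, like this: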

struct my-struct: 
int32_t count 
int32_t create_type 
size_t total_size 

Answer

Here is a solution:

from xml.etree import ElementTree


def extract_structs(xml_path):
    # data and xml structure validation omitted
    # result collected as lists and tuples without string formatting
    struct_list = []
    root = ElementTree.parse(xml_path).getroot()
    for compound in root:
        kind = compound.get('kind')
        if kind != 'struct':
            continue
        current_struct = []
        struct_list.append(current_struct)
        struct_name = compound.find('./name').text
        current_struct.append((kind, struct_name))
        for member in compound.findall('./member'):
            member_type = member.find('./type').text
            member_name = member.find('./name').text
            current_struct.append((member_type, member_name))
    return struct_list


if __name__ == '__main__':
    structs = extract_structs('test_file.xml')
    print(structs)
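
The function above returns nested lists of tuples rather than formatted text. A minimal sketch of turning that result into the layout asked for in the question, assuming the list-of-tuples shape returned by extract_structs (the helper name format_structs is hypothetical and not part of the original answer):

```python
def format_structs(struct_list):
    """Render [(kind, name), (type, name), ...] lists as text blocks."""
    blocks = []
    for struct in struct_list:
        # The first tuple is (kind, struct name); the rest are (type, name) members.
        (kind, struct_name), members = struct[0], struct[1:]
        lines = [f"{kind} {struct_name}:"]
        lines.extend(f"{member_type} {member_name}"
                     for member_type, member_name in members)
        blocks.append("\n".join(lines))
    # Separate consecutive structs with a blank line.
    return "\n\n".join(blocks)


if __name__ == '__main__':
    sample = [[('struct', 'my-struct'),
               ('int32_t', 'count'),
               ('int32_t', 'create_type'),
               ('size_t', 'total_size')]]
    print(format_structs(sample))
```

Keeping the extraction and the formatting separate means the parsed data can still be reused for something other than printing.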
This just prints an empty list for me. – marc

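Perhaps your root element has a namespace, or the compound elements are not direct children of the root. My code relies on assumptions about your context that the question does not show. Please post the complete XML structure. –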
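
Both possibilities raised in this comment can be handled without seeing the full file. The following is only a sketch, assuming the compound elements may sit at any depth and may carry an XML namespace; the names local_name, child_text and extract_structs_anywhere are hypothetical, and the function takes the XML as a string rather than a path to keep the example self-contained:

```python
from xml.etree import ElementTree


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit('}', 1)[-1]


def child_text(element, name):
    """Return the text of the first direct child with the given local name."""
    for child in element:
        if local_name(child.tag) == name:
            return child.text
    return None


def extract_structs_anywhere(xml_text):
    root = ElementTree.fromstring(xml_text)
    structs = []
    # iter() walks the whole tree, so compounds need not be direct children.
    for element in root.iter():
        if local_name(element.tag) != 'compound':
            continue
        if element.get('kind') != 'struct':
            continue
        current = [('struct', child_text(element, 'name'))]
        for member in element:
            if local_name(member.tag) == 'member':
                current.append((child_text(member, 'type'),
                                child_text(member, 'name')))
        structs.append(current)
    return structs
```

Comparing local names ignores the namespace entirely, which is convenient here but would be too loose if the same tag name appeared under different namespaces.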