Parsing XML with Python and minidom

I'm using Python (minidom) to parse an XML file and print a hierarchical structure that looks like this (indentation is used here to show the significant hierarchical relationships):

My Document
Overview
    Basic Features
    About This Software
        Platforms Supported

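Instead, the program iterates over the nodes multiple times and produces the following, printing duplicate nodes. (Looking at the node list at each iteration, it's obvious why it does this, but I can't seem to find a way to get the node list I'm looking for.)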

My Document 
Overview 
Basic Features 
About This Software 
Platforms Supported 
Basic Features 
About This Software 
Platforms Supported 
Platforms Supported

Here is the XML source file:

<?xml version="1.0" encoding="UTF-8"?> 
<DOCMAP> 
    <Topic Target="ALL"> 
     <Title>My Document</Title> 
    </Topic> 
    <Topic Target="ALL"> 
     <Title>Overview</Title> 
     <Topic Target="ALL"> 
      <Title>Basic Features</Title> 
     </Topic> 
     <Topic Target="ALL"> 
      <Title>About This Software</Title> 
      <Topic Target="ALL"> 
       <Title>Platforms Supported</Title> 
      </Topic> 
     </Topic> 
    </Topic> 
</DOCMAP>

Here is the Python program:

import xml.dom.minidom
from xml.dom.minidom import Node

dom = xml.dom.minidom.parse("test.xml")
Topic = dom.getElementsByTagName('Topic')
i = 0
for node in Topic:
    # getElementsByTagName searches all descendants, not only direct children
    alist = node.getElementsByTagName('Title')
    for a in alist:
        Title = a.firstChild.data
        print(Title)

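I could fix the problem by not nesting 'Topic' elements, changing the lower-level topic names to something like 'SubTopic1' and 'SubTopic2'. But I want to take advantage of XML's built-in hierarchical structure without needing different element names; it seems that I should be able to nest 'Topic' elements, and that there should be some way to know which level of 'Topic' I'm currently looking at.

I've tried a number of different XPath functions without much success.

Source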

2009-10-20 hWorks

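If you want the output shown in the first listing, you could just print the text of each element. It's not clear to me how the structuring affects the wanted output – Mark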

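getElementsByTagName is recursive: you get all descendants with a matching tagName. Because your Topics contain other Topics that also have Titles, the call returns the lower-level Titles many times.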
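To see the recursion concretely, here is a minimal sketch that compares, for each Topic, the number of Title descendants with the number of direct Title children. It parses a cut-down copy of the question's document from a string; the names `SAMPLE`, `direct_titles` and `counts` are illustrative and not from the thread.

```python
import xml.dom.minidom

# Cut-down copy of the question's document; SAMPLE is an illustrative name.
SAMPLE = """<DOCMAP>
  <Topic><Title>Overview</Title>
    <Topic><Title>Basic Features</Title></Topic>
  </Topic>
</DOCMAP>"""

dom = xml.dom.minidom.parseString(SAMPLE)

def direct_titles(topic):
    """Return only the Title elements that are immediate children of topic."""
    return [c for c in topic.childNodes
            if c.nodeType == c.ELEMENT_NODE and c.tagName == 'Title']

counts = []
for topic in dom.getElementsByTagName('Topic'):
    descendants = topic.getElementsByTagName('Title')  # searches the whole subtree
    counts.append((len(descendants), len(direct_titles(topic))))
    print(len(descendants), len(direct_titles(topic)))
```

The outer Topic reports two Title descendants but only one direct Title child, which is where the repeated lines in the question's output come from.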

If you want only the matching direct children, and you don't have XPath available, you can write a simple filter, e.g.:

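import xml.dom.minidom

def getChildrenByTagName(node, tagName):
    # Yield only the direct child elements of node whose tag matches ('*' matches any)
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE and (tagName == '*' or child.tagName == tagName):
            yield child

document = xml.dom.minidom.parse("test.xml")
for topic in document.getElementsByTagName('Topic'):
    title = next(getChildrenByTagName(topic, 'Title'))  # first direct Title child
    print(title.firstChild.data)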

Source

2009-10-20 22:17:38 bobince

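Thanks for the attempt. It didn't work, but it gave me some ideas. The following works (same general idea; FWIW, the nodeType is ELEMENT_NODE): import xml.dom.minidom from xml.dom.minidom import Node dom = xml.dom.minidom.parse("docmap.xml") def getChildrenByTitle(node): for child in node.childNodes: if child.localName == 'Title': yield child Topic = dom.getElementsByTagName('Topic') for node in Topic: alist = getChildrenByTitle(node) for a in alist: # Title = a.firstChild.data Title = a.childNodes[0].nodeValue print Title – hWorks

Whoops, yes, I meant ELEMENT of course, not TEXT! Doh, fixed – bobince

Let me put this comment here...

Thanks for the attempt. It didn't work, but it gave me some ideas. The following works (same general idea; FWIW, the nodeType is ELEMENT_NODE):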

import xml.dom.minidom
from xml.dom.minidom import Node

dom = xml.dom.minidom.parse("docmap.xml")

def getChildrenByTitle(node):
    # Yield only the direct children of node that are Title elements
    for child in node.childNodes:
        if child.localName == 'Title':
            yield child

Topic = dom.getElementsByTagName('Topic')
for node in Topic:
    alist = getChildrenByTitle(node)
    for a in alist:
        # Title = a.firstChild.data
        Title = a.childNodes[0].nodeValue
        print(Title)

Source

2009-10-21 00:04:10 hWorks

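I would call the function getTitle (or 'get_title') and have it return not all direct child Title elements but only the first one (since each child can have only one title). –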

Maybe that's what I'm not getting. I want the titles of all direct children. Maybe a better name would be getTitlesOfChildren. – hWorks
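The naming suggestion in the comment above could be sketched as follows. This is only one possible reading of it: `get_title` is a hypothetical helper that returns the text of the first direct Title child, and the inline XML string stands in for the file used in the thread.

```python
import xml.dom.minidom

def get_title(topic):
    """Return the text of the first direct Title child of topic, or None."""
    for child in topic.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == 'Title':
            # Assumes the Title holds plain text as its first child node.
            return child.firstChild.data if child.firstChild else None
    return None

# Illustrative input; in the thread the document is read from a file.
dom = xml.dom.minidom.parseString(
    "<DOCMAP><Topic><Title>Overview</Title>"
    "<Topic><Title>Basic Features</Title></Topic></Topic></DOCMAP>")

titles = [get_title(t) for t in dom.getElementsByTagName('Topic')]
print(titles)
```

Because the helper looks only at direct children, each Topic contributes its own title exactly once, even though getElementsByTagName('Topic') still visits the nested Topics.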

You can use the following generator to run through the tree and get the titles together with their indentation levels:

def f(elem, level=-1):
    if elem.nodeName == "Title":
        yield elem.childNodes[0].nodeValue, level
    elif elem.nodeType == elem.ELEMENT_NODE:
        for child in elem.childNodes:
            for e, l in f(child, level + 1):
                yield e, l

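If you run

import xml.dom.minidom as minidom
doc = minidom.parse("test.xml")
# Start from the root element: the Document node itself is not an ELEMENT_NODE,
# so f(doc) would yield nothing.
list(f(doc.documentElement))

you will get a list of (title, level) tuples, which for your document is:

[('My Document', 1), ('Overview', 1), ('Basic Features', 2), ('About This Software', 2), ('Platforms Supported', 3)]

This is of course only a basic idea. If you just want spaces at the start of each line, you can code that directly in the generator, though with the level you have more flexibility. You could also detect the first level automatically (initializing the level to -1 here is just a crude way of doing it).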
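One way the "detect the first level automatically" idea might look is to count the enclosing Topic elements instead of passing a starting level. This is a sketch under that assumption, not the answer author's code; `topic_depth`, `indented_titles` and `SAMPLE` are hypothetical names, and the document is inlined as a string for self-containment.

```python
import xml.dom.minidom

def topic_depth(topic):
    """Count how many Topic elements enclose this one (0 for a top-level Topic)."""
    depth = 0
    parent = topic.parentNode
    # Stop at the Document node, which is not an element.
    while parent is not None and parent.nodeType == parent.ELEMENT_NODE:
        if parent.tagName == 'Topic':
            depth += 1
        parent = parent.parentNode
    return depth

def indented_titles(dom, indent='    '):
    """Yield one indented line per Topic that has a direct Title child."""
    for topic in dom.getElementsByTagName('Topic'):
        for child in topic.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == 'Title':
                yield indent * topic_depth(topic) + child.firstChild.data
                break  # only the Topic's own (first) Title

# Same structure as the question's document, written compactly.
SAMPLE = (
    "<DOCMAP>"
    "<Topic><Title>My Document</Title></Topic>"
    "<Topic><Title>Overview</Title>"
    "<Topic><Title>Basic Features</Title></Topic>"
    "<Topic><Title>About This Software</Title>"
    "<Topic><Title>Platforms Supported</Title></Topic>"
    "</Topic></Topic></DOCMAP>"
)

lines = list(indented_titles(xml.dom.minidom.parseString(SAMPLE)))
print("\n".join(lines))
```

Counting ancestors costs a little more per node than threading a level argument through the recursion, but it removes the need to pick a starting value such as -1.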

Source

2009-10-21 18:45:23 RedGlyph

Exactly what I had been trying to do all along. Thanks very much. – hWorks

Recursive function:

import xml.dom.minidom

def traverseTree(document, depth=0):
    # Print the text of every Title element, indented by its depth in the tree
    for child in document.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if document.tagName == 'Title':
                print(depth * ' ', child.data)
        if child.nodeType == child.ELEMENT_NODE:
            traverseTree(child, depth + 1)

filename = 'sample.xml'
dom = xml.dom.minidom.parse(filename)
traverseTree(dom.documentElement)

Your XML:

<?xml version="1.0" encoding="UTF-8"?> 
<DOCMAP> 
    <Topic Target="ALL"> 
     <Title>My Document</Title> 
    </Topic> 
    <Topic Target="ALL"> 
     <Title>Overview</Title> 
     <Topic Target="ALL"> 
      <Title>Basic Features</Title> 
     </Topic> 
     <Topic Target="ALL"> 
      <Title>About This Software</Title> 
      <Topic Target="ALL"> 
       <Title>Platforms Supported</Title> 
      </Topic> 
     </Topic> 
    </Topic> 
</DOCMAP>

Desired output:

$ python parse_sample.py 
     My Document 
     Overview 
      Basic Features 
      About This Software 
       Platforms Supported

Source

2013-01-10 10:08:00 imesias

I think this can help:

import xml.dom.minidom

with open("file.xml", 'r') as f:
    data = f.read()

doc = xml.dom.minidom.parseString(data)
# Pair each Topic with the Title at the same position in document order
titles = doc.getElementsByTagName('Title')
for i, topic in enumerate(doc.getElementsByTagName('Topic')):
    print(titles[i].firstChild.nodeValue)

Output:

My Document 
Overview 
Basic Features 
About This Software 
Platforms Supported

Source

2014-01-28 16:07:43 aabdulwahed
