Is parsing XML in Python really this ugly?

I have a very small XML file (22 lines) with 5 elements (?), and I only want one value out of it. Is parsing XML in Python really this ugly?

This is the only way I've found to get the value short of using a regular expression:

from xml.dom.minidom import parse 
float(parse(filePath).getElementsByTagName('InfoType')[0].getElementsByTagName('SpecificInfo')[0].firstChild.data)

I feel like I'm missing something. There has to be a more pythonic way to handle XML, right?


2013-08-19 horriblyUnpythonic

I'd suggest looking up XPath. – FatalError
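ElementTree's `find()`, `findall()` and `findtext()` accept a limited subset of XPath, so the lookup can be written as a path instead of a chain of calls. A minimal sketch, assuming a hypothetical document layout (the question does not show the actual XML):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real file from the question is not shown.
XML = """<Root>
  <InfoType name="a">
    <SpecificInfo>1.5</SpecificInfo>
  </InfoType>
  <InfoType name="b">
    <SpecificInfo>2.5</SpecificInfo>
  </InfoType>
</Root>"""

root = ET.fromstring(XML)

# './/' searches all descendants; findtext() returns the text of the first match.
first = float(root.findtext(".//InfoType/SpecificInfo"))

# [@name='b'] is an attribute predicate from ElementTree's XPath subset.
by_attr = float(root.findtext(".//InfoType[@name='b']/SpecificInfo"))

# findall() returns every matching element.
all_values = [float(e.text) for e in root.findall(".//SpecificInfo")]

print(first, by_attr, all_values)
```

ElementTree only implements part of XPath; the third-party lxml library offers full XPath 1.0 through its `xpath()` method if more is needed.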

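Just a side note: you can't parse XML (or HTML, or most markup languages) with regular expressions. Regular expressions describe Type 3 (regular) languages, and XML is not a regular language. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Hyperboreus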
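One practical consequence, separate from the formal-language argument: a regex sees only characters, not XML structure. A minimal sketch with a hypothetical input in which a commented-out element appears before the real one:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: the first <SpecificInfo> sits inside an XML comment.
XML = """<InfoType>
  <!-- <SpecificInfo>0.0</SpecificInfo> -->
  <SpecificInfo>3.14</SpecificInfo>
</InfoType>"""

# Naive regex: grabs the first tag pair it sees, even inside a comment.
regex_value = float(re.search(r"<SpecificInfo>(.*?)</SpecificInfo>", XML).group(1))

# The parser drops comments, so only the real element is found.
parsed_value = float(ET.fromstring(XML).findtext("SpecificInfo"))

print(regex_value, parsed_value)  # 0.0 3.14
```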

The ElementTree library is a lot more pythonic than xml.dom.minidom. If I understand your XML structure correctly, your code would look like this using ElementTree:

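import xml.etree.ElementTree as ET
tree = ET.parse(filePath)
# The path is relative to the root element, so InfoType is assumed to be a
# direct child of the root. find() already returns the SpecificInfo element;
# indexing it with [0] would ask for its first child and raise IndexError.
data = float(tree.find('InfoType/SpecificInfo').text)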

This should be a lot cleaner than what you're doing now.
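A self-contained version of the ElementTree lookup, run against a hypothetical document in which `InfoType` is a direct child of the root (the structure the answer assumes), with the question's minidom chain alongside for comparison. It uses `fromstring()`/`parseString()` instead of reading from `filePath`, and `findtext()`, which returns the matched element's text directly:

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString

# Hypothetical document; InfoType is assumed to be a direct child of the root.
XML = """<Config>
  <InfoType>
    <SpecificInfo>42.0</SpecificInfo>
  </InfoType>
</Config>"""

# ElementTree: one path expression, relative to the root element.
et_value = float(ET.fromstring(XML).findtext("InfoType/SpecificInfo"))

# The question's minidom chain, using parseString instead of parse(filePath).
dom_value = float(
    parseString(XML)
    .getElementsByTagName("InfoType")[0]
    .getElementsByTagName("SpecificInfo")[0]
    .firstChild.data
)

print(et_value, dom_value)
```

Both approaches return the same value here; the difference is mainly readability.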


2013-08-19 03:52:05 rmunn

Instead of those lengthy DOM-traversal calls, you could at least use pyQuery: http://pythonhosted.org/pyquery/ (jQuery syntax in Python)


2013-08-19 03:35:04

Using ElementTree is a more pythonic way to get individual values out of XML:

http://docs.python.org/2/library/xml.etree.elementtree.html

It is part of the standard library in recent Python versions (since Python 2.5).
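When the element might be missing, `find()` returns `None`, so chaining `.text` would raise `AttributeError`; `findtext()` returns `None` (or a supplied default) instead. A minimal sketch; `read_specific_info` and the sample XML are hypothetical, and `fromstring()` stands in for `ET.parse(path).getroot()`:

```python
import xml.etree.ElementTree as ET

def read_specific_info(xml_text, default=None):
    """Return InfoType/SpecificInfo as a float, or `default` if the element is absent.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    root = ET.fromstring(xml_text)
    text = root.findtext("InfoType/SpecificInfo")  # None when there is no match
    return float(text) if text is not None else default

present = read_specific_info("<R><InfoType><SpecificInfo>7</SpecificInfo></InfoType></R>")
missing = read_specific_info("<R><InfoType/></R>", default=0.0)
print(present, missing)
```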


2013-08-19 03:50:36

I think it's premature to write off the minidom API. With a few helper functions, we can make it as pythonic as we like, for example:

from xml.dom import minidom

# `doc` comes from the author's larger script (not shown). As a stand-in,
# create a document whose placeholder root element is replaced below.
doc = minidom.getDOMImplementation().createDocument(None, 'html', None)

# Helper function to wrap the DOM element/attribute creation API.
def El(tag, attribs=None, text=None):
    el = doc.createElement(tag)
    if text:
        el.appendChild(doc.createTextNode(text))
    if attribs is None:
        return el
    for k, v in attribs.items():
        el.setAttribute(k, v)
    return el

# Construct an element tree from the passed tree. Each entry in child_els is
# either an element or an (element, [children]) tuple.
def make_els(parent_el, this_el, child_els):
    parent_el.appendChild(this_el)
    for x in child_els:
        if isinstance(x, tuple):
            child_el, grandchild_els = x
            make_els(this_el, child_el, grandchild_els)
        else:
            this_el.appendChild(x)

# `args` and `env` are defined elsewhere in the author's script (not shown).
doc.removeChild(doc.documentElement)
make_els(doc, El('html', {'xmlns': 'http://www.w3.org/1999/xhtml', 'dir': 'ltr', 'lang': 'en'}), [
    (El('head'), [
        El('meta', {'http-equiv': 'Content-Type', 'content': 'text/html; charset=utf-8'}),
        El('meta', {'http-equiv': 'Content-Style-Type', 'content': 'text/css'}),
        El('link', {'rel': 'stylesheet', 'type': 'text/css', 'href': 'main.css', 'title': 'Default Stylesheet'}),
        El('title', {}, 'XXXX XXXX XXXXr {}, {}'.format(args.xxxx, env.build_time))
    ]),
    (El('frameset', {'cols': '20%, 80%'}), [
        El('frame', {'src': 'xxx_list.html', 'name': 'listframe', 'title': 'XXXX XXXX XXXX'}),
        El('frame', {'src': 'xxx_all_xxxx_all.html', 'name': 'regframe', 'title': 'XXX XXXX XXXX'}),
        (El('noframes'), [
            (El('body'), [
                El('h2', {}, 'Frame Alert'),
                El('p', {}, 'This document is designed to be viewed using the frames feature.')
            ])
        ])
    ])
])
print('\ndoc:\n', doc.toprettyxml(indent=' '))
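The same helper-function approach can also tidy up reading. A minimal sketch of a hypothetical `text_at()` helper (not part of minidom) that hides the question's `getElementsByTagName(...)[0]` chain; the sample XML is also hypothetical:

```python
from xml.dom.minidom import parseString

def text_at(node, *tags):
    """Follow the first matching descendant for each tag in turn, then return its text.

    Hypothetical helper for illustration; not part of xml.dom.minidom.
    """
    for tag in tags:
        matches = node.getElementsByTagName(tag)
        if not matches:
            raise LookupError("no <%s> element found" % tag)
        node = matches[0]
    # Join all text and CDATA children so text split across nodes is not lost.
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )

doc = parseString("<Root><InfoType><SpecificInfo>2.5</SpecificInfo></InfoType></Root>")
value = float(text_at(doc, "InfoType", "SpecificInfo"))
print(value)
```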


2014-09-05 10:31:12 PatB
