2017-01-30 117 views

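ElementTree: why are my namespace declarations stripped out?

I am building OpenOffice documents. I have a scaffold that I use to generate my content.xml file. The content-scaffold.xml file is stored on the filesystem and looks like this: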

<?xml version="1.0" encoding="UTF-8"?> 
    <office:document-content 
    xmlns:anim="urn:oasis:names:tc:opendocument:xmlns:animation:1.0" 
    xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" 
    xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0" 
    xmlns:db="urn:oasis:names:tc:opendocument:xmlns:database:1.0" 
    xmlns:dc="http://purl.org/dc/elements/1.1/" 
    xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" 
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" 
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" 
    xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" 
    xmlns:grddl="http://www.w3.org/2003/g/data-view#" 
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" 
    xmlns:math="http://www.w3.org/1998/Math/MathML" 
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" 
    xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" 
    xmlns:odf="http://docs.oasis-open.org/ns/office/1.2/meta/odf#" 
    xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" 
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" 
    xmlns:pkg="http://docs.oasis-open.org/ns/office/1.2/meta/pkg#" 
    xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0" 
    xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" 
    xmlns:smil="urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" 
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" 
    xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" 
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" 
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" 
    xmlns:xforms="http://www.w3.org/2002/xforms" 
    xmlns:xhtml="http://www.w3.org/1999/xhtml" 
    xmlns:xlink="http://www.w3.org/1999/xlink" 
    office:version="1.2"> 

    <office:automatic-styles> 

    <style:style style:family="text" style:name="Strong"> 
     <style:text-properties 
     fo:color="#000000" 
     fo:font-weight="bold" /> 
    </style:style> 

    </office:automatic-styles> 


    <office:body> 
    <office:text> 
     <!-- content will go here --> 
    </office:text> 
    </office:body> 

</office:document-content> 

The idea is that I inject XML into the office:text tag (in Python) and then render the document back out. In this example, I am injecting a simple text:p tag.

from xml.etree import ElementTree

# NAMESPACES is a dict mapping each prefix (e.g. "office") to its namespace URI
document_content = ElementTree.parse('content-scaffold.xml').getroot()
office_body = document_content.find('office:body', NAMESPACES)
office_text = office_body.find('office:text', NAMESPACES)
p = ElementTree.SubElement(office_text, 'text:p')
p.text = "Hello"

However, this is what the namespace declarations look like once rendered:

<office:document-content 
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" 
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" 
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" 
office:version="1.2"> 

This results in the following error:

Namespace prefix text on p is not defined

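It is clear that ElementTree keeps only the xmlns declarations that are actually needed (in my example office, fo and style, since they are the only ones used in content-scaffold.xml), which is pretty neat. However, I really want to keep all of them, so that I can use every namespace.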
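The behaviour described above can be reproduced in a few lines. This is only a minimal sketch: the two-namespace `SCAFFOLD` string is a made-up stand-in for content-scaffold.xml, and `register_namespace` is called merely so that the `office` prefix survives serialization.

```python
from xml.etree import ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical miniature scaffold: declares two namespaces, uses only one
SCAFFOLD = (
    '<office:document-content xmlns:office="%s" xmlns:text="%s">'
    '<office:body/></office:document-content>' % (OFFICE, TEXT)
)

ET.register_namespace("office", OFFICE)  # keep the "office" prefix on output
root = ET.fromstring(SCAFFOLD)
rendered = ET.tostring(root, encoding="unicode")

# The parser does not keep xmlns declarations; the serializer re-creates
# them only for namespaces that appear in element or attribute names.
print("xmlns:office" in rendered)  # True
print("xmlns:text" in rendered)    # False
```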

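Any idea how to force ElementTree to keep them all? Or am I approaching this the wrong way from the start? I am open to any alternative solution.

Note: I am using Python 3 and ElementTree.

Thanks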


Found a very similar question: http://stackoverflow.com/q/24557151/407651 – mzjn

Answer


ElementTree is rather weak when it comes to namespace handling. However, what you are asking for can be done (although it is a bit of a hassle):

from xml.etree import ElementTree as ET 

NAMESPACES = {"anim": "urn:oasis:names:tc:opendocument:xmlns:animation:1.0", 
    "chart": "urn:oasis:names:tc:opendocument:xmlns:chart:1.0", 
    "config": "urn:oasis:names:tc:opendocument:xmlns:config:1.0", 
    "db": "urn:oasis:names:tc:opendocument:xmlns:database:1.0", 
    "dc": "http://purl.org/dc/elements/1.1/", 
    "dr3d": "urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0", 
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0", 
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0", 
    "form": "urn:oasis:names:tc:opendocument:xmlns:form:1.0", 
    "grddl": "http://www.w3.org/2003/g/data-view#", 
    "manifest": "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0", 
    "math": "http://www.w3.org/1998/Math/MathML", 
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0", 
    "number": "urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0", 
    "odf": "http://docs.oasis-open.org/ns/office/1.2/meta/odf#", 
    "of": "urn:oasis:names:tc:opendocument:xmlns:of:1.2", 
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0", 
    "pkg": "http://docs.oasis-open.org/ns/office/1.2/meta/pkg#", 
    "presentation": "urn:oasis:names:tc:opendocument:xmlns:presentation:1.0", 
    "script": "urn:oasis:names:tc:opendocument:xmlns:script:1.0", 
    "smil": "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0", 
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0", 
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0", 
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0", 
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0", 
    "xforms": "http://www.w3.org/2002/xforms", 
    "xhtml": "http://www.w3.org/1999/xhtml", 
    "xlink": "http://www.w3.org/1999/xlink"} 

document_content = ET.parse('content-scaffold.xml').getroot() 
office_body = document_content.find('office:body', NAMESPACES) 
office_text = office_body.find('office:text', NAMESPACES) 
p = ET.SubElement(office_text, 'text:p') 
p.text = "Hello" 

for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)  # Ensure correct prefixes in output
    if prefix not in ("office", "fo", "style"):  # Prevent duplicate ns declarations
        document_content.set("xmlns:" + prefix, uri)  # Add ns declarations to root element

ET.ElementTree(document_content).write("output.xml") 

This code produces a result file in which all the namespace declarations are preserved.
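If hard-coding the prefix-to-URI dictionary feels brittle, one possible variation is to read the declarations from the scaffold itself. The sketch below relies on the `start-ns` events of `ET.iterparse`; the `collect_namespaces` helper and the small `SAMPLE` document are illustrative names invented here, not part of the code above.

```python
import io
from xml.etree import ElementTree as ET


def collect_namespaces(source):
    """Return {prefix: uri} for each xmlns declaration in a file or file object."""
    namespaces = {}
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        namespaces.setdefault(prefix, uri)  # keep the first URI seen for a prefix
    return namespaces


# Hypothetical stand-in for content-scaffold.xml
SAMPLE = (
    b'<office:document-content'
    b' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    b' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    b'<office:body/></office:document-content>'
)

NAMESPACES = collect_namespaces(io.BytesIO(SAMPLE))

for prefix, uri in NAMESPACES.items():
    if prefix:  # skip a default (unprefixed) namespace, if there is one
        ET.register_namespace(prefix, uri)
```

With a real file, `collect_namespaces('content-scaffold.xml')` should work the same way, since `iterparse` accepts a filename as well as a file object.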


Here is how to do it with lxml:

from lxml import etree as ET 

NAMESPACES = {"office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0"} 

document_content = ET.parse('content-scaffold.xml') 
office_body = document_content.find('office:body', NAMESPACES) 
office_text = office_body.find('office:text', NAMESPACES) 
p = ET.SubElement(office_text, '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}p') 
p.text = "Hello" 

document_content.write("output.xml") 

Note that you have to give SubElement() the element name in Clark notation, i.e. {namespace-uri}localname.
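Writing Clark notation by hand gets tedious, so a small helper can build it from a prefixed name. This is just a sketch: `clark` is a hypothetical helper name, and the standard-library ElementTree is used here only to keep the example self-contained (lxml's SubElement() takes the same kind of tag string).

```python
from xml.etree import ElementTree as ET

NAMESPACES = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def clark(qualified_name, namespaces=NAMESPACES):
    """Turn 'prefix:local' into '{uri}local' (hypothetical helper)."""
    prefix, _, local = qualified_name.partition(":")
    return "{%s}%s" % (namespaces[prefix], local)


# Build a tiny tree using the helper instead of spelling out the URIs
office_text = ET.Element(clark("office:text"))
p = ET.SubElement(office_text, clark("text:p"))
p.text = "Hello"
```

An unknown prefix raises KeyError, which is probably preferable to silently writing an element in the wrong namespace.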


I used lxml as you suggested and it works fine. Thanks – MonsieurNinja
