Writing an XML file in Python corrupts the file

I'm trying to write the contents of an xml.dom.minidom object to a file. The simple idea is to use the 'writexml' method:

import codecs
from xml.dom import minidom

def write_xml_native():
    # Build the DOM from the XML file
    xmldoc = minidom.parse('semio2.xml')
    f = codecs.open('codified.xml', mode='w', encoding='utf-8')
    # Use the native writexml() method to write
    xmldoc.writexml(f, encoding="utf-8")
    f.close()

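The problem is that this corrupts the non-Latin text in the file. The other way is to get the XML as a text string and write it to the file explicitly: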

def write_xml(): 
    # Building DOM from XML 
    xmldoc = minidom.parse('semio2.xml') 
    # Opening file for writing UTF-8, which is XML's default encoding 
    f = codecs.open('codified3.xml', mode='w', encoding='utf-8') 
    # Writing XML in UTF-8 encoding, as recommended in the documentation 
    f.write(xmldoc.toxml("utf-8")) 
    f.close()

This results in the following error:

Traceback (most recent call last): 
    File "D:\Projects\Semio\semioparser.py", line 45, in <module> 
    write_xml() 
    File "D:\Projects\Semio\semioparser.py", line 42, in write_xml 
    f.write(xmldoc.toxml(encoding="utf-8")) 
    File "C:\Python26\lib\codecs.py", line 686, in write 
    return self.writer.write(data) 
    File "C:\Python26\lib\codecs.py", line 351, in write 
    data, consumed = self.encode(object, self.errors) 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 2064: ordinal not in range(128)

How do I write XML text to a file? What am I missing?

Edit: The error was resolved by adding a decode call: f.write(xmldoc.toxml("utf-8").decode("utf-8")). However, the Russian characters are still corrupted.

The text is not corrupted when viewed in the interpreter, only when it is written to the file.

2010-12-19 martinthenext

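Just a thought: are you sure you're not viewing the file incorrectly? Maybe the viewer expects an encoding other than utf-8, and that is why it looks borked. – Nubsis 2010-12-29 12:28:11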

@Nubsis That is exactly what happened. The viewer was expecting ASCII encoding. I'll keep the thread, because using .decode() was also a problem. Thanks! – martinthenext 2011-01-09 19:08:08
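Since the cause turned out to be the viewer rather than the file, one way to rule the file in or out is to check its raw bytes directly. The following is only a minimal sketch for Python 3 (the thread itself uses Python 2.6); the helper name `is_valid_utf8` and the sample file names are made up for illustration.

```python
import os
import tempfile

def is_valid_utf8(path):
    """Return True if the file's raw bytes decode cleanly as UTF-8."""
    # Hypothetical helper: reads bytes so no viewer or console is involved.
    with open(path, "rb") as raw:
        data = raw.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Two throwaway sample files: one UTF-8, one in the legacy cp1251 code page.
tmpdir = tempfile.mkdtemp()
good = os.path.join(tmpdir, "good.xml")
bad = os.path.join(tmpdir, "bad.xml")
with open(good, "wb") as f:
    f.write("<ru>Тест</ru>".encode("utf-8"))
with open(bad, "wb") as f:
    f.write("<ru>Тест</ru>".encode("cp1251"))

print(is_valid_utf8(good))
print(is_valid_utf8(bad))
```

If the check passes for the written file but the text still looks wrong, the viewer's encoding setting is the more likely suspect.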

Hmm, this should work, though:

import codecs
import xml.dom.minidom as minidom

xml = minidom.parse("test.xml")
with codecs.open("out.xml", "w", "utf-8") as out:
    xml.writexml(out)

Alternatively, you can try:

with codecs.open("test.xml", "r", "utf-8") as inp: 
    xml = minidom.parseString(inp.read().encode("utf-8")) 
with codecs.open("out.xml", "w", "utf-8") as out: 
    xml.writexml(out)

Update: if you build the XML from a unicode string object, you should encode it before passing it to the minidom parser, like this:

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

import codecs 
import xml.dom.minidom as minidom 

xml = minidom.parseString(u"<ru>Тест</ru>".encode("utf-8")) 
with codecs.open("out.xml", "w", "utf-8") as out: 
    xml.writexml(out)
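For readers on Python 3, a rough equivalent of the snippet above might look like the following. This is a sketch under assumptions, not the answerer's code: it uses the built-in `open` with `encoding="utf-8"` instead of `codecs.open`, passes a `str` straight to `parseString`, and writes to a temporary directory. The `encoding` argument of `writexml` only fills in the XML declaration; the file object does the actual encoding.

```python
import os
import tempfile
import xml.dom.minidom as minidom

# In Python 3, parseString accepts a str, so no manual .encode() is needed.
doc = minidom.parseString("<ru>Тест</ru>")

# Temporary location so the sketch does not overwrite real files.
path = os.path.join(tempfile.mkdtemp(), "out.xml")

# The file object encodes text to UTF-8; writexml's encoding argument
# only controls what the XML declaration says.
with open(path, "w", encoding="utf-8") as out:
    doc.writexml(out, encoding="utf-8")

# Inspect the raw bytes that ended up on disk.
with open(path, "rb") as raw:
    data = raw.read()
print(data)
```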

2010-12-19 18:09:13

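Thanks for your answer. I tested all of your code, and none of it worked for me. Even the last part, which has nothing to do with opening an XML file, turns the Russian string into nonsense. That means the problem is in writing utf-8 to the file. Any other ideas? – martinthenext 2010-12-19 19:15:34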

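@martinthenext: I'm almost certain that either you are getting valid "utf-8" (all 3 examples work for me, on both Windows & Linux and on Python 2.5, 2.6 and 2.7) or your Python installation is broken; here is a screenshot: http://img190.imageshack.us/img190/9072/minidom.png – 2010-12-19 20:03:19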

Wait, the output in the interpreter itself is fine, no problem there. It gets corrupted when written to the file. How can I fix this? – martinthenext 2010-12-19 20:09:31

Try this:

with open("codified.xml", "w") as f: 
    f.write(xmldoc.toxml("utf-8").decode("utf-8"))

This works for me (under Python 3, though).
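A variation that avoids the encode-then-decode round trip, sketched here for Python 3 only: `toxml(encoding="utf-8")` returns `bytes`, so the result can be written to a file opened in binary mode. Passing already-encoded bytes to a text writer is what triggers the implicit ASCII decode seen in the question's traceback on Python 2. The sample document and the temporary path are assumptions for illustration.

```python
import os
import tempfile
import xml.dom.minidom as minidom

doc = minidom.parseString("<ru>Тест</ru>")

# With an explicit encoding, toxml() returns bytes rather than str.
payload = doc.toxml(encoding="utf-8")
print(type(payload))

# Binary mode writes the bytes unchanged; no .decode() is needed.
path = os.path.join(tempfile.mkdtemp(), "codified.xml")
with open(path, "wb") as f:
    f.write(payload)

# Parse the file again to confirm the Cyrillic text survived.
round_trip = minidom.parse(path)
print(round_trip.documentElement.firstChild.data == "Тест")
```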

2010-12-19 17:48:17

Nope, it still corrupts the non-Latin characters. – martinthenext 2010-12-19 17:56:15

What happens if you do 'x = codecs.open("semio2.xml", encoding="utf-8")' and then 'xmldoc = minidom.parse(x)'? – 2010-12-19 18:03:07

It says 'UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)'. I don't understand why. – martinthenext 2010-12-19 19:22:50
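The character u'\ufeff' in that message is the Unicode byte order mark (BOM), which suggests the input file begins with one. The thread does not confirm this, so treat it as an assumption. The sketch below, written for Python 3 with a made-up sample file, shows how the "utf-8" codec keeps the BOM as the first character while "utf-8-sig" strips it.

```python
import codecs
import os
import tempfile

# Hypothetical sample file that starts with a UTF-8 BOM.
path = os.path.join(tempfile.mkdtemp(), "with_bom.xml")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + "<ru>Тест</ru>".encode("utf-8"))

# Plain "utf-8" leaves the BOM in the decoded text as U+FEFF.
with open(path, encoding="utf-8") as f:
    plain = f.read()

# "utf-8-sig" removes a leading BOM if one is present.
with open(path, encoding="utf-8-sig") as f:
    stripped = f.read()

print(repr(plain[0]))
print(stripped.startswith("<ru>"))
```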
