Writing a UTF-8 XML file with ElementTree
I want to write an XML file containing UTF-8 encoded data using ElementTree, like this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET
import codecs
testtag = ET.Element('unicodetag')
testtag.text = u'Töreboda' # The o is really ö (o with two dots above it). No idea why SO doesn't display this
expfile = codecs.open('testunicode.xml',"w","utf-8-sig")
ET.ElementTree(testtag).write(expfile,encoding="UTF-8",xml_declaration=True)
expfile.close()
This fails with the following error:
Traceback (most recent call last):
File "unicodetest.py", line 10, in <module>
ET.ElementTree(testtag).write(expfile,encoding="UTF-8",xml_declaration=True)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 815, in write
serialize(write, self._root, encoding, qnames, namespaces)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 932, in _serialize_xml
write(_escape_cdata(text, encoding))
File "/usr/lib/python2.7/codecs.py", line 691, in write
return self.writer.write(data)
File "/usr/lib/python2.7/codecs.py", line 351, in write
data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
Using the "us-ascii" encoding works, but it does not preserve the Unicode characters in the data. What is going on?
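+1. Just to clarify: the problem is that you are trying to encode unicode -> utf-8 twice. ElementTree does it once, and then the codecs-enabled stream tries to do it again. The second time around the input is already encoded, so the stream gets confused: it expects a unicode string but receives a utf-8 encoded byte string instead. – 2012-04-06 20:13:06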
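The double encoding described in the comment above can be avoided by giving ElementTree a plain binary file, so that `write(..., encoding="UTF-8")` is the only encoding step. In Python 2, a codecs stream that receives a byte string first decodes it implicitly as ASCII, which is where the `UnicodeDecodeError` comes from. The following is a minimal sketch, not necessarily the exact fix from the accepted answer. It assumes Python 2.7 or later. The temporary output directory and the read-back into `written` are illustrative additions, and the ö is written as `\xf6` so the snippet does not depend on the source file's encoding.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET

testtag = ET.Element('unicodetag')
testtag.text = u'T\xf6reboda'  # u'Töreboda' with the ö escaped

# Assumption for illustration: write into a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'testunicode.xml')

# Open in binary mode: ElementTree encodes to UTF-8 itself,
# so the file object must accept bytes and must not encode again.
with open(out_path, 'wb') as expfile:
    ET.ElementTree(testtag).write(expfile, encoding='UTF-8',
                                  xml_declaration=True)

# Read the file back as UTF-8 text to confirm the character survived.
with io.open(out_path, encoding='utf-8') as check:
    written = check.read()
```

Note that, unlike the `utf-8-sig` codec in the question, this does not write a byte order mark at the start of the file.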
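And here I was thinking I was helping by supplying a unicode file... Can I just say that I love stackoverflow? A perfect answer within 3 hours! Mark's elaboration also explains a lot. – c0m4 2012-04-06 20:57:05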
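I have been working with utf-8 data and got similar errors in ElementTree._serialize_text() or _serialize_xml() when trying to write an xml file. I was able to fix this by converting the strings to unicode with myString.decode('utf-8') before adding them to my ET.Element objects. It seems ET.ElementTree.write() is not happy with strings in other encodings. – drevicko 2012-07-17 14:49:21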
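The workaround in the last comment, decoding byte strings before they reach the tree, might look like the sketch below. `to_text` is a hypothetical helper introduced here for illustration and is not part of ElementTree. The sketch assumes the incoming bytes really are UTF-8.

```python
import xml.etree.ElementTree as ET


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b'T\xc3\xb6reboda'  # UTF-8 encoded bytes for u'Töreboda'

elem = ET.Element('unicodetag')
elem.text = to_text(raw)  # the tree now holds text, not encoded bytes

# ElementTree performs the single encoding step when serializing.
serialized = ET.tostring(elem, encoding='UTF-8')
```

Keeping only unicode text inside the tree means the encoding happens once, at serialization time, which matches the explanation of the original error.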