Python ElementTree decoding error
I have an ElementTree that I am trying to output as text using the tostring method, for example:
tostring(root, encoding='UTF-8')
I get a UnicodeDecodeError (traceback below) because one of the Element.text nodes contains the character u'\u2014'. I set the text attribute as follows:
my_str = u'\u2014'
el.text = my_str.encode('UTF-8')
How can I successfully serialize the tree to text? Am I encoding the node incorrectly? Thanks.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "crisis_app/converters/to_xml.py", line 129, in convert
return tostring(root, encoding='UTF-8')
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1127, in tostring
ElementTree(element).write(file, encoding, method=method)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 821, in write
serialize(write, self._root, encoding, qnames, namespaces)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 938, in _serialize_xml
write(_escape_cdata(text, encoding))
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1074, in _escape_cdata
return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 288: ordinal not in range(128)
The message says it is trying to decode the text as ASCII, not UTF-8. Also, 0xE2 does not seem to be related to 0x2014 (em-dash).
Can we see more of the code? It looks like you have non-Unicode text in your tree, which makes `text.encode()` first **decode** it to Unicode and only then encode it.
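To make the point above concrete, here is a minimal sketch of what seems to be happening. It assumes the element text was stored as UTF-8 bytes, as in the question. The explicit `decode('ascii')` stands in for the implicit decode that Python 2 performs when `.encode()` is called on a byte string. The second half shows one possible fix: leave the text as Unicode and let `tostring` do the encoding. The element names `root` and `item` are made up for illustration.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# What the question stores in el.text: UTF-8 bytes, not Unicode text.
utf8_bytes = u'\u2014'.encode('UTF-8')

# Python 2 performs this ASCII decode implicitly when .encode() is called
# on a byte string; it is written out explicitly here to show the failure.
try:
    utf8_bytes.decode('ascii')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Possible fix: keep the text as Unicode and let tostring() encode it.
root = Element('root')         # hypothetical element names
el = SubElement(root, 'item')
el.text = u'\u2014'            # no .encode() here
xml_bytes = tostring(root, encoding='UTF-8')
```

If this matches the real code, dropping the `.encode('UTF-8')` call when setting `el.text` should be enough, since the serializer encodes the whole tree once at output time.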
@JimGarrison Yes, it is related. The UTF-8 representation of the em-dash is 0xE2 0x80 0x94 (e28094), and 0xE2 is its first byte. http://www.fileformat.info/info/unicode/char/2014/index.htm
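The byte values mentioned above can be checked directly. This is just a quick sketch, and the variable names are arbitrary:

```python
em_dash = u'\u2014'
encoded = em_dash.encode('UTF-8')

# bytearray yields integers on both Python 2 and Python 3.
byte_values = [hex(b) for b in bytearray(encoded)]
print(byte_values)  # ['0xe2', '0x80', '0x94']

# Decoding with the right codec gives the original character back.
round_trip = encoded.decode('UTF-8')
```

So the 0xe2 in the error message is the first byte of the encoded em-dash, which is the byte the ASCII codec rejects.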