Python ElementTree decoding error

I have an ElementTree instance that I'm trying to output as text using the tostring method:

tostring(root, encoding='UTF-8')

I get a UnicodeDecodeError (traceback below) because one of the Element.text nodes contains a u'\u2014' character. I set the text attribute as follows:

my_str = u'\u2014' 
el.text = my_str.encode('UTF-8')

How can I successfully serialize the tree to text? Am I encoding the node incorrectly? Thanks.

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "crisis_app/converters/to_xml.py", line 129, in convert 
    return tostring(root, encoding='UTF-8') 
    File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1127, in tostring 
    ElementTree(element).write(file, encoding, method=method) 
    File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 821, in write 
    serialize(write, self._root, encoding, qnames, namespaces) 
    File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml 
    _serialize_xml(write, e, encoding, qnames, None) 
    File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml 
    _serialize_xml(write, e, encoding, qnames, None) 
    File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml 
    _serialize_xml(write, e, encoding, qnames, None) 
    File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 938, in _serialize_xml 
    write(_escape_cdata(text, encoding)) 
    File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1074, in _escape_cdata 
    return text.encode(encoding, "xmlcharrefreplace") 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 288: ordinal not in range(128)


2013-07-10 aaronstacy

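The message says it is trying to decode it as ASCII, not UTF-8. Also, 0xE2 doesn't seem to have anything to do with 0x2014 (em-dash).

Can we see more of the code? It looks like you have non-Unicode text in your tree, which makes 'text.encode()' first decode it to Unicode before encoding it again.

@JimGarrison Yes, it is related. This is the UTF-8 representation of the em-dash: 0xE2 0x80 0x94 (e28094), and 0xE2 is the first byte. http://www.fileformat.info/info/unicode/char/2014/index.htm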
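
To check the byte values mentioned in the comment above, here is a minimal sketch (the variable names are just for illustration) that encodes U+2014 as UTF-8 and lists the resulting bytes in hex:

```python
# Encode the em-dash (U+2014) as UTF-8 and inspect the individual bytes.
em_dash = u'\u2014'
encoded = em_dash.encode('utf-8')

# bytearray yields integers on both Python 2 and Python 3.
hex_bytes = ['%02x' % b for b in bytearray(encoded)]
print(hex_bytes)
```

The first byte is the 0xe2 reported in the traceback.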

If you do this:

my_str = u'\u2014' 
el.text = my_str.encode('UTF-8')

you set the text to the UTF-8 encoded version of the Unicode character. It is the same as

el.text = '\xe2\x80\x94'

Now you no longer have a Unicode character, but a sequence of bytes.

If you then do:

tostring(root, encoding='UTF-8')

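you are saying that you want the content encoded as UTF-8. To do that, Python 2 internally first decodes the byte string to unicode using the default encoding (ascii) and then encodes it to UTF-8. That of course fails, because the bytes in the string are outside the ASCII range.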
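
The hidden decode step can be made explicit. This is only a sketch of what happens implicitly on Python 2 when .encode() is called on a byte string; the names raw, failed and message are made up for the example:

```python
# UTF-8 bytes of U+2014, i.e. what el.text held in the question.
raw = b'\xe2\x80\x94'

try:
    # Python 2 performs this decode implicitly before encoding a byte string.
    raw.decode('ascii')
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)
    print(message)

# Decoding with the codec the bytes were actually written in works fine.
text = raw.decode('utf-8')
```

The error complains about byte 0xe2, the same byte as in the traceback, because ASCII only covers values below 128.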

ElementTree is perfectly capable of working with Unicode, so just give it unicode instead of str:

>>> from xml.etree import ElementTree as et 
>>> e = et.Element('test') 
>>> e.text = u'\u2014' 

>>> s = et.tostring(e) 
>>> print s, repr(s) 
<test>&#8212;</test> '<test>&#8212;</test>' 

>>> s = et.tostring(e, encoding='utf-8') 
>>> print s, repr(s) 
<test>—</test> '<test>\xe2\x80\x94</test>'
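
As a sanity check, both serialized forms should parse back to the same character. A small sketch, assuming nothing beyond the standard library (the variable names are illustrative):

```python
from xml.etree import ElementTree as et

e = et.Element('test')
e.text = u'\u2014'

# Default output is ASCII, so the dash becomes a character reference.
as_ascii = et.tostring(e)
# With an explicit UTF-8 encoding the dash is written as raw bytes.
as_utf8 = et.tostring(e, encoding='utf-8')

# Parsing either form gives back the original unicode text.
text_from_ascii = et.fromstring(as_ascii).text
text_from_utf8 = et.fromstring(as_utf8).text
print(repr(text_from_ascii), repr(text_from_utf8))
```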


2013-07-10 20:52:35 mata

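Yeah, it turns out the problem was that I was calling 'el.text = str(content)' in all cases to guard against 'content' being an int. That was throwing the error, so my fix had a logic error that ended up double-encoding the output. – aaronstacy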
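
One way to avoid the str(content) pitfall described in the comment above is to normalise every value to unicode text before assigning it. This is just a sketch: to_text is a hypothetical helper, not part of ElementTree, and it assumes any byte strings are UTF-8 encoded:

```python
from xml.etree import ElementTree as et

def to_text(content, encoding='utf-8'):
    """Return content as unicode text (hypothetical helper)."""
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    if isinstance(content, text_type):
        return content
    if isinstance(content, bytes):
        # Assumption: byte strings are encoded with `encoding`.
        return content.decode(encoding)
    # Numbers and other objects: use their text representation.
    return text_type(content)

root = et.Element('root')
for value in (u'\u2014', b'\xe2\x80\x94', 42):
    child = et.SubElement(root, 'item')
    child.text = to_text(value)

# Every node holds unicode text, so serialization encodes exactly once.
xml_bytes = et.tostring(root, encoding='utf-8')
print(repr(xml_bytes))
```

Encoding then happens in one place only, inside tostring, so nothing gets encoded twice.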
