Encoding/decoding Unicode and UTF-8 in Python

I have some HTML text: If I&#039;m reading lots of articles

I am trying to convert special characters such as &#039; into the Unicode character '. I tried

rawtxt.encode('utf-8').encode('ascii','ignore')

but it fails:

Error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2

2013-05-16 Harshit

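It looks like this is not really the code that produces the error, because the error comes from trying to decode a string as ASCII. Where does rawtxt come from? – Sarien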

@Sarien: This is the code that produces the error. You can get a decode error when calling 'encode'. See: http://chat.stackoverflow.com/rooms/10/conversation/python2-decode-error-when-encoding –
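For readers wondering why an 'encode' call can raise a decode error: in Python 2, calling encode on a byte string first decodes it implicitly with the default ASCII codec. Below is a minimal sketch, written for Python 3, that spells out that hidden step. The helper py2_style_encode and the sample text are hypothetical, and the curly apostrophe (U+2019, whose UTF-8 encoding starts with byte 0xe2) is only an assumption about what rawtxt contained.

```python
# Hypothetical input: UTF-8 bytes containing a right single quotation mark
# (U+2019). This is an assumption about rawtxt, not taken from the question.
raw_bytes = "I\u2019m reading".encode("utf-8")


def py2_style_encode(data, encoding):
    """Mimic Python 2's byte_string.encode(encoding):
    an implicit ASCII decode happens first, then the encode."""
    return data.decode("ascii").encode(encoding)


try:
    py2_style_encode(raw_bytes, "utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # The failure comes from the hidden decode step, not from encode itself.
    error_message = str(exc)

print(error_message)

# Decoding explicitly with the real encoding avoids the implicit ASCII step;
# the resulting text can then be re-encoded as needed.
text = raw_bytes.decode("utf-8")
ascii_only = text.encode("ascii", "ignore")  # silently drops non-ASCII characters
print(ascii_only)
```

Note that "ignore" discards the curly apostrophe rather than converting it to a plain one, so it may not be what you want if the punctuation matters.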

Your problem is about HTML entities, not about Unicode or UTF-8. Try this:

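# Python 2: the HTMLParser module provides unescape() for HTML entities
import HTMLParser
h = HTMLParser.HTMLParser()
# &#039; is the numeric character reference for the apostrophe (')
s = h.unescape('If I&#039;m reading lots of articles')
print s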

This prints If I'm reading lots of articles.
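The snippet above is Python 2 code. On Python 3 the same idea can be sketched with the standard library's html.unescape (available since Python 3.4); the variable name s is just carried over from the example above.

```python
import html

# html.unescape converts named and numeric character references to text.
s = html.unescape('If I&#039;m reading lots of articles')
print(s)
```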

2013-05-16 11:54:56 likeitlikeit

Thanks, you saved me loads of time – Harshit

@user595169 You're very welcome :) – likeitlikeit
