2013-09-28 68 views
-2

Convert ASCII characters to normal text. My text looks like this:

&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217; wrote forum user Ensorceled.

As far as I know, #8216 is an ASCII character. How can I convert it to a normal character without using .replace, which is cumbersome?

+1

&#8216; is a numeric entity. –

+0

Is it a duplicate of [this](http://stackoverflow.com/questions/730299/replace-html-entities-with-the-corresponding-utf-8-characters-in-python-2-6)? – rlms

+0

No, that one was asked in '08. Please confirm. – user2784753

Answers

3

What you have there are HTML escapes. Use the HTMLParser.HTMLParser() class to unescape them:

from HTMLParser import HTMLParser  # Python 2 module name

parser = HTMLParser()
# `escaped` is the string that contains entities such as &#8216;
unescaped = parser.unescape(escaped)

Demo:

>>> from HTMLParser import HTMLParser 
>>> parser = HTMLParser() 
>>> escaped = '&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217; wrote forum user Ensorceled.'
>>> parser.unescape(escaped)
u'\u2018The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,\u2019 wrote forum user Ensorceled.'
>>> print parser.unescape(escaped)
‘The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,’ wrote forum user Ensorceled.

In Python 3, the HTMLParser module has been renamed to html.parser; adjust the import accordingly:

from html.parser import HTMLParser
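
On current Python 3 releases the parser object is not needed at all. The standard library has provided the function html.unescape since Python 3.4, and the HTMLParser.unescape method was deprecated in 3.4 and removed in 3.9. A minimal sketch, assuming Python 3.4 or later and using the sample string from the question with its entities written out:

```python
import html

# Sample text from the question, with the numeric entities still escaped.
escaped = (
    "&#8216;The zoom animations everywhere on the new iOS 7 are literally "
    "making me nauseous and giving me a headache,&#8217; wrote forum user "
    "Ensorceled."
)

# html.unescape converts numeric (&#8216;) and named (&amp;) character
# references to the corresponding Unicode characters.
unescaped = html.unescape(escaped)
print(unescaped)
```

The result is an ordinary str in which the two entities have become the curly quotes U+2018 and U+2019, so no chain of .replace calls is required.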