You have HTML escapes there. Use the HTMLParser.HTMLParser() class to unescape them:
from HTMLParser import HTMLParser
parser = HTMLParser()
unescaped = parser.unescape(escaped)
Demo:
>>> from HTMLParser import HTMLParser
>>> parser = HTMLParser()
>>> escaped = '&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217; wrote forum user Ensorceled.'
>>> parser.unescape(escaped)
u'\u2018The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,\u2019 wrote forum user Ensorceled.'
>>> print parser.unescape(escaped)
‘The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,’ wrote forum user Ensorceled.
In Python 3, the HTMLParser module has been renamed to html.parser; adjust the import accordingly:
from html.parser import HTMLParser
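On newer Python 3 releases the parser method may not be there at all: HTMLParser.unescape() was deprecated in Python 3.4 and removed in 3.9, and the standard library has offered html.unescape() since 3.4. A minimal sketch, assuming Python 3.4 or later and the same sample string as in the demo above:

```python
import html

# Same sample input as the demo: numeric character references for curly quotes.
escaped = (
    "&#8216;The zoom animations everywhere on the new iOS 7 are literally "
    "making me nauseous and giving me a headache,&#8217; wrote forum user Ensorceled."
)

# html.unescape() decodes both numeric (&#8216;) and named (&amp;) references.
unescaped = html.unescape(escaped)
print(unescaped)
```

No parser instance is needed here; html.unescape() is a plain module-level function that takes a str and returns a str.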
&#8216; is a numeric entity. –
Is this a duplicate of [this](http://stackoverflow.com/questions/730299/replace-html-entities-with-the-corresponding-utf-8-characters-in-python-2-6)? – rlms
That wasn't asked in '08. Please confirm. – user2784753