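When I process HTML in Python, I have to use the following code because of special characters. How can I make this sequence of string.replace statements more readable?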
line = string.replace(line, "&quot;", "\"")
line = string.replace(line, "&apos;", "'")
line = string.replace(line, "&amp;", "&")
line = string.replace(line, "&lt;", "<")
line = string.replace(line, "&gt;", ">")
line = string.replace(line, "&laquo;", "<<")
line = string.replace(line, "&raquo;", ">>")
line = string.replace(line, "&#039;", "'")
line = string.replace(line, "&#8220;", "\"")
line = string.replace(line, "&#8221;", "\"")
line = string.replace(line, "&#8216;", "\'")
line = string.replace(line, "&#8217;", "\'")
line = string.replace(line, "&#9632;", "")
line = string.replace(line, "&#8226;", "-")
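It looks like there will be more special characters like these that I will have to replace. Do you know how to make this code more elegant?
Thanks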
Possible duplicate of [Decode HTML entities in Python string?](http://stackoverflow.com/questions/2087370/decode-html-entities-in-python-string)
`string.replace` and most similar functions in the `string` module have been deprecated: http://docs.python.org/library/string.html#deprecated-string-functions
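@Ben James Thanks, that solution works for me, but this is not a duplicate, because I may need to apply a different sequence of replacements (e.g., 1000 replacements based on something other than HTML special characters) – xralf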
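For the general case raised in the last comment (an arbitrary, possibly long list of replacements), one option is to keep the pairs in a table and apply them in a single pass with a compiled regular expression. The following is only a minimal sketch under stated assumptions: `REPLACEMENTS` and `make_replacer` are hypothetical names, not taken from the question, and the table shows just a few of the pairs. It uses Python 3 string methods and the `re` module rather than the deprecated `string.replace`.

```python
import re

# Hypothetical replacement table; extend it with whatever pairs are needed.
REPLACEMENTS = {
    "&quot;": '"',
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&laquo;": "<<",
    "&raquo;": ">>",
}


def make_replacer(table):
    """Return a function that applies every pair in `table` in one pass.

    Assumes `table` is a non-empty dict mapping search strings to
    replacement strings.
    """
    # Try longer keys first so a key that is a prefix of another key
    # cannot shadow it in the alternation.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))

    def replace_all(text):
        # Each match is looked up in the table; replaced text is not rescanned.
        return pattern.sub(lambda match: table[match.group(0)], text)

    return replace_all


unescape_line = make_replacer(REPLACEMENTS)
line = unescape_line("&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;")
```

Because the text is scanned once, the output of one replacement is never matched again, which differs from a chain of `replace` calls where the order of the statements can change the result. For the HTML entity case specifically, Python 3.4 and later also provide `html.unescape` in the standard library.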