我使用xmlcharrefreplace
來替換字符串中的非標準字符,以便它可以保存在xml文件中。後來,我想重新將此字符串轉換回原始字符。Python字符串編碼xmlcharrefreplace解碼
import openpyxl
import cgi
from html.parser import HTMLParser
parser = HTMLParser()
startingString = "Tỉnh Đồng Nai" #example string
print("Starting string: " + startingString) #Starting string: Tỉnh Đồng Nai
# 1. This string contains non-standard characters. Convert these characters using xmlcharrefreplace
escapedString = cgi.escape(startingString)
strEscapedString = str(escapedString)
aposString = strEscapedString.replace("'", "'")
savedToExcelString = str(aposString.encode('utf-8', 'xmlcharrefreplace'))[2:-1]
print("xmlcharrefreplace converted to: " + savedToExcelString) #xmlcharrefreplace converted to: T\xe1\xbb\x89nh \xc4\x90\xe1\xbb\x93ng Nai
# 2. The string is saved to an xml file
# 3. The string is read from an xml file
# 4. Convert the string back into the original starting string
unescapedString = parser.unescape(savedToExcelString)
#what do I do here??? I need to 'undo' the xmlcharrefreplace encoding
print(startingString + " == " + unescapedString + " is " + str(startingString == unescapedString))
# Tỉnh Đồng Nai == T\xe1\xbb\x89nh \xc4\x90\xe1\xbb\x93ng Nai is False
# ^^ Should be the same string at the end
請注意,我不能使用codecs.open(),因爲我使用的庫openpyxl打開包含數據以及Excel文件。對輸入可能是什麼字符沒有限制 - 我希望最終字符串與初始字符串相同。
目標:將字符從xmlcharrefreplace轉換回其腳本字符。 例如:「\ x90」變成「ồ」
謝謝,這是正確的答案。不幸的是字典映射的想法不適用於這個程序架構。 –