I have some code that encodes special characters, and it works fine in Python 3:
def encode_test(filepath, char_to_int):
    with open(filepath, "r", encoding="latin-1") as f:
        dat = [line.rstrip() for line in f]
    # Map every character to its integer id, folding 'ó' into 'ò'
    string_to_int = [[char_to_int[char] if char != 'ó' else char_to_int['ò'] for char in line] for line in dat]
    return string_to_int
However, when I tried to run it in Python 2.7, I first got this error:
SyntaxError: Non-ASCII character '\xc3' in file languageIdentification.py on line 30, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Then I realized I probably needed to add #coding=utf-8 at the top of the file. After doing that, however, I ran into another error:
UnicodeWarning: Unicode unequal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  string_to_int = [[char_to_int[char] if char != 'ó' else char_to_int['ò'] for char in line] for line in dat]
Traceback (most recent call last):
  File "languageIdentification.py", line 190, in <module>
    test_string = encode_test(sys.argv[3], char_to_int)
  File "languageIdentification.py", line 32, in encode_test
    string_to_int = [[char_to_int[char] if char != 'ó' else char_to_int['ò'] for char in line] for line in dat]
KeyError: u'\xf3'
So could someone tell me what I can do to fix this in Python 2.7?
Thanks!
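The mismatch behind the warning can be sketched in Python 3 terms, where `bytes` plays the role of a Python 2 `str` and `str` plays the role of Python 2 `unicode`. With a UTF-8 coding declaration, the Python 2 literal 'ó' is the two-byte string '\xc3\xb3' (which is also why the original SyntaxError mentions '\xc3'). Reading the file with latin-1 decoding, however, yields the single character u'\xf3'. The two never compare equal, so the `else` branch is never taken and `char_to_int[u'\xf3']` is looked up directly, which raises the KeyError. This is an illustration only, not a Python 2 run:

```python
# Python 3 illustration: bytes stands in for a Python 2 byte 'str' literal,
# str stands in for a Python 2 'unicode' object.
source_literal = "ó".encode("utf-8")      # 'ó' as raw bytes in a UTF-8 source file
decoded_char = b"\xf3".decode("latin-1")  # 'ó' as decoded text read from the file

print(source_literal)                  # b'\xc3\xb3'
print(repr(decoded_char))              # 'ó'
print(source_literal == decoded_char)  # False: bytes never equal text
```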
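A Python 3 'str' object is effectively equivalent to a Python 2 'unicode' object, and a Python 2 'str' object is equivalent to Python 3 'bytes'. Just convert *everything* in your source code to unicode objects and work with those.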
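Here is a minimal sketch of that conversion applied to the function in the question. It assumes `char_to_int` is keyed by unicode characters, for example because it was built from text also read through `io.open`. `io.open` accepts an `encoding` argument on both Python 2.7 and Python 3 and returns decoded text. The two accented characters are written as `\x` escapes, so the source file stays pure ASCII and needs no coding declaration. The constant names `O_ACUTE` and `O_GRAVE` are hypothetical, introduced only for this illustration:

```python
import io

# Hypothetical constants: unicode escapes keep this file pure ASCII, so
# Python 2.7 can parse it without a PEP 263 coding declaration.
O_ACUTE = u"\xf3"  # 'ó'
O_GRAVE = u"\xf2"  # 'ò'


def encode_test(filepath, char_to_int):
    """Map each character of each line in filepath to an integer id.

    Assumes char_to_int is keyed by unicode characters (u'...').
    """
    # io.open decodes the file on both Python 2.7 and 3, so every `char`
    # below is text (Python 2 'unicode', Python 3 'str').
    with io.open(filepath, "r", encoding="latin-1") as f:
        dat = [line.rstrip() for line in f]
    # Text is compared with text, so 'ó' is matched and folded into 'ò'.
    return [[char_to_int[char] if char != O_ACUTE else char_to_int[O_GRAVE]
             for char in line]
            for line in dat]
```

On Python 2.7, any other literal compared against these characters also needs the `u` prefix. Mixing a byte-string literal such as 'ó' back in would bring back the UnicodeWarning.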
@juanpa.arrivillaga Actually, I can't make changes to the source file. Is there any way I can handle it directly in the program? – Parker
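What? Do you mean in your text file? You have to change your code. The nature of the 'str' type changed fundamentally between the two versions, so you can't expect to reuse Python 3 code unchanged in Python 2.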