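I am using Python Tools for Visual Studio and reading some files written in Italian. I have tried iso-8859-1, iso-8859-2, utf-8, and utf-8-sig. Notepad++ opens the file as UTF-8 without BOM. Which encoding should I use to read Italian text in Python?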
content = fp.read()
words = content.decode("utf-8-sig").lower().split()
for w in words:
    p = ''
    cur.execute('SELECT word FROM multiwordnet.italian_lemma l, multiwordnet.italian_synset s where l.id = s.id and l.lemma="%s"' % w)
The string that causes the crash is C'è (it is read in as "c\'\xe3\xa8").
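One possible reading of that value, offered as a sketch rather than a diagnosis: `\xe3\xa8` is what you get when the UTF-8 bytes of `è` (`\xc3\xa8`) are decoded as ISO-8859-1 and then lower-cased. The snippet below reproduces this in isolation; it builds the bytes by hand and does not touch the real file.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the garbled value from the UTF-8 bytes of "C'è".
text = u"C'\u00e8"                  # the Italian string C'è
raw = text.encode("utf-8")          # bytes as stored in a UTF-8 file

# Decoding with the wrong codec turns the two bytes of è into two characters,
# and lower() then changes the first of them from \xc3 to \xe3.
wrong = raw.decode("iso-8859-1").lower()

# Decoding with the codec that matches the file keeps è as one character.
right = raw.decode("utf-8").lower()

print(repr(wrong))
print(repr(right))
```

If this is what happened, the text was decoded as ISO-8859-1 (or was already double-encoded) somewhere before the query was built.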
Using chardet did not help either.
Traceback (most recent call last):
File "C:\Users\Tathagata\Documents\Visual Studio 2012\Projects\PythonApplicati
on4\PythonApplication4\PythonApplication4.py", line 344, in <module>
createSynsetDict()
File "C:\Users\Tathagata\Documents\Visual Studio 2012\Projects\PythonApplicati
on4\PythonApplication4\PythonApplication4.py", line 294, in createSynsetDict
cur.execute('SELECT word FROM multiwordnet.italian_lemma l, multiwordnet.it
alian_synset s where l.id = s.id and l.lemma="%s"' % w)
File "C:\Python27\lib\site-packages\pymysql\cursors.py", line 117, in execute
self.errorhandler(self, exc, value)
File "C:\Python27\lib\site-packages\pymysql\connections.py", line 187, in defa
ulterrorhandler
raise Error(errorclass, errorvalue)
Error: (<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii', u's\
x00\x00\x00\x03SELECT word FROM multiwordnet.italian_lemma l, multiwordnet.ital
ian_synset s where l.id = s.id and l.lemma="c\'\xe3\xa8"', 116, 118, 'ordinal no
t in range(128)'))
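[How Do I Stop the Pain?](http://nedbatchelder.com/text/unipain.html) – 2013-04-04 23:53:06
Which DB-API binding are you using? (That is, which database driver?) – 2013-04-04 23:56:40
...Actually, more to the point, what is the value of the `paramstyle` global in your database library's module? (If you don't know, just name the module and we can look it up.) – 2013-04-04 23:59:02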
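To illustrate why `paramstyle` matters, here is a minimal sketch of a parameterized query. It uses the standard-library `sqlite3` module as a stand-in for the real driver, and the in-memory table is hypothetical. `sqlite3` declares the `qmark` style (`?`); as far as I know, MySQL drivers such as pymysql use `%s` placeholders instead, with the value still passed as a separate argument rather than interpolated with `%`.

```python
import sqlite3

# Every DB-API module declares which placeholder style it expects.
print(sqlite3.paramstyle)

# Hypothetical in-memory table standing in for multiwordnet.italian_lemma.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE italian_lemma (id INTEGER, lemma TEXT)")
cur.execute("INSERT INTO italian_lemma VALUES (?, ?)", (1, u"c'\u00e8"))

# Pass the word as a parameter: the driver handles quoting and encoding,
# so the apostrophe and the accented character need no manual escaping.
w = u"c'\u00e8"
cur.execute("SELECT id FROM italian_lemma WHERE lemma = ?", (w,))
rows = cur.fetchall()
print(rows)

conn.close()
```

Passing the value separately also avoids the quoting problem that the apostrophe in C'è would cause inside a hand-built SQL string.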