Python: Unicode and "\xe2\x80\x99" driving me batty

So I have a .txt file from Google Docs that contains some lines from David Foster Wallace's "Oblivion". Using:

with open("oblivion.txt", "r", 0) as bookFile:
    wordList = []
    for line in bookFile:
        wordList.append(line)

and then returning and printing wordList, I get:

"surgery on the crow\xe2\x80\x99s feet around her eyes."

(I've cut out a lot of the text.) But if, instead of appending to wordList, I just do

for line in bookFile: 
    print line

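everything comes out fine! The same goes for calling .read() on the file: the resulting str has no crazy byte representations, but then I can't manipulate it the way I want.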

Where do I .encode() or .decode(), or what? I'm using Python 2 because Python 3 gave me some I/O buffering errors. Thanks.

I've addressed the I/O error message.

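Your output is correct. When you print a list, each string in it is displayed using its repr(). What you are seeing is '\xe2\x80\x99', the hex representation of the three UTF-8 bytes that encode the Unicode character U+2019 RIGHT SINGLE QUOTATION MARK. Its use here is typographically incorrect, but a common mistake.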
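To make the comment above concrete, here is a minimal sketch written for Python 3, where a bytes literal stands in for the byte string that Python 2 reads from the file. The sample line comes from the question; the names raw, text and word_list are only illustrative.

```python
# The line as raw UTF-8 bytes, which is what Python 2's open() hands back.
raw = b"surgery on the crow\xe2\x80\x99s feet around her eyes."

# Decoding collapses the three bytes E2 80 99 into one character, U+2019.
text = raw.decode("utf-8")

# Printing a list shows the repr() of each item, so the escapes stay visible.
word_list = [raw]
print(word_list)

# The decoded string is two units shorter: three bytes became one character.
print(len(raw), len(text))

# The character sitting where the escapes were is U+2019.
print(hex(ord(text[19])))
```

On a UTF-8 terminal, print(text) would show the curly quote itself rather than the escapes, which matches what the question saw when printing each line directly.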

In Python 3, try removing the 0 argument from: with open("oblivion.txt", "r", 0) as bookFile: (Python 3 only allows a buffering value of 0 in binary mode.)
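A small sketch of what is probably behind the Python 3 error, assuming it came from the third argument: in text mode, Python 3 rejects buffering=0 with a ValueError. The temporary file below is a hypothetical stand-in for oblivion.txt so the snippet runs on its own.

```python
import os
import tempfile

# Hypothetical stand-in for oblivion.txt, created so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "oblivion.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("crow\u2019s feet\n")

try:
    # Python 3 only permits buffering=0 for binary mode, so this raises.
    open(path, "r", 0)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)

# Without the third argument (and with an explicit encoding) the file opens fine.
with open(path, "r", encoding="utf-8") as book_file:
    lines = list(book_file)
```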

Try open with encoding set to utf-8:

with open("oblivion.txt", "r", encoding='utf-8') as bookFile:
    wordList = []
    for line in bookFile:
        wordList.append(line)

2017-07-01 10:51:02 Rahul

This works in Python 3. I didn't realize I could set the encoding right from open. Thanks.
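For anyone who has to stay on Python 2, where the built-in open has no encoding parameter, io.open accepts one and returns decoded text. This is a sketch rather than part of the answer above: it writes a hypothetical sample file first so it is self-contained, and it also runs unchanged on Python 3, where io.open is the same function as open.

```python
import io
import os
import tempfile

# Hypothetical sample file standing in for oblivion.txt.
path = os.path.join(tempfile.mkdtemp(), "oblivion.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"surgery on the crow\u2019s feet around her eyes.\n")

# io.open decodes while reading, so each line is text, not raw bytes.
word_list = []
with io.open(path, "r", encoding="utf-8") as book_file:
    for line in book_file:
        word_list.append(line.rstrip("\n"))

# Each element now holds the single character U+2019 instead of three bytes.
print(u"\u2019" in word_list[0])
```

With the lines decoded up front, any later slicing or splitting works on characters, so a multi-byte sequence is not cut in half.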
