Python unicode error when writing to a file

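I parsed a whole bunch of web pages with Python 2.7 and grabbed content from them, but the pages contain typographic characters such as the curly apostrophe ’, and these are somehow all being converted to "Äô". This gives me a file whose contents look like the following (without the quotes): "I think it's important...", except that the apostrophe shows up as Äô.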

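When I print the strings to the terminal with print, they look fine, but I can't seem to get the same result with print >> file, string or file.write(string). Obviously this is an encoding problem, but I haven't found a way to fix it. I open the file like this: file = codecs.open("file.txt","w+", encoding='utf-8'), and I get the strings from BeautifulSoup4's getText() method. Is there any way to fix this?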

2015-10-15 tdon

Can you give us a link to the page? – alexanderlukanin13

Off-topic because there is no reproducible code. See http://stackoverflow.com/help/how-to-ask –
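Since the question has no reproducible code, here is a minimal sketch that reproduces the Äô symptom under one assumption that the question does not confirm: the UTF-8 file itself is fine, and it is being displayed by a program that decodes it as Mac Roman. Under that assumption the three UTF-8 bytes of ’ (U+2019) come out as ‚Äô, which matches what the question describes.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

apostrophe = "\u2019"                      # RIGHT SINGLE QUOTATION MARK (’)
utf8_bytes = apostrophe.encode("utf-8")    # the three bytes E2 80 99

# Assumption: a viewer misreads the UTF-8 bytes as Mac Roman.
garbled = utf8_bytes.decode("mac_roman")   # U+201A, U+00C4, U+00F4, i.e. "‚Äô"
print(repr(garbled))

# Reversing the wrong decode recovers the original character.
repaired = garbled.encode("mac_roman").decode("utf-8")
print(repaired == apostrophe)
```

If the repaired text matches, the data on disk was probably correct all along and only the viewer's encoding was wrong.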

Try adding the following lines at the beginning of your function; this will solve your problem.

import sys
reload(sys)  # Python 2 only: site.py deletes sys.setdefaultencoding at startup, reload(sys) restores it
sys.setdefaultencoding('utf8')

2015-10-15 07:28:01 jack

It works! Thanks a lot :) – tdon

Cheers, mate! – jack

This is a nasty fix, a complete hack. You'll soon find that it masks other problems, because you're using a sledgehammer to crack a nut –

You could try writing it out like this:

file.write(output_str.encode('utf-8', 'ignore'))

2015-10-15 07:02:16 sureshvv
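The 'ignore' error handler in this answer deserves a closer look. A small sketch, assuming output_str is already Unicode text: 'ignore' silently drops any character the target codec cannot represent. UTF-8 can encode every valid code point, so the ’ survives; with an ASCII codec the same handler deletes it without raising anything, which is how 'ignore' can hide a real encoding bug.

```python
from __future__ import print_function, unicode_literals

text = "it\u2019s"  # contains a right single quotation mark (U+2019)

# UTF-8 can encode every valid code point, so nothing is dropped here.
as_utf8 = text.encode("utf-8", "ignore")

# ASCII cannot encode U+2019; "ignore" silently discards it.
as_ascii = text.encode("ascii", "ignore")

print(repr(as_utf8))   # the three UTF-8 bytes for the apostrophe are kept
print(repr(as_ascii))  # the apostrophe is gone, and no error was raised
```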

Force UTF-8 encoding at the top of your code:

#!/usr/bin/python 
# -*- coding: utf-8 -*- 
myfile = open('./myfile.txt', 'w') 
myfile.write("I think it's important to be able to see all characters") 
myfile.write("\nIt woùld be Ñìçè to see foreign letters as well") 
myfile.write("\n") 
myfile.close()

2015-10-15 08:24:16

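This only means that **non-ASCII characters in the source** can be interpreted correctly. It only really matters when you create Unicode objects from literals, which you aren't doing –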
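To illustrate the comment above, a small sketch: a coding declaration only tells the interpreter how to read the source file, and a byte-string literal is still just bytes. Only after decoding do you get a Unicode object whose length counts characters rather than bytes. The escaped bytes below are used so the example behaves the same under Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

raw = b"wo\xc3\xb9ld"        # the UTF-8 bytes of "woùld", held as a byte string
text = raw.decode("utf-8")   # an actual Unicode object

print(len(raw))   # 6: counts bytes (ù takes two bytes in UTF-8)
print(len(text))  # 5: counts characters
```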

Some source code would have been nice.

BeautifulSoup usually does a good job of guessing a string's encoding:

>>> from bs4 import BeautifulSoup as bs4

>>> print bs4("\x80", "html.parser").text # Windows 1252 
€ 

>>> print bs4("\xe2\x82\xac", "html.parser").text # UTF-8 
€

Except when it can't:

>>> print bs4("\xa4", "html.parser").text # ISO-8859-15 
¤

So you should pass BeautifulSoup decoded Unicode instead:

>>> print bs4("\xa4".decode("iso-8859-15"), "html.parser").text # ISO-8859-15 
€
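The guess fails because a single byte means different characters in different 8-bit encodings, and only the caller knows which one is right. A stdlib-only sketch of the byte 0xA4 from the example above, no BeautifulSoup required:

```python
from __future__ import print_function, unicode_literals

data = b"\xa4"

as_latin9 = data.decode("iso-8859-15")  # ISO-8859-15 (Latin-9) maps 0xA4 to the euro sign
as_cp1252 = data.decode("cp1252")       # Windows-1252 maps 0xA4 to the generic currency sign

print(repr(as_latin9), repr(as_cp1252))
```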

This means your input data needs to be decoded correctly. Open input files with io.open(filename, "r", encoding="utf-8") (or whatever the appropriate encoding is).
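A minimal sketch of that reading step, using a temporary file with made-up sample content (page.html is a hypothetical name): io.open with an explicit encoding decodes the bytes as it reads, so what you get back is already Unicode text.

```python
from __future__ import print_function, unicode_literals
import io
import os
import tempfile

# Create a small sample file (hypothetical content) encoded as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "page.html")
with io.open(path, "wb") as f:
    f.write("<p>I think it\u2019s important</p>".encode("utf-8"))

# io.open with an explicit encoding decodes on read and returns Unicode text.
with io.open(path, "r", encoding="utf-8") as f:
    html = f.read()

print(type(html).__name__, "\u2019" in html)
```

The resulting html string is what you would then hand to BeautifulSoup.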

If you are fetching remote sites, check the Content-Type header, or use requests, which returns decoded Unicode in the response object's .text attribute.
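requests does this header handling for you. If you fetch pages some other way, here is a stdlib-only sketch of pulling the charset out of a Content-Type value with email.message.Message. The helper name charset_from_content_type, the header value and the body bytes are all hypothetical examples, and falling back to UTF-8 when no charset is given is an assumption, not a rule.

```python
from __future__ import print_function
from email.message import Message

def charset_from_content_type(value, default="utf-8"):
    """Hypothetical helper: return the charset parameter of a Content-Type value, lowercased."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset returns the default when no charset parameter is present.
    return msg.get_content_charset(default)

body = b"\xa4 10"  # hypothetical response body bytes
charset = charset_from_content_type("text/html; charset=ISO-8859-15")
text = body.decode(charset)
print(charset, repr(text))
```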

You had the right idea in using the codecs module when writing the file. The io module is the newer way to do it.
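A minimal sketch of the writing side with io.open, assuming the text is already Unicode (for example, the result of getText()); out.txt is a hypothetical file name. The file object encodes to UTF-8 on write, and reading the raw bytes back shows exactly what landed on disk.

```python
from __future__ import print_function, unicode_literals
import io
import os
import tempfile

text = "I think it\u2019s important"  # Unicode text, e.g. from getText()

path = os.path.join(tempfile.mkdtemp(), "out.txt")

# io.open encodes Unicode to UTF-8 on write; pass it Unicode, not bytes.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the raw bytes back to confirm what was written.
with io.open(path, "rb") as f:
    on_disk = f.read()

print(repr(on_disk))
```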

Put all of this together and you will write out correctly encoded data.
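Putting the pieces together, a sketch of the decode-early, encode-late pattern as a hypothetical helper, transcode_file: decode the input with its real encoding, work with Unicode in between, and encode only when writing. The ISO-8859-15 input file is made up for the example.

```python
from __future__ import print_function, unicode_literals
import io
import os
import tempfile

def transcode_file(src, dst, src_encoding, dst_encoding="utf-8"):
    """Hypothetical helper: decode src with its real encoding, write dst re-encoded."""
    with io.open(src, "r", encoding=src_encoding) as f:  # bytes -> Unicode on input
        text = f.read()
    with io.open(dst, "w", encoding=dst_encoding) as f:  # Unicode -> bytes on output
        f.write(text)
    return text

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "in.txt")
dst = os.path.join(tmp, "out.txt")

with io.open(src, "wb") as f:
    f.write(b"\xa4 is the euro sign in Latin-9")  # ISO-8859-15 bytes

text = transcode_file(src, dst, "iso-8859-15")
with io.open(dst, "rb") as f:
    result = f.read()
print(repr(result))
```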

2015-12-26 13:31:09
