Reading non-ASCII characters from a text file

I'm using Python 2.7. I've tried many things, such as the codecs module, but nothing worked. How can I fix this?

myfile.txt

wörd

My code

f = open('myfile.txt','r') 
for line in f: 
    print line 
f.close()

Output

s\xc3\xb6zc\xc3\xbck

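The output is the same in Eclipse and in the command window. I'm using Win7. When I'm not reading from a file, I have no problem with any characters.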


2012-04-29 Rckt

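What result were you expecting? Technically, Python has read the file correctly. – srgerg

Why are you printing the line character by character? Why not simply `for line in f: print line`? When I do that, it prints "söcük" as desired. – srgerg

I tried that, but it doesn't work. It printed s\xc3\xb6zc\xc3\xbck. – Rckt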
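
For what it's worth, the escaped output above looks like the repr of valid UTF-8 bytes rather than a sign of a bad read. A minimal sketch, assuming the file is saved as UTF-8 (the variable names are only illustrative):

```python
from __future__ import print_function

# The bytes shown in the question's output, written as a byte string.
raw = b's\xc3\xb6zc\xc3\xbck'

# Decoding them as UTF-8 (an assumption about the file's encoding)
# gives the intended text.
text = raw.decode('utf-8')

print(repr(raw))  # escaped form, similar to the output in the question
print(len(raw))   # 8 bytes: o-umlaut and u-umlaut take two bytes each
print(len(text))  # 6 characters
```

So the data is intact; what remains is how it gets decoded and displayed.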

First, detect the encoding:


    from chardet import detect 
    encoding = lambda x: detect(x)['encoding'] 
    print encoding(line)

Then convert it to Unicode, or to a str in your default encoding:


    n_line=unicode(line,encoding(line),errors='ignore') 
    print n_line 
    print n_line.encode('utf8')
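
If installing chardet is not an option, a cruder stdlib-only fallback is to try a short list of candidate encodings in order. Note this is a different technique from chardet's statistical detection, not an equivalent. The helper name `guess_decode` and the candidate list are assumptions made for illustration:

```python
from __future__ import print_function

def guess_decode(raw, candidates=('utf-8', 'cp1254', 'latin-1')):
    """Return (text, encoding) for the first candidate that decodes raw.

    Hypothetical helper: the candidate list is an assumption and should
    be adapted to the encodings you actually expect. UTF-8 goes first
    because it is strict and rejects most non-UTF-8 input.
    """
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings matched')

text, used = guess_decode(b's\xc3\xb6zc\xc3\xbck')
print(used)
```

Because single-byte encodings such as latin-1 accept any byte sequence, a match only means the bytes were decodable, not that the guess is right.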


2012-04-30 00:16:51 lavrton

It's the terminal encoding. Try configuring the terminal to use the same encoding as the file. I suggest UTF-8.
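
One way to check this, as a rough sketch: look at the encoding Python reports for standard output and test whether the text can be encoded with it. The helper name `can_display` is hypothetical, and falling back to ASCII when no encoding is reported is an assumption:

```python
from __future__ import print_function
import sys

def can_display(text, encoding):
    """Return True if text can be encoded with the given encoding."""
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        return False

# The encoding may be missing or None, for example when output is piped.
term_encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
word = u's\xf6zc\xfck'

print('terminal encoding:', term_encoding)
print('can display word:', can_display(word, term_encoding))
```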

By the way, it is good practice to decode all input and encode all output to avoid problems:

f = open('test.txt','r')  
for line in f: 
    l = unicode(line, encoding='utf-8')# decode the input                     
    print l.encode('utf-8') # encode the output                        
f.close()
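
The same decode-on-input, encode-on-output idea can be sketched so that it runs on both Python 2.7 and 3 by reading the file as bytes. This assumes the file is UTF-8; the helper name `read_lines` and the temporary sample file are only there for illustration:

```python
from __future__ import print_function
import os
import tempfile

def read_lines(path, encoding='utf-8'):
    """Read a file as bytes and decode each line to text at the boundary."""
    with open(path, 'rb') as f:
        return [raw.decode(encoding).rstrip(u'\r\n') for raw in f]

# Create a small UTF-8 sample file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(u's\xf6zc\xfck\n'.encode('utf-8'))

lines = read_lines(path)
os.remove(path)

# Work with text internally; encode only when writing out.
encoded = [line.encode('utf-8') for line in lines]
print(len(lines[0]), len(encoded[0]))  # 6 8
```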


2012-04-30 00:18:12 jgomo3

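Now I see why they made UTF-8 the standard in 3.0. (PEP 3120) – mgold

@mgold: PEP 3120 is entirely about the encoding of source (.py) files; it has nothing to do with the OP's question about input and/or output encoding. –

Oh. Good catch. – mgold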

import codecs

# Open the file with UTF-8 decoding
f = codecs.open("myfile.txt", "r", encoding='utf-8')
# Read the whole file into a unicode string
sfile = f.read()
f.close()

# Check the type of the decoded text
print type(sfile)  # <type 'unicode'>

# Encode the unicode string to a byte string (str) to display it properly
print sfile.encode('utf-8')

# Check the type of the encoded string
print type(sfile.encode('utf-8'))  # <type 'str'>
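
As an alternative sketch, `io.open` (available since Python 2.6, and the same function as the built-in `open` in Python 3) also decodes while reading. The temporary sample file below is an assumption so the snippet is self-contained; in practice you would pass the path to your own file:

```python
from __future__ import print_function
import io
import os
import tempfile

# Create a UTF-8 sample file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'w\xf6rd\n')

# io.open decodes while reading, so each line is already text.
with io.open(path, 'r', encoding='utf-8') as f:
    words = [line.strip() for line in f]

os.remove(path)
print(repr(words[0]))
```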


2013-02-09 11:58:32
