Is there a better way to handle file encodings in Python?

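I have a number of text files in different, unknown encodings. Right now I have to open each file in binary mode to detect its encoding, then open it again with that encoding. Is there a better way to handle file encodings in Python?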

import chardet
from os.path import splitext

# First pass: read the raw bytes so chardet can guess the encoding
with open(f, 'rb') as bf:
    code = chardet.detect(bf.read())['encoding']
print(f, ':', code)
# Second pass: reopen the same file as text with the detected encoding
with open(f, 'r', encoding=code) as source:
    texts = extractText(source.readlines())
with open(splitext(f)[0] + '_texts.txt', 'w', encoding='utf-8') as dist:
    dist.write('\n\n'.join('\n'.join(x) for x in texts))

So is there a better way to handle this?

Source

2017-09-13 Jacob

Where do these files come from?

Take a look at this link. It may be useful for what you are looking for. https://stackoverflow.com/questions/18263136/how-to-deal-with-unknown-encoding-when-scraping-webpages

@EricDuminil They are files from a variety of software. There is no way to guess the encoding. – Jacob

Instead of reopening and rereading the file, you can just decode the bytes you have already read:

import chardet

with open(filename, 'rb') as fileobj:
    binary = fileobj.read()
# chardet's result is a statistical guess, not a guarantee
probable_encoding = chardet.detect(binary)['encoding']
text = binary.decode(probable_encoding)
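
The snippet above assumes the detector always returns a usable encoding. As a rough sketch only: `chardet.detect` can report `None` as the encoding when it cannot make a guess, and a wrong guess can make `decode` raise. The helper below is hypothetical (the name `decode_with_fallback`, the `detect` parameter, and the UTF-8 fallback are assumptions, not part of the original answer) and shows one way to guard against both cases while still reading the file only once.

```python
def decode_with_fallback(binary, detect=None, fallback="utf-8"):
    """Decode bytes using a detector's guess; fall back if the guess is unusable.

    Hypothetical helper: `detect` is any callable returning a dict with an
    'encoding' key, as chardet.detect does. Returns (text, encoding_used).
    """
    if detect is None:
        # Imported lazily so the helper can also be called with a stub detector.
        import chardet
        detect = chardet.detect
    guess = detect(binary).get("encoding")  # may be None if no guess was made
    if guess:
        try:
            return binary.decode(guess), guess
        except (UnicodeDecodeError, LookupError):
            pass  # wrong guess or unknown codec name; use the fallback below
    # errors="replace" substitutes U+FFFD for bytes the fallback cannot decode
    return binary.decode(fallback, errors="replace"), fallback
```

With this, the original loop body might become `text, code = decode_with_fallback(open(f, 'rb').read())`, followed by `extractText(text.splitlines(keepends=True))`. Whether UTF-8 with replacement characters is an acceptable fallback depends on the files involved.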

Source

2017-09-13 16:47:11 user2357112
