How to find/fix: ArgumentError: invalid byte sequence in UTF-8?

I have an application that reads large data files provided by customers. It works perfectly with several of them, but a file I received today fails with:

ArgumentError: invalid byte sequence in UTF-8

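I'm using String.match to look for regex patterns.

When I look at the file, nothing looks different from the ones that work.

Suggestions?

Edit: It looks like there is a '\xE9' character in a username.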
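
To locate offending bytes like the '\xE9' above, one option is to read the file as raw bytes and check each line with String#valid_encoding?. The following is a minimal sketch, not a definitive implementation; the helper name invalid_utf8_lines is hypothetical and the file is assumed to be line-oriented text.

```ruby
# Hypothetical helper (name is an assumption, not from the original post):
# returns [line_number, inspected_text] pairs for every line of the file
# that is not valid UTF-8.
def invalid_utf8_lines(path)
  bad = []
  # 'rb' reads raw bytes, so nothing raises while the file is being scanned.
  File.open(path, 'rb') do |f|
    f.each_line.with_index(1) do |raw, lineno|
      # Label the bytes as UTF-8 (no conversion) and test their validity.
      line = raw.dup.force_encoding(Encoding::UTF_8)
      # inspect renders invalid bytes as escapes such as \xE9.
      bad << [lineno, line.inspect] unless line.valid_encoding?
    end
  end
  bad
end
```

Calling invalid_utf8_lines('thefile.txt') would list the line numbers to inspect, with the bad bytes visible as \x escapes.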

Source

2012-12-04 n8gard

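Have you looked at any of the related questions on the right side of the page? Try reading some of these: http://stackoverflow.com/search?q=[ruby]+invalid+byte+sequence

http://stackoverflow.com/questions/6374756/why-do-i-get-an-invalid-byte-sequence-in-utf-8-error-reading-a-text-file?rq=1

I did. None of them applied to me, at least. I'm just reading a text file line by line. – n8gard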

Thanks to @muistooshort's help, I open the file in ISO-8859-1 mode, then read it line by line, converting each line to UTF-8.

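File.open('thefile.txt', 'r:iso-8859-1') do |myfile|
  while rawline = myfile.gets
    # encode transcodes the ISO-8859-1 bytes to UTF-8;
    # force_encoding would only relabel them and leave \xE9 invalid.
    line = rawline.encode('utf-8')
    # proceed...
  end
end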
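
The difference between relabeling bytes and converting them matters here, so a small sketch may help. It assumes the input bytes really are ISO-8859-1; the helper name relabel_vs_transcode is hypothetical and exists only for illustration.

```ruby
# Hypothetical helper (illustration only): given raw bytes that are really
# ISO-8859-1, compare relabeling them as UTF-8 with transcoding them to UTF-8.
def relabel_vs_transcode(bytes)
  latin1 = bytes.dup.force_encoding(Encoding::ISO_8859_1)

  # force_encoding changes only the encoding label; the bytes stay the same.
  relabeled = latin1.dup.force_encoding(Encoding::UTF_8)

  # encode rewrites the bytes so they represent the same characters in UTF-8.
  transcoded = latin1.encode(Encoding::UTF_8)

  {
    relabeled_valid: relabeled.valid_encoding?,
    transcoded_valid: transcoded.valid_encoding?,
    transcoded: transcoded
  }
end
```

For the bytes of "caf\xE9", the relabeled string still contains the lone 0xE9 byte and is not valid UTF-8, while the transcoded string ends in the two bytes 0xC3 0xA9 and is safe to pass to a regex match.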

Source

2012-12-06 16:39:33 n8gard

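Not saying this is the ideal solution, but it seems simple and it completely solved my problem across multiple affected data files. – n8gard

A small rake task that illustrates the solution: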

task :reencode, [:filename] => [:environment] do |t, args|
  # Read the input as ISO-8859-1 and write a UTF-8 copy to "<filename>.out".
  File.open(args[:filename], 'r:iso-8859-1') do |myfile|
    File.open(args[:filename] + ".out", "w:utf-8") do |outfile|
      while rawline = myfile.gets
        # encode converts the bytes; force_encoding would only relabel them.
        outfile.write rawline.encode('utf-8')
      end
    end
  end
end
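
A possible variation, assuming the files really are ISO-8859-1, is to let Ruby transcode while reading by giving File.open both an external and an internal encoding in the mode string. The helper name each_utf8_line below is hypothetical; this is a sketch rather than a drop-in replacement for the task above.

```ruby
# Hypothetical helper (name is an assumption): reads a file whose bytes are
# ISO-8859-1 and yields each line already transcoded to UTF-8.
def each_utf8_line(path)
  # 'r:iso-8859-1:utf-8' means external encoding ISO-8859-1,
  # internal encoding UTF-8, so conversion happens on read.
  File.open(path, 'r:iso-8859-1:utf-8') do |f|
    f.each_line { |line| yield line }
  end
end
```

With this, each_utf8_line('thefile.txt') { |line| ... } hands the block valid UTF-8 strings, so no per-line conversion call is needed.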

Source

2015-08-17 18:18:00 Jason
