Encoding conversion of a large file

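I am facing a large (~18 GB) file that was exported from SQL Server as a Unicode text file, which means its encoding is UTF-16 (little endian). The file is now stored on a machine running Linux, but I have not figured out a way to convert it to UTF-8.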

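At first I tried iconv, but the file is too large for it. My next approach was to split the file and convert the pieces one by one, but that did not work either: the conversion produced a lot of errors.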

So, any ideas on how to convert this to UTF-8? Any help would be much appreciated.

2011-07-08 Jose L. Lykón

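Since you are using SQL Server, I assume your platform is Windows. In the simplest case, you can write a quick-and-dirty .NET application that reads the source file line by line and writes each line to the converted file as it goes. Something like this: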

using System;
using System.IO;
using System.Text;

namespace UTFConv {
    class Program {
        // Usage: UTFConv <source file> <destination file>
        static void Main(string[] args) {
            try {
                // Encoding.Unicode is UTF-16 little endian in .NET.
                Encoding encSrc = Encoding.Unicode;
                // Encoding.UTF8 writes a UTF-8 byte order mark at the start of the output.
                Encoding encDst = Encoding.UTF8;
                uint lines = 0;
                using (StreamReader src = new StreamReader(args[0], encSrc)) {
                    using (StreamWriter dest = new StreamWriter(args[1], false, encDst)) {
                        string ln;
                        // Only one line is held in memory at a time.
                        while ((ln = src.ReadLine()) != null) {
                            lines++;
                            dest.WriteLine(ln);
                        }
                    }
                }
                Console.WriteLine("Converted {0} lines", lines);
            } catch (Exception x) {
                Console.WriteLine("Problem converting the file: {0}", x.Message);
            }
        }
    }
}

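Just open Visual Studio, start a new C# console application project, paste this code in, compile it, and run it from the command line. The first argument is your source file, the second is your destination file. It should work.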
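If the file has to be converted on the Linux machine itself, the same streaming idea can be sketched in Python. This is only a rough sketch under some assumptions: the function name `convert_utf16le_to_utf8` and the `chunk_size` parameter are made up for illustration, and the input is assumed to be UTF-16 little endian with an optional byte order mark. It reads fixed-size chunks rather than lines, and uses an incremental decoder so that a chunk boundary falling in the middle of a 2-byte code unit or a surrogate pair is carried over to the next chunk instead of causing an error.

```python
import codecs


def convert_utf16le_to_utf8(src_path, dst_path, chunk_size=1 << 20):
    """Stream-convert a UTF-16LE file to UTF-8 (hypothetical helper)."""
    # The incremental decoder buffers incomplete code units and
    # unpaired high surrogates between calls.
    decoder = codecs.getincrementaldecoder("utf-16-le")()
    at_start = True
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            text = decoder.decode(chunk)
            if at_start and text:
                # Assumption: drop a leading byte order mark if present,
                # so the UTF-8 output starts without one.
                if text.startswith("\ufeff"):
                    text = text[1:]
                at_start = False
            dst.write(text.encode("utf-8"))
        # final=True raises UnicodeDecodeError if the file ends
        # in the middle of a character.
        dst.write(decoder.decode(b"", final=True).encode("utf-8"))
```

Calling `convert_utf16le_to_utf8("export.txt", "export.utf8.txt")` (both paths are placeholders) would keep memory use bounded by roughly `chunk_size`, regardless of how long the individual lines in the export are.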


2011-07-08 22:32:25 Rom

Hi Rom, I am getting an out-of-memory error – user145610
