Recovering text that was saved with the wrong encoding

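I have a text file exported from a FoxPro (DOS-based) program. The text contains non-English characters (Arabic, written right to left), and the exported strings now look like this: "¤"گüن".

Is there any way to convert them back to their original values?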

2012-06-19 mohsen dorparasti

You should read the data using the correct code page.

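// Requires: using System.IO; using System.Text;
public static string ReadFile(string path, int codepage)
{
    // Decode the raw bytes with the code page the file was actually written in.
    return Encoding.GetEncoding(codepage)
        .GetString(File.ReadAllBytes(path));
}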

Call the function with the correct code page ID. For MS-DOS Arabic it should be "708"; for a complete list, Wikipedia is a good place to start.

string content = ReadFile(@"c:\test.txt", 708);
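Whether a given legacy code page can be resolved depends on the runtime. The following is a minimal sketch, assuming .NET Core or .NET 5+, where legacy code pages are only available after registering `CodePagesEncodingProvider`. The class and method names (`CodePageCheck`, `TryGetEncoding`) are hypothetical and not part of the original answer.

```csharp
using System;
using System.Text;

public static class CodePageCheck
{
    // Hypothetical helper: returns the encoding for a code page,
    // or null if this runtime cannot provide it.
    public static Encoding TryGetEncoding(int codepage)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy
        // code pages must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        try
        {
            return Encoding.GetEncoding(codepage);
        }
        catch (ArgumentException)
        {
            return null; // invalid or unknown code page number
        }
        catch (NotSupportedException)
        {
            return null; // code page not supported on this platform
        }
    }
}
```

If `CodePageCheck.TryGetEncoding(708)` returns null, the code page is not available on that machine and a different approach, such as a translation table, is needed.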

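Edit: a solution that uses a lookup table to translate from an unsupported encoding (a mapping is only required for the local characters > 127):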

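// translationTable must have 128 entries: entry i holds the target byte for source byte (i + 128).
public static string ReadFile(string path, byte[] translationTable, int codepage)
{
    byte[] content = File.ReadAllBytes(path);
    for (int i = 0; i < content.Length; ++i)
    {
        byte value = content[i];
        // Bytes 0..127 are plain ASCII and stay unchanged.
        if (value > 127)
            content[i] = translationTable[value - 128];
    }

    // The translated bytes are now valid in the target code page.
    return Encoding.GetEncoding(codepage)
        .GetString(content);
}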

An example of the translation table:

 
Index   Original (IS)   Translated (1256)
...
13      141             194
...
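As an illustration of how such a table could be set up in code, here is a minimal sketch. The class name `TranslationTableSketch` and the choice to default unmapped entries to `?` are assumptions made for this example. Only the single mapping 141 -> 194 comes from the table above; every other entry would have to be filled in from a chart of the source encoding.

```csharp
using System;

public static class TranslationTableSketch
{
    // Builds a 128-entry table covering source bytes 128..255.
    public static byte[] BuildTable()
    {
        byte[] table = new byte[128];

        // Assumption: unmapped bytes become '?' so that gaps in the
        // table are visible in the output instead of silently wrong.
        for (int i = 0; i < table.Length; ++i)
            table[i] = (byte)'?';

        // The one mapping given in the answer: source byte 141
        // (index 141 - 128 = 13) becomes byte 194 in code page 1256.
        table[141 - 128] = 194;

        // Remaining entries are intentionally left unmapped here.
        return table;
    }
}
```

The result can be passed to the three-argument `ReadFile` above, for example `ReadFile(@"c:\test.txt", TranslationTableSketch.BuildTable(), 1256)`.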


2012-06-19 07:56:52

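That code page is not correct. Where can I find a list of code pages?

@reza I added a link.

I found out that this encoding is a custom encoding called IranSystem, which was created for the Farsi (Persian) language. http://en.wikipedia.org/wiki/Iran_System_encoding_standard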
