2010-06-23 45 views
0

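The line below is an example from one of the many files I have with character-encoding errors. Which pair of character encodings would account for this conversion: from "§" to "Ç"?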

REAPRESENTA§AO VIA DTENTRY 

The correct display should be:

REAPRESENTAÇAO VIA DTENTRY 

There are more wrongly encoded characters besides this one. How can I correct them?

Screenshot: http://nerull.webs.com/screen.JPG

+1

Are you sure the font you are using isn't the cause? – Will 2010-06-23 11:41:54

+0

No. It is a plain text file, viewed with a UTF-8 compatible font! – 2010-06-23 11:43:06

+0

Please provide more input. – kennytm 2010-06-23 12:11:51

Answers

3

The file itself is not wrongly encoded; the problem is that you are using the wrong encoding to decode it when you read it.

The fix is to decode the file with the same encoding that was used to encode it.
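As a rough illustration of that fix, here is a minimal Python sketch (the thread itself contains no code). The function name `reencode_to_utf8` is made up for this example, and `cp1252` is only an assumed stand-in for whatever encoding the file was really written in.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes with the encoding they were written in, then re-encode as UTF-8."""
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Illustrative data only: pretend the file was written in cp1252 (an assumption).
raw = "REAPRESENTAÇAO VIA DTENTRY".encode("cp1252")
fixed = reencode_to_utf8(raw, "cp1252")
print(fixed.decode("utf-8"))
```

For a real file you would read it in binary mode (`open(path, "rb").read()`) and pass the actual source encoding once you know it.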

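If you don't know which encoding that is, look at the byte code of the problem character before any decoding takes place, then look for a character set in which that byte code maps to the intended character.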
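One way to get at those byte codes is to read the file as raw bytes and list everything outside printable ASCII. This is only a sketch: the helper name `suspicious_bytes` is hypothetical, the `\xc7` in the sample is a made-up stand-in for whatever byte is really in the file, and the check assumes an ASCII-compatible file.

```python
def suspicious_bytes(raw: bytes):
    """Return (offset, byte value) pairs for bytes outside printable ASCII.

    Tab, line feed and carriage return are treated as normal.
    """
    allowed_controls = {0x09, 0x0A, 0x0D}
    return [
        (offset, value)
        for offset, value in enumerate(raw)
        if not (0x20 <= value <= 0x7E or value in allowed_controls)
    ]


# Made-up sample line; in practice use open(path, "rb").read().
sample = b"REAPRESENTA\xc7AO VIA DTENTRY\n"
for offset, value in suspicious_bytes(sample):
    print(f"offset {offset}: 0x{value:02X} ({value})")
```

The byte values printed this way are what you compare against the code tables of candidate character sets.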

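For example, the file could have been encoded using IBM905, so that the character "Ç" is stored as byte code 74. If you then decode the file using IBM278, byte code 74 is interpreted as the character "§".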
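The same mechanism can be shown in a few lines of Python. Python's standard codecs do not, as far as I know, include IBM905 or IBM278, so this sketch swaps in `latin-1` and `cp437` purely as an illustrative pair; they are not a claim about the file in the question.

```python
intended = "Ç"

# Written with one encoding: in latin-1, "Ç" is the single byte 0xC7.
raw = intended.encode("latin-1")

# Read back with a different encoding: the same byte maps to another character.
garbled = raw.decode("cp437")
print(repr(raw), repr(garbled))

# Reversing the wrong decode and applying the right one recovers the text.
repaired = garbled.encode("cp437").decode("latin-1")
print(repaired == intended)
```

Reversing the mistake only works when no bytes were lost along the way, so it is safer to go back to the original file whenever you still have it.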

Here is a list of the possible combinations I found among the built-in encodings:

from cp875 to IBM290 
from cp875 to IBM420 
from cp875 to x-EBCDIC-KoreanExtended 
from cp875 to IBM-Thai 
from cp875 to IBM880 
from IBM290 to IBM290 
from IBM290 to IBM420 
from IBM290 to x-EBCDIC-KoreanExtended 
from IBM290 to IBM-Thai 
from IBM290 to IBM880 
from IBM420 to IBM290 
from IBM420 to IBM420 
from IBM420 to x-EBCDIC-KoreanExtended 
from IBM420 to IBM-Thai 
from IBM420 to IBM880 
from IBM424 to IBM290 
from IBM424 to IBM420 
from IBM424 to x-EBCDIC-KoreanExtended 
from IBM424 to IBM-Thai 
from IBM424 to IBM880 
from x-EBCDIC-KoreanExtended to IBM290 
from x-EBCDIC-KoreanExtended to IBM420 
from x-EBCDIC-KoreanExtended to x-EBCDIC-KoreanExtended 
from x-EBCDIC-KoreanExtended to IBM-Thai 
from x-EBCDIC-KoreanExtended to IBM880 
from IBM-Thai to IBM290 
from IBM-Thai to IBM420 
from IBM-Thai to x-EBCDIC-KoreanExtended 
from IBM-Thai to IBM-Thai 
from IBM-Thai to IBM880 
from IBM880 to IBM290 
from IBM880 to IBM420 
from IBM880 to x-EBCDIC-KoreanExtended 
from IBM880 to IBM-Thai 
from IBM880 to IBM880 
from cp1025 to IBM290 
from cp1025 to IBM420 
from cp1025 to x-EBCDIC-KoreanExtended 
from cp1025 to IBM-Thai 
from cp1025 to IBM880 
from IBM1026 to IBM01143 
from IBM1026 to IBM278 
from IBM905 to IBM01143 
from IBM905 to IBM278 
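A list like this can be produced by brute force: try every pair of encodings and keep the pairs that turn the intended character into the displayed one. The Python sketch below is not the program that generated the list above; `candidate_pairs` and the `ENCODINGS` list are assumptions for illustration, and Python ships a different set of codecs, so it may report different pairs or none at all.

```python
import itertools


def candidate_pairs(intended: str, displayed: str, encodings):
    """Return (written_with, read_with) pairs that turn `intended` into `displayed`."""
    matches = []
    for written_with, read_with in itertools.product(encodings, repeat=2):
        try:
            if intended.encode(written_with).decode(read_with) == displayed:
                matches.append((written_with, read_with))
        except (UnicodeError, LookupError):
            # Character not representable, bytes not decodable, or codec unknown.
            continue
    return matches


# An arbitrary selection of codecs from Python's standard library.
ENCODINGS = ["cp437", "cp850", "cp1252", "latin-1", "mac_roman",
             "utf-8", "cp037", "cp500", "cp1026"]

for written_with, read_with in candidate_pairs("Ç", "§", ENCODINGS):
    print(f"from {written_with} to {read_with}")
```

Checking several damaged characters from the same file against each surviving pair narrows the candidates much faster than a single character does.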
+1

@Guffa: I think that is exactly the question (i.e. the process described in your last paragraph), with SO acting as the Mechanical Turk implementation. – 2010-06-23 11:46:01

+0

@Guffa, see whether this image helps identify the encoding. – 2010-06-23 12:03:52

+0

@Guffa, do you know whether this kind of conversion can be done with PHP? – 2010-06-23 12:07:59