2012-12-01 67 views

Answers

15
var text = "ÜST"; 
var unaccentedText = String.Join("", text.Normalize(NormalizationForm.FormD) 
     .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)); 
+2

This doesn't normalize 'ı'. Any other solution? – jackjop

+0

`var text = "ÜST"; var unaccentedText = String.Join("", text.Normalize(NormalizationForm.FormD).Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)).Replace("ı", "i");` – swh

7

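I'm not an expert on this sort of thing, but I believe you can do this with string.Normalize, by decomposing the value and then effectively removing the non-ASCII characters: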

using System; 
using System.Linq; 
using System.Text; 

class Test 
{ 
    static void Main() 
    { 
        string text = "\u00DCST";
        string normalized = text.Normalize(NormalizationForm.FormD);
        string asciiOnly = new string(normalized.Where(c => c < 128).ToArray());
        Console.WriteLine(asciiOnly);
    }
} 

It's entirely possible that this produces horrible results in some cases, though.
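To make that caveat concrete, here is a minimal sketch of the same decompose-then-filter approach. The class name `AsciiFilterDemo`, the helper `ToAsciiOnly`, and the sample words are my own, not part of the answer above. Characters with no canonical decomposition, such as the Turkish dotless ı (U+0131) or the German ß (U+00DF), are not reduced to a base letter; the `c < 128` filter simply drops them.

```csharp
using System.Linq;
using System.Text;

static class AsciiFilterDemo
{
    // Hypothetical helper: decompose to FormD, then keep only ASCII characters.
    public static string ToAsciiOnly(string text)
    {
        string normalized = text.Normalize(NormalizationForm.FormD);
        return new string(normalized.Where(c => c < 128).ToArray());
    }
}
```

With this helper, `ToAsciiOnly("\u00DCST")` gives "UST" because Ü decomposes into U plus a combining diaeresis. `ToAsciiOnly("\u0131\u015F\u0131k")` (ışık) gives "sk", and `ToAsciiOnly("Stra\u00DFe")` gives "Strae": the ı and ß vanish instead of becoming "i" or "ss".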

8

You can use the following method to solve your problem. The other methods do not convert the Turkish lowercase dotless ı (\u0131) correctly.

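using System.Globalization;
using System.Text;

public static string RemoveDiacritics(string text)
{
    Encoding srcEncoding = Encoding.UTF8;
    // Windows-1252 (Latin alphabet). On .NET Core / .NET 5+, register
    // CodePagesEncodingProvider.Instance first, or GetEncoding(1252) throws.
    Encoding destEncoding = Encoding.GetEncoding(1252);

    // Round-trip through 1252. This relies on the encoding's fallback
    // behaviour for characters that 1252 lacks, such as \u0131.
    text = destEncoding.GetString(Encoding.Convert(srcEncoding, destEncoding, srcEncoding.GetBytes(text)));

    // Decompose, then drop the combining marks.
    string normalizedString = text.Normalize(NormalizationForm.FormD);
    StringBuilder result = new StringBuilder();

    for (int i = 0; i < normalizedString.Length; i++)
    {
        if (CharUnicodeInfo.GetUnicodeCategory(normalizedString[i]) != UnicodeCategory.NonSpacingMark)
        {
            result.Append(normalizedString[i]);
        }
    }

    return result.ToString();
}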
2

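This is not a problem that needs a general solution. As far as I know, the Turkish alphabet has only 12 special characters that need to be normalized: ı, İ, ö, Ö, ç, Ç, ü, Ü, ğ, Ğ, ş, Ş. You can write 12 rules that replace them with their English counterparts: i, I, o, O, c, C, u, U, g, G, s, S.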
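The twelve rules can be sketched in C# as a lookup table. This is only an illustration: the names `TurkishAscii` and `ToAscii` are made up here, and it assumes the input uses precomposed characters (FormC), so each special letter is a single `char`.

```csharp
using System.Collections.Generic;
using System.Text;

static class TurkishAscii
{
    // The twelve Turkish-specific letters listed above and their ASCII counterparts.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        ['\u0131'] = 'i', ['\u0130'] = 'I', // ı, İ
        ['\u00F6'] = 'o', ['\u00D6'] = 'O', // ö, Ö
        ['\u00E7'] = 'c', ['\u00C7'] = 'C', // ç, Ç
        ['\u00FC'] = 'u', ['\u00DC'] = 'U', // ü, Ü
        ['\u011F'] = 'g', ['\u011E'] = 'G', // ğ, Ğ
        ['\u015F'] = 's', ['\u015E'] = 'S', // ş, Ş
    };

    // Hypothetical helper: replaces mapped letters and leaves everything else untouched.
    public static string ToAscii(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            sb.Append(Map.TryGetValue(c, out char replacement) ? replacement : c);
        }
        return sb.ToString();
    }
}
```

For example, `TurkishAscii.ToAscii("ÜST")` returns "UST". The string is walked once instead of once per rule, and characters outside the table pass through unchanged.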

1
Public Function Ceng(ByVal _String As String) As String 
    Dim Source As String = "ığüşöçĞÜŞİÖÇ" 
    Dim Destination As String = "igusocGUSIOC" 
    For i As Integer = 0 To Source.Length - 1 
        _String = _String.Replace(Source(i), Destination(i))
    Next 
    Return _String 
End Function