2011-06-08 79 views
2

Possible duplicates:
How do I remove diacritics (accents) from a string in .NET?
How to change diacritic characters to non-diacritic ones

C# remove accents from characters?

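How can I convert á to a in C#?

For example: aéíúö => aeiuo

Hmm, having read those threads [I didn't know they were called diacritics, so I couldn't have searched for them].

I want to "drop" all diacritics except ñ.

Currently I have: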

public static string RemoveDiacritics(this string text) 
{ 
    string normalized = text.Normalize(NormalizationForm.FormD); 
    var sb = new StringBuilder(); 

    foreach (char c in from c in normalized 
         let u = CharUnicodeInfo.GetUnicodeCategory(c) 
         where u != UnicodeCategory.NonSpacingMark 
         select c) 
    { 
     sb.Append(c); 
    } 

    return sb.ToString().Normalize(NormalizationForm.FormC); 
} 

What would be the best way to leave ñ out of this?

My solution was to do the following after the foreach:

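var result = sb.ToString();

// Positions only line up if each accented character in text is a single precomposed char.
if (text.Length != result.Length)
    throw new ArgumentOutOfRangeException();

int position = -1;
// Compare with >= 0 so that an 'ñ' at index 0 is restored as well.
while ((position = text.IndexOf('ñ', position + 1)) >= 0)
{
    result = result.Remove(position, 1).Insert(position, "ñ");
}

// Return the patched string, not the untouched StringBuilder contents.
return result;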

But I assume there is a less "manual" way to do this?

+3

See this post: http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net – keyboardP 2011-06-08 23:01:40

+0

It depends on the underlying code points. http://unicode.org/faq/char_combmark.html – Tim 2011-06-08 23:03:18
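
To illustrate the point about code points: ñ can be stored either as the single precomposed character U+00F1 or as n followed by the combining tilde U+0303, and FormD normalization converts the first form into the second. The following is only a small sketch; `CodePointDemo` and `Describe` are hypothetical names made up for this illustration.

```csharp
using System;
using System.Text;

public static class CodePointDemo
{
    // Hypothetical helper: lists each UTF-16 code unit of s as U+XXXX.
    public static string Describe(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

With this, `CodePointDemo.Describe("\u00F1")` gives `U+00F1`, while `CodePointDemo.Describe("\u00F1".Normalize(NormalizationForm.FormD))` gives `U+006E U+0303`.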

Answers

1

If you don't want to remove ñ, this is one option. It is fast.

static string[] pats3 = { "é", "É", "á", "Á", "í", "Í", "ó", "Ó", "ú", "Ú" };
static string[] repl3 = { "e", "E", "a", "A", "i", "I", "o", "O", "u", "U" };
static Dictionary<string, string> _var = null;
static Dictionary<string, string> dict
{
    get
    {
        if (_var == null)
        {
            _var = pats3.Zip(repl3, (k, v) => new { Key = k, Value = v }).ToDictionary(o => o.Key, o => o.Value);
        }

        return _var;
    }
}

private static string RemoveAccent(string text)
{
    // using Zip as a shortcut, otherwise setup dictionary differently as others have shown
    //var dict = pats3.Zip(repl3, (k, v) => new { Key = k, Value = v }).ToDictionary(o => o.Key, o => o.Value);

    //string input = "åÅæÆäÄöÖøØèÈàÀìÌõÕïÏ";
    string pattern = String.Join("|", dict.Keys.Select(k => k)); // use ToArray() for .NET 3.5
    string result = Regex.Replace(text, pattern, m => dict[m.Value]);

    //Console.WriteLine("Pattern: " + pattern);
    //Console.WriteLine("Input: " + text);
    //Console.WriteLine("Result: " + result);

    return result;
}

If you do want to remove the ñ as well, a faster option is: Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(text));
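
For the case where ñ should survive, another possibility is to stay with the normalization approach from the question and simply not drop the combining tilde (U+0303) when it directly follows an n or N. The code below is a minimal sketch of that idea, not something taken from the posts above; the class and method names are hypothetical, and it assumes ñ/Ñ are the only accented letters to keep.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsSketch
{
    // Hypothetical helper name, introduced only for this illustration.
    public static string RemoveDiacriticsExceptEnye(string text)
    {
        // Decompose: 'ñ' becomes 'n' followed by U+0303 (combining tilde).
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            bool isMark =
                CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;

            // Keep a combining tilde only when the last kept character is 'n' or 'N'.
            bool isTildeOnN = c == '\u0303'
                && sb.Length > 0
                && (sb[sb.Length - 1] == 'n' || sb[sb.Length - 1] == 'N');

            if (!isMark || isTildeOnN)
            {
                sb.Append(c);
            }
        }

        // Recompose so that 'n' + U+0303 becomes the single character 'ñ' again.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, `DiacriticsSketch.RemoveDiacriticsExceptEnye("aéíúö")` returns `aeiuo`, and `"ñandú"` becomes `"ñandu"`. Because the decision is made on the decomposed string, it does not rely on the input and output having the same length.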