String normalization

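I'm writing some code that needs to do string normalization: I want to turn a given string into a camel-case representation (on a best-guess basis, at least). For example: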

"the quick brown fox" => "TheQuickBrownFox" 
"the_quick_brown_fox" => "TheQuickBrownFox" 
"123The_quIck bROWN FOX" => "TheQuickBrownFox" 
"the_quick brown fox 123" => "TheQuickBrownFox123" 
"thequickbrownfox" => "Thequickbrownfox"

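I think you can get the idea from these examples. I want to remove all special characters (', ", !, @, etc.), capitalize every word (a word is delimited by a space, _ or -), and discard any leading digits. Trailing and internal digits are fine, and this last requirement isn't essential; it depends on how difficult it is.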

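I'm struggling to work out the best way to achieve this. My first guess was to use regular expressions, but my regex skills are poor at best, so I really don't know where to start.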

My other idea was to loop over and parse the data: break it into words, parse each one, and rebuild the string that way.
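The split-parse-rebuild idea can be sketched in a few lines. The sketch below is in Ruby (the language of the last answer in this thread), and the method name camelize_words is hypothetical. It assumes "special characters" means anything other than ASCII letters, digits and the three delimiters, and it strips everything before the first letter so that leading digits are discarded.

```ruby
# Hypothetical helper: split the input into words, fix each word, rebuild.
def camelize_words(input)
  # Drop "special" characters, keeping ASCII letters, digits and the delimiters.
  cleaned = input.gsub(/[^a-zA-Z0-9 _-]/, '')
  # Discard everything before the first letter (this removes leading digits).
  cleaned = cleaned.sub(/\A[^a-zA-Z]+/, '')
  # Split on runs of space, underscore or hyphen.
  words = cleaned.split(/[ _-]+/)
  # Upper-case the first letter of each word, lower-case the rest, and join.
  words.map(&:capitalize).join
end

p camelize_words("123The_quIck bROWN FOX") # => "TheQuickBrownFox"
```

Stripping the leading non-letters before splitting matters: String#capitalize on "123The" gives "123the", so doing it afterwards would lose the capital T.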

Or is there another way to approach this?

2009-03-03 Aaron Powell

How about a simple solution using Strings.StrConv from the Microsoft.VisualBasic namespace? (Don't forget to add a project reference to Microsoft.VisualBasic.)

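using System;
using VB = Microsoft.VisualBasic;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // ProperCase capitalizes each word but keeps the spaces between words,
            // so they (and any _ or - delimiters) still have to be removed separately.
            Console.WriteLine(VB.Strings.StrConv("QUICK BROWN", VB.VbStrConv.ProperCase, 0));
            Console.ReadLine();
        }
    }
}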

2009-03-03 02:20:38

Wow! That's a good one... – Codex 2009-03-03 11:14:27

Thought this would be fun to try; this is what I came up with:

using System;
using System.Linq;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            StringBuilder sb = new StringBuilder();
            string sentence = "123The_quIck bROWN FOX1234";

            // Work on a lower-case copy; letters are upper-cased selectively below.
            char[] s = sentence.ToLower().ToCharArray();

            bool atStart = true;   // true until the first letter has been emitted
            char pChar = ' ';      // previous character seen

            char[] spaces = { ' ', '_', '-' };   // word delimiters
            foreach (char c in s)
            {
                // Discard leading digits.
                if (atStart && char.IsDigit(c)) continue;

                if (char.IsLetter(c))
                {
                    // Upper-case the first letter of the output and any letter after a delimiter.
                    char a = (atStart || spaces.Contains(pChar)) ? char.ToUpper(c) : c;
                    sb.Append(a);
                    atStart = false;
                }
                else if (char.IsDigit(c))
                {
                    sb.Append(c);
                }
                // Any other character is dropped but still remembered as the previous character.
                pChar = c;
            }

            Console.WriteLine(sb.ToString());   // TheQuickBrownFox1234
            Console.ReadLine();
        }
    }
}

2009-03-03 02:11:41

Oops, I think you and I ended up in pretty much the same place! – 2009-03-03 02:16:18

This regex matches all the words. Then we Aggregate them with a method that upper-cases the first character and calls ToLower on the rest of the string.

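using System.Linq;
using System.Text.RegularExpressions;

// Wrapper class name is illustrative; the answer only gave the members.
class CamelCaser
{
    // Matches runs of ASCII letters. "*" also yields empty matches between words,
    // which CamelWord turns into "".
    readonly Regex regex = new Regex(@"[a-zA-Z]*", RegexOptions.Compiled);

    private string CamelCase(string str)
    {
        // Concatenate every matched word after fixing its case.
        return regex.Matches(str).OfType<Match>().Aggregate("", (s, match) => s + CamelWord(match.Value));
    }

    private string CamelWord(string word)
    {
        if (string.IsNullOrEmpty(word))
            return "";

        // Upper-case the first character and lower-case the rest.
        return char.ToUpper(word[0]) + word.Substring(1).ToLower();
    }
}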

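By the way, this method ignores digits. To include them, you could change the regex to @"[a-zA-Z]+|[0-9]+" (untested). Use + rather than *: with @"[a-zA-Z]*|[0-9]*" the letter alternative matches the empty string in front of each digit, so the digits would never be captured. Leading digits are then kept as well, so they still need to be stripped separately.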
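A rough Ruby equivalent of the regex-and-Aggregate approach with digits included is sketched below: String#scan plays the role of Regex.Matches and Enumerable#reduce the role of Aggregate. The method name camel_case is an illustrative assumption, not part of the answer above. The second call shows that leading digits survive this approach.

```ruby
# Sketch of "match the words, then aggregate", extended to digits.
def camel_case(str)
  # "+" rather than "*": with /[a-zA-Z]*|[0-9]*/ the letter branch matches the
  # empty string in front of every digit, so the digits would never be captured.
  words = str.scan(/[a-zA-Z]+|[0-9]+/)
  # Fold the matches together; capitalize upper-cases the first character,
  # lower-cases the rest, and leaves a run of digits unchanged.
  words.reduce('') { |acc, word| acc + word.capitalize }
end

p camel_case("the_quick brown fox 123") # => "TheQuickBrownFox123"
p camel_case("123The_quIck bROWN FOX")  # => "123TheQuickBrownFox"
```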

2009-03-03 02:15:24 configurator

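Any solution that involves matching specific characters may not work well with some character encodings, especially if any Unicode representation is in use: Unicode has dozens of space characters, thousands of "symbols", thousands of punctuation characters, thousands of "letters", and so on. It is better to use the built-in Unicode-aware functions. What counts as a "special character" can then be decided by Unicode category: it would include punctuation, for example, but would it also include symbols?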

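ToLower(), IsLetter() and the like should be fine, since they take every Unicode letter into account. Matching spaces and dashes, however, should account for the dozens of space and dash characters Unicode defines.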
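As an illustration of classifying characters by Unicode general category, here is a minimal Ruby sketch. It assumes Ruby 2.4 or later (where String#capitalize applies Unicode case mapping), treats separators (Z), connector punctuation (Pc, which includes "_") and dash punctuation (Pd, which includes "-") as word boundaries, and treats everything that is not a letter (L) or number (N) as a special character, so symbols are dropped too. The method name unicode_camelize is hypothetical.

```ruby
# Sketch: category-based classification instead of a hand-written ASCII list.
# Assumes Ruby 2.4+, where String#capitalize uses full Unicode case mapping.
def unicode_camelize(str)
  # Drop everything before the first letter (L = any Unicode letter).
  s = str.sub(/\A[^\p{L}]+/, '')
  # Split on separators (Z, e.g. no-break space), connector punctuation
  # (Pc, includes "_") and dash punctuation (Pd, includes "-" and the en dash).
  words = s.split(/[\p{Z}\p{Pc}\p{Pd}]+/)
  # Anything that is not a letter or a number (N) counts as a special character.
  words = words.map { |w| w.gsub(/[^\p{L}\p{N}]/, '') }
  words.map(&:capitalize).join
end

p unicode_camelize("the\u00A0quick\u2013brown fox") # => "TheQuickBrownFox"
```

Whether symbols should be dropped is exactly the judgment call raised above; adding \p{S} to the character class in the gsub would keep them.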

2009-03-03 02:36:40 thomasrutter

You could wear ruby slippers to work :)

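def camelize(str)
    # Strip everything before the first letter (e.g. leading digits), split on any
    # character that is not an ASCII letter or digit, then capitalize and join.
    str.sub(/\A[^a-zA-Z]*/, '').split(/[^a-zA-Z0-9]/).map(&:capitalize).join
end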

2009-03-03 04:08:53
