2012-08-10 27 views
19

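Replacing multiple characters in a string, the fastest way?

I am importing some records with multiple string fields from an old database into a new one. It seems to be very slow, and I suspect that is because I do this: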

foreach (var oldObj in oldDB) 
{ 
    NewObject newObj = new NewObject(); 
    newObj.Name = oldObj.Name.Trim().Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š') 
     .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć') 
     .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ'); 
    newObj.Surname = oldObj.Surname.Trim().Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š') 
     .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć') 
     .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ'); 
    newObj.Address = oldObj.Address.Trim().Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š') 
     .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć') 
     .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ'); 
    newObj.Note = oldObj.Note.Trim().Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š') 
     .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć') 
     .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ'); 
    /* 
    ... some processing ... 
    */ 
} 

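Now, I have read some posts and articles on the web and have seen many different opinions about this. Some say it would be better to use a regex with a MatchEvaluator; others say it is best to leave it as is.

While it might be easier to write a benchmark myself, I decided to ask here in case someone else has been wondering about the same thing, or in case someone already knows the answer.

So what is the fastest way to do this in C#?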

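EDIT

I have posted the benchmark here. At first sight it looked like Richard's way might be the fastest. However, neither his way nor Marc's would do anything, because of a wrong regex pattern. After correcting the pattern from

@"\^@\[\]`\}~\{\\" 

to

@"\^|@|\[|\]|`|\}|~|\{|\\" 

it looks like the old way, with chained .Replace() calls, is the fastest after all.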
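
To illustrate why the first pattern did nothing on the test data: without `|` or a character class, the escaped characters form a single nine-character literal sequence, so the pattern only matches where all nine appear together in that exact order. A minimal sketch follows; the `PatternDemo` class name is made up for illustration and is not part of the benchmark.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternDemo
{
    // Sequence: matches only the exact nine-character run ^@[]`}~{\
    public const string Sequence = @"\^@\[\]`\}~\{\\";

    // Alternation: matches any single one of the nine characters.
    public const string Alternation = @"\^|@|\[|\]|`|\}|~|\{|\\";

    // True when the given pattern finds at least one match in the input.
    public static bool Matches(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }
}
```

With the benchmark's first test string, `PatternDemo.Matches("A^@[BCD", PatternDemo.Sequence)` is false while `PatternDemo.Matches("A^@[BCD", PatternDemo.Alternation)` is true, so with the first pattern `Regex.Replace` returns that input unchanged.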

+0

I would suggest keeping it as is. Maybe try a parallel foreach? – h1ghfive 2012-08-10 10:23:32

+6

You suspect this is the reason? You should _know_. You need to profile your application to find the bottlenecks; don't guess. – Oded 2012-08-10 10:24:28

+1

I once asked [this](http://stackoverflow.com/questions/9600177/how-to-replace-two-or-more-strings-with-each-other) and accepted [this](http://stackoverflow.com/a/9600320/704144), but I am not sure it is what you want. – 2012-08-10 10:26:09

Answers

22

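Thanks for your input. I wrote a quick and dirty benchmark to test your suggestions. I tested parsing 4 strings with 500,000 iterations and did 4 passes. The results are as follows: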

 
*** Pass 1 
Old (Chained String.Replace()) way completed in 814 ms 
logicnp (ToCharArray) way completed in 916 ms 
oleksii (StringBuilder) way completed in 943 ms 
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2551 ms 
Richard (Regex w/ MatchEvaluator) way completed in 215 ms 
Marc Gravell (Static Regex) way completed in 1008 ms 

*** Pass 2 
Old (Chained String.Replace()) way completed in 786 ms 
logicnp (ToCharArray) way completed in 920 ms 
oleksii (StringBuilder) way completed in 905 ms 
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2515 ms 
Richard (Regex w/ MatchEvaluator) way completed in 217 ms 
Marc Gravell (Static Regex) way completed in 1025 ms 

*** Pass 3 
Old (Chained String.Replace()) way completed in 775 ms 
logicnp (ToCharArray) way completed in 903 ms 
oleksii (StringBuilder) way completed in 931 ms 
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2529 ms 
Richard (Regex w/ MatchEvaluator) way completed in 214 ms 
Marc Gravell (Static Regex) way completed in 1022 ms 

*** Pass 4 
Old (Chained String.Replace()) way completed in 799 ms 
logicnp (ToCharArray) way completed in 908 ms 
oleksii (StringBuilder) way completed in 938 ms 
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2592 ms 
Richard (Regex w/ MatchEvaluator) way completed in 225 ms 
Marc Gravell (Static Regex) way completed in 1050 ms 

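The code for this benchmark is below. Please review the code and confirm that @Richard got the fastest way. Note that I have not checked whether the outputs are correct; I assumed they are.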

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 
using System.Diagnostics; 
using System.Text.RegularExpressions; 

namespace StringReplaceTest 
{ 
    class Program 
    { 
     static string test1 = "A^@[BCD"; 
     static string test2 = "E]FGH\\"; 
     static string test3 = "ijk`l}m"; 
     static string test4 = "nopq~{r"; 

     static readonly Dictionary<char, string> repl = 
      new Dictionary<char, string> 
      { 
       {'^', "Č"}, {'@', "Ž"}, {'[', "Š"}, {']', "Ć"}, {'`', "ž"}, {'}', "ć"}, {'~', "č"}, {'{', "š"}, {'\\', "Đ"} 
      }; 

     static readonly Regex replaceRegex; 

     static Program() // static initializer 
     { 
      StringBuilder pattern = new StringBuilder().Append('['); 
      foreach (var key in repl.Keys) 
       pattern.Append(Regex.Escape(key.ToString())); 
      pattern.Append(']'); 
      replaceRegex = new Regex(pattern.ToString(), RegexOptions.Compiled); 
     } 

     public static string Sanitize(string input) 
     { 
      return replaceRegex.Replace(input, match => 
      { 
       return repl[match.Value[0]]; 
      }); 
     } 

     static string DoGeneralReplace(string input) 
     { 
      var sb = new StringBuilder(input); 
      return sb.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć').Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ').ToString(); 
     } 

     //Method for replacing chars with a mapping 
     static string Replace(string input, IDictionary<char, char> replacementMap) 
     { 
      return replacementMap.Keys 
       .Aggregate(input, (current, oldChar) 
        => current.Replace(oldChar, replacementMap[oldChar])); 
     } 

     static void Main(string[] args) 
     { 
      for (int i = 1; i < 5; i++) 
       DoIt(i); 
     } 

     static void DoIt(int n) 
     { 
      Stopwatch sw = new Stopwatch(); 
      int idx = 0; 

      Console.WriteLine("*** Pass " + n.ToString()); 
      // old way 
      sw.Start(); 
      for (idx = 0; idx < 500000; idx++) 
      { 
       string result1 = test1.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć').Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ'); 
       string result2 = test2.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć').Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ'); 
       string result3 = test3.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć').Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ'); 
       string result4 = test4.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć').Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ'); 
      } 
      sw.Stop(); 
      Console.WriteLine("Old (Chained String.Replace()) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms"); 

      Dictionary<char, char> replacements = new Dictionary<char, char>(); 
      replacements.Add('^', 'Č'); 
      replacements.Add('@', 'Ž'); 
      replacements.Add('[', 'Š'); 
      replacements.Add(']', 'Ć'); 
      replacements.Add('`', 'ž'); 
      replacements.Add('}', 'ć'); 
      replacements.Add('~', 'č'); 
      replacements.Add('{', 'š'); 
      replacements.Add('\\', 'Đ'); 

      // logicnp way 
      sw.Reset(); 
      sw.Start(); 
      for (idx = 0; idx < 500000; idx++) 
      { 
       char[] charArray1 = test1.ToCharArray(); 
       for (int i = 0; i < charArray1.Length; i++) 
       { 
        char newChar; 
        if (replacements.TryGetValue(test1[i], out newChar)) 
         charArray1[i] = newChar; 
       } 
       string result1 = new string(charArray1); 

       char[] charArray2 = test2.ToCharArray(); 
       for (int i = 0; i < charArray2.Length; i++) 
       { 
        char newChar; 
        if (replacements.TryGetValue(test2[i], out newChar)) 
         charArray2[i] = newChar; 
       } 
       string result2 = new string(charArray2); 

       char[] charArray3 = test3.ToCharArray(); 
       for (int i = 0; i < charArray3.Length; i++) 
       { 
        char newChar; 
        if (replacements.TryGetValue(test3[i], out newChar)) 
         charArray3[i] = newChar; 
       } 
       string result3 = new string(charArray3); 

       char[] charArray4 = test4.ToCharArray(); 
       for (int i = 0; i < charArray4.Length; i++) 
       { 
        char newChar; 
        if (replacements.TryGetValue(test4[i], out newChar)) 
         charArray4[i] = newChar; 
       } 
       string result4 = new string(charArray4); 
      } 
      sw.Stop(); 
      Console.WriteLine("logicnp (ToCharArray) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms"); 

      // oleksii way 
      sw.Reset(); 
      sw.Start(); 
      for (idx = 0; idx < 500000; idx++) 
      { 
       string result1 = DoGeneralReplace(test1); 
       string result2 = DoGeneralReplace(test2); 
       string result3 = DoGeneralReplace(test3); 
       string result4 = DoGeneralReplace(test4); 
      } 
      sw.Stop(); 
      Console.WriteLine("oleksii (StringBuilder) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms"); 

      // André Christoffer Andersen way 
      sw.Reset(); 
      sw.Start(); 
      for (idx = 0; idx < 500000; idx++) 
      { 
       string result1 = Replace(test1, replacements); 
       string result2 = Replace(test2, replacements); 
       string result3 = Replace(test3, replacements); 
       string result4 = Replace(test4, replacements); 
      } 
      sw.Stop(); 
      Console.WriteLine("André Christoffer Andersen (Lambda w/ Aggregate) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms"); 

      // Richard way 
      sw.Reset(); 
      sw.Start(); 
      Regex reg = new Regex(@"\^|@|\[|\]|`|\}|~|\{|\\"); 
      MatchEvaluator eval = match => 
      { 
       switch (match.Value) 
       { 
        case "^": return "Č"; 
        case "@": return "Ž"; 
        case "[": return "Š"; 
        case "]": return "Ć"; 
        case "`": return "ž"; 
        case "}": return "ć"; 
        case "~": return "č"; 
        case "{": return "š"; 
        case "\\": return "Đ"; 
        default: throw new Exception("Unexpected match!"); 
       } 
      }; 
      for (idx = 0; idx < 500000; idx++) 
      { 
       string result1 = reg.Replace(test1, eval); 
       string result2 = reg.Replace(test2, eval); 
       string result3 = reg.Replace(test3, eval); 
       string result4 = reg.Replace(test4, eval); 
      } 
      sw.Stop(); 
      Console.WriteLine("Richard (Regex w/ MatchEvaluator) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms"); 

      // Marc Gravell way 
      sw.Reset(); 
      sw.Start(); 
      for (idx = 0; idx < 500000; idx++) 
      { 
       string result1 = Sanitize(test1); 
       string result2 = Sanitize(test2); 
       string result3 = Sanitize(test3); 
       string result4 = Sanitize(test4); 
      } 
      sw.Stop(); 
      Console.WriteLine("Marc Gravell (Static Regex) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms\n"); 
     } 
    } 
} 
+0

Hehe, this is awesome! – oleksii 2012-08-10 12:01:46

+2

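It is no surprise that Regex is faster. It is built to search strings with ridiculous efficiency. Always remember that the law of the instrument is a trap: use the technology built for what you are trying to do, so don't be afraid of using Regex. C# is not good at everything just because it has an API for it. Good question and good benchmark, @Dejan. – 2012-08-10 12:38:52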

+2

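I would add one more thing: your test strings are very short. That may match your real data (in which case your benchmark is spot on), but it skews the results for longer strings, for different numbers of characters to replace, and so on. I suspect this is the reason for the relatively good performance of string.Replace: it does create strings over and over (although only when something changes), but the loops and the resulting strings are all very small, so it does not cost much. The difference would be much more noticeable with long strings. – Luaan 2014-02-07 09:00:08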
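
One way to check the point about string length is to rerun a method against a longer input. The sketch below is only an assumption about how that could be set up: `LongStringBenchmark`, `BuildInput` and `Time` are hypothetical names, the input is simply the four benchmark strings repeated, and no timings are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class LongStringBenchmark
{
    // Builds a longer test string by repeating the four short benchmark samples.
    public static string BuildInput(int repeats)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < repeats; i++)
            sb.Append("A^@[BCD").Append("E]FGH\\").Append("ijk`l}m").Append("nopq~{r");
        return sb.ToString();
    }

    // The "old way" from the question: nine chained String.Replace calls.
    public static string ChainedReplace(string s)
    {
        return s.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
                .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
                .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
    }

    // Runs the method `iterations` times and returns the elapsed milliseconds.
    public static long Time(Func<string, string> method, string input, int iterations)
    {
        method(input); // warm-up call so JIT compilation is not part of the measurement
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            method(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

For example, `LongStringBenchmark.Time(LongStringBenchmark.ChainedReplace, LongStringBenchmark.BuildInput(1000), 1000)` times the chained approach on a 27,000-character string; the other candidates can be passed in the same way for comparison.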

9

For the fastest way, try this:

Dictionary<char, char> replacements = new Dictionary<char, char>(); 
// populate replacements 

string str = "mystring"; 
char[] charArray = str.ToCharArray();

for (int i = 0; i < charArray.Length; i++)
{
    char newChar;
    if (replacements.TryGetValue(str[i], out newChar))
        charArray[i] = newChar;
}

string newStr = new string(charArray); 
+0

+1. I would just try adding an IndexOfAny to avoid looping over strings that do not need it. – Steve 2012-08-10 10:34:21

+2

@Steve - IndexOfAny also uses a loop internally. There is no way to avoid this single loop. – logicnp 2012-08-10 10:38:18
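
The two comments above can be combined roughly as follows. This is a sketch rather than anyone's posted code: `CharMapper` and `ReplaceMapped` are hypothetical names. As logicnp says, `IndexOfAny` still scans the string; what the early return saves is the `char[]` copy and the new string when nothing needs replacing.

```csharp
using System;
using System.Collections.Generic;

static class CharMapper
{
    static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        {'^', 'Č'}, {'@', 'Ž'}, {'[', 'Š'}, {']', 'Ć'}, {'`', 'ž'},
        {'}', 'ć'}, {'~', 'č'}, {'{', 'š'}, {'\\', 'Đ'}
    };

    // Key array built once, so IndexOfAny does not need a new array per call.
    static readonly char[] Keys = new List<char>(Map.Keys).ToArray();

    public static string ReplaceMapped(string input)
    {
        int first = input.IndexOfAny(Keys);
        if (first < 0)
            return input; // nothing to replace: return the original instance

        char[] chars = input.ToCharArray();
        // Start at the first hit; everything before it is known to be unchanged.
        for (int i = first; i < chars.Length; i++)
        {
            char replacement;
            if (Map.TryGetValue(chars[i], out replacement))
                chars[i] = replacement;
        }
        return new string(chars);
    }
}
```

Whether this is worthwhile depends on how many of the real records contain none of the nine characters, which only profiling on the actual data can show.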

+0

Thanks for your answer. Please take a look at the [benchmark](http://stackoverflow.com/questions/11899668/replacing-multiple-characters-in-a-string-the-fastest-way/11900932#11900932) I posted in another answer. – 2012-08-10 11:47:02

5

One possible solution is to use the StringBuilder class for this.

You can first refactor the code into a single method:

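public string DoGeneralReplace(string input) 
{ 
    var sb = new StringBuilder(input); 
    // Replace in place on the same buffer, then return the result 
    return sb.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š') 
     .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć') 
     .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ') 
     .ToString(); 
} 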


//usage 
foreach (var oldObj in oldDB) 
{ 
    NewObject newObj = new NewObject(); 
    newObj.Name = DoGeneralReplace(oldObj.Name); 
    ... 
} 
+0

Thanks for your answer. Please take a look at the [benchmark](http://stackoverflow.com/questions/11899668/replacing-multiple-characters-in-a-string-the-fastest-way/11900932#11900932) I posted in another answer. – 2012-08-10 11:48:52

2

Well, I would try doing something like this:

static readonly Dictionary<char, string> replacements = 
     new Dictionary<char, string> 
    { 
     {']',"Ć"}, {'~', "č"} // etc 
    }; 
    static readonly Regex replaceRegex; 
    static YourUtilityType() // static initializer 
    { 
     StringBuilder pattern = new StringBuilder().Append('['); 
     foreach(var key in replacements.Keys) 
      pattern.Append(Regex.Escape(key.ToString())); 
     pattern.Append(']'); 
     replaceRegex = new Regex(pattern.ToString(), RegexOptions.Compiled); 
    } 
    public static string Sanitize(string input) 
    { 
     return replaceRegex.Replace(input, match => 
     { 
      return replacements[match.Value[0]]; 
     }); 
    } 

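This has a single place to maintain (at the top), and builds a pre-compiled Regex to handle the replacements. All of the overhead is incurred only once (hence static).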
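
One caveat when building the character class this way: according to the .NET documentation, Regex.Escape escapes `[` and `{` but not the closing `]` and `}`, so a `]` key appended in the middle of the class can end it early. The sketch below works around that by writing every key as a `\uXXXX` escape instead. `SafeSanitizer` is a hypothetical name, and the mapping is copied from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class SafeSanitizer
{
    static readonly Dictionary<char, string> Replacements = new Dictionary<char, string>
    {
        {'^', "Č"}, {'@', "Ž"}, {'[', "Š"}, {']', "Ć"}, {'`', "ž"},
        {'}', "ć"}, {'~', "č"}, {'{', "š"}, {'\\', "Đ"}
    };

    // Built once; static field initializers run in textual order,
    // so Replacements is already populated here.
    static readonly Regex ReplaceRegex = BuildRegex();

    static Regex BuildRegex()
    {
        var pattern = new StringBuilder("[");
        foreach (char key in Replacements.Keys)
            pattern.Append("\\u").Append(((int)key).ToString("x4")); // e.g. ']' becomes \u005d
        pattern.Append(']');
        return new Regex(pattern.ToString(), RegexOptions.Compiled);
    }

    public static string Sanitize(string input)
    {
        // Each match is exactly one character, so index 0 is the dictionary key.
        return ReplaceRegex.Replace(input, match => Replacements[match.Value[0]]);
    }
}
```

Writing the keys as `\uXXXX` avoids having to reason about which characters are special inside a character class, at the cost of a less readable pattern.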

+0

Thanks for your answer. Please take a look at the [benchmark](http://stackoverflow.com/questions/11899668/replacing-multiple-characters-in-a-string-the-fastest-way/11900932#11900932) I posted in another answer. – 2012-08-10 11:47:56

3

You could use a lambda expression with Aggregate over a char map for this:

//Method for replacing chars with a mapping 
    static string Replace(string input, IDictionary<char, char> replacementMap) { 
     return replacementMap.Keys 
      .Aggregate(input, (current, oldChar) 
       => current.Replace(oldChar, replacementMap[oldChar])); 
    } 

You can run it as follows:

private static void Main(string[] args) { 
     //Char to char map using <oldChar, newChar> 
     var charMap = new Dictionary<char, char>(); 
     charMap.Add('-', 'D'); charMap.Add('|', 'P'); charMap.Add('@', 'A'); 

     //Your input string 
     string myString = "asgjk--@dfsg||jshd--f@jgsld-kj|rhgunfh-@-nsdflngs"; 

     //Your own replacement method 
     myString = Replace(myString, charMap); 

     //out: myString = "asgjkDDAdfsgPPjshdDDfAjgsldDkjPrhgunfhDADnsdflngs" 
    } 
+0

Thanks for your answer. Please take a look at the [benchmark](http://stackoverflow.com/questions/11899668/replacing-multiple-characters-in-a-string-the-fastest-way/11900932#11900932) I posted in another answer. – 2012-08-10 11:48:15

13

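the fastest way

The only way is to compare the performance yourself. Try the approach in the question, a StringBuilder, and Regex.Replace.

But micro-benchmarks do not consider the scope of the whole system. If this method is only a small part of the overall system, its performance may not matter to the performance of the application as a whole.

Some notes:

  1. Using String as above will (I assume) create lots of intermediate strings: more work for the GC. But it is simple.
  2. Using StringBuilder allows the same underlying data to be modified with each replace. This creates less garbage. It is almost as simple as using String.
  3. Using a regex is the most complex (because you need code to work out the replacement), but it allows a single expression. I would expect this to be slower unless the list of replacements is very large and replacements are rare in the input string (i.e. most Replace method calls replace nothing and only cost a search through the string).

Because of the lower GC load, I expect #2 to be slightly quicker over repeated use (thousands of times).

For the regex approach, you need something like this: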

newObj.Name = Regex.Replace(oldObj.Name.Trim(), @"[@^\[\]`}~{\\]", match => { 
    switch (match.Value) { 
    case "^": return "Č"; 
    case "@": return "Ž"; 
    case "[": return "Š"; 
    case "]": return "Ć"; 
    case "`": return "ž"; 
    case "}": return "ć"; 
    case "~": return "č"; 
    case "{": return "š"; 
    case "\\": return "Đ"; 
    default: throw new Exception("Unexpected match!"); 
    } 
}); 

This could be done in a reusable way by parameterising it with a Dictionary<char,char> that holds the replacements and a reusable MatchEvaluator.
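
A minimal sketch of that parameterised version, under stated assumptions: `CharReplacer` is a hypothetical name, the map must be non-empty, and the character class is built from `\uXXXX` escapes so that no key needs special-case escaping. It illustrates the idea and says nothing about its speed.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

sealed class CharReplacer
{
    readonly Dictionary<char, char> map;
    readonly Regex regex;
    readonly MatchEvaluator evaluator;

    public CharReplacer(IDictionary<char, char> replacements)
    {
        if (replacements.Count == 0)
            throw new ArgumentException("At least one replacement is required.", "replacements");

        map = new Dictionary<char, char>(replacements);

        // Character class containing every key, each written as \uXXXX.
        var pattern = new StringBuilder("[");
        foreach (char key in map.Keys)
            pattern.Append("\\u").Append(((int)key).ToString("x4"));
        pattern.Append(']');

        regex = new Regex(pattern.ToString());
        // Created once and reused for every Replace call.
        evaluator = m => map[m.Value[0]].ToString();
    }

    public string Replace(string input)
    {
        return regex.Replace(input, evaluator);
    }
}
```

Usage would be along the lines of creating one `CharReplacer` before the import loop and calling `replacer.Replace(oldObj.Name.Trim())` for each field, so the regex and the delegate are constructed only once.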

+0

Thanks for your answer. Please take a look at the [benchmark](http://stackoverflow.com/questions/11899668/replacing-multiple-characters-in-a-string-the-fastest-way/11900932#11900932) I posted in another answer. – 2012-08-10 11:47:34

+0

@DejanJanjušević Oops, a typo in the regex... I knew I needed a character class (corrected). – Richard 2012-08-10 14:13:02

+0

However, when I fixed the typo, the results were poor... even slower than Marc's static regex. – 2012-08-10 18:46:42
