mb_str_replace()... is slow. Any alternatives?

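I want to make sure that some string replacements I'm running are multibyte safe. I've found a few mb_str_replace functions around the net, but they're slow: I'm talking about roughly a 20% increase in run time after passing maybe 500-900 bytes through them.

Any recommendations? I'm thinking about using preg_replace, since it's native and compiled in, so it might be faster. Any thoughts would be appreciated.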
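As a rough illustration of the preg_replace idea, here is a minimal sketch of a literal, UTF-8-aware replacement. The helper name `preg_str_replace` is hypothetical and not from the thread; it assumes the subject is valid UTF-8 and that a plain literal match (no pattern syntax) is wanted.

```php
<?php
// Hypothetical helper (not from the thread): literal replacement through PCRE in UTF-8 mode.
function preg_str_replace(string $search, string $replace, string $subject): string
{
    if ($search === '') {
        return $subject; // nothing to search for
    }
    // preg_quote() escapes regex metacharacters and the '/' delimiter,
    // so the needle is matched literally. The 'u' modifier enables UTF-8 mode.
    $pattern = '/' . preg_quote($search, '/') . '/u';
    // Escape backslashes and dollar signs so the replacement is not
    // interpreted as a backreference such as $1 or \1.
    $literal = addcslashes($replace, '\\$');
    $result = preg_replace($pattern, $literal, $subject);
    // preg_replace() returns null on error, for example on invalid UTF-8 input.
    return $result === null ? $subject : $result;
}

echo preg_str_replace('é', 'e', 'café crème'), "\n"; // prints "cafe crème"
```

Whether this is actually faster than an mb_* based function depends on the input and the PHP build, so it is worth timing against real data.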

Source

2010-08-15 onassar

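You need to give more information. What are the encodings of the replacement string and the subject? If the subject is UTF-8 and the replacement string is in the ASCII range, you can use 'str_replace'. – Artefacto 2010-08-15 23:46:33

Unicode has been around for what, 15 years now? Still using mb strings in core inner loops? Work from the inside out. – 2010-08-16 00:18:32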

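As said there, str_replace is safe to use in UTF-8 contexts as long as all parameters are valid UTF-8, because there cannot be any ambiguous match between two multibyte-encoded strings. If you check the validity of your input, you don't need to look for a different function.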
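A minimal sketch of that approach: validate every argument once, then hand off to the byte-wise str_replace. The wrapper name `utf8_str_replace` and the choice to throw on invalid input are assumptions made for illustration. The reason it works is that UTF-8 lead bytes and continuation bytes use disjoint ranges, so a valid needle can never match in the middle of another character.

```php
<?php
// Hypothetical wrapper (not from the thread): check validity, then use str_replace().
function utf8_str_replace($search, $replace, $subject, &$count = 0)
{
    // Each argument may be a string or an array of strings, as with str_replace().
    foreach ([(array) $search, (array) $replace, (array) $subject] as $group) {
        foreach ($group as $value) {
            if (!mb_check_encoding((string) $value, 'UTF-8')) {
                throw new InvalidArgumentException('Argument is not valid UTF-8');
            }
        }
    }
    // All inputs are valid UTF-8, so a byte-wise replacement is safe.
    return str_replace($search, $replace, $subject, $count);
}

echo utf8_str_replace('日本', 'Japan', '日本語'), "\n"; // prints "Japan語"
```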

Source

2013-01-14 10:30:44

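This is wrong if you are using Unicode and care about [unicode equivalence](http://en.wikipedia.org/wiki/Unicode_equivalence). In Unicode, several different byte sequences can represent the same character. Using 'str_replace' will **only** work if you normalize both strings first. – Qtax 2014-01-22 10:52:43

Good tip. Anyway, my understanding of "multibyte safe" is "they won't give any false positives when matching", which in practice means they won't corrupt the output relative to what the replacement is expected to do. – 2014-01-24 21:12:04

Check the link provided. – Trix 2017-05-15 06:27:28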
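To make the normalization point concrete, here is a small sketch that normalizes the needle and the subject to NFC before replacing. It assumes the intl extension (which provides the `Normalizer` class) is installed and that the inputs are valid UTF-8; the function name `nfc_str_replace` is hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): normalize to NFC, then replace byte-wise.
function nfc_str_replace(string $search, string $replace, string $subject): string
{
    if (!class_exists('Normalizer')) {
        throw new RuntimeException('The intl extension is required for normalization');
    }
    $search  = Normalizer::normalize($search, Normalizer::FORM_C);
    $subject = Normalizer::normalize($subject, Normalizer::FORM_C);
    // normalize() returns false when the input cannot be normalized (e.g. invalid UTF-8).
    if ($search === false || $subject === false) {
        throw new InvalidArgumentException('Input could not be normalized');
    }
    return str_replace($search, $replace, $subject);
}
```

For example, "é" can be written as the single code point U+00E9 or as "e" followed by the combining accent U+0301. A plain `str_replace("\u{00E9}", 'E', "cafe\u{0301}")` finds no match, while `nfc_str_replace("\u{00E9}", 'E', "cafe\u{0301}")` returns "cafE". Note that the result is in NFC even where nothing was replaced.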

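Since encoding is a real challenge for input that comes from anywhere (UTF-8 or otherwise), I prefer to use only multibyte-safe functions. For str_replace, I am using this one, which is fast enough.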

if (!function_exists('mb_str_replace'))
{
    // Multibyte-aware counterpart to str_replace(), built on mb_split().
    function mb_str_replace($search, $replace, $subject, &$count = 0)
    {
        if (!is_array($subject))
        {
            // Turn search and replacement into parallel lists.
            $searches = is_array($search) ? array_values($search) : array($search);
            $replacements = is_array($replace) ? array_values($replace) : array($replace);
            // Pad missing replacements with the empty string.
            $replacements = array_pad($replacements, count($searches), '');
            foreach ($searches as $key => $needle)
            {
                // mb_split() takes a regex, so quote the needle to match it literally.
                $parts = mb_split(preg_quote($needle), $subject);
                $count += count($parts) - 1;
                $subject = implode($replacements[$key], $parts);
            }
        }
        else
        {
            // Apply the replacement to every element of an array subject.
            foreach ($subject as $key => $value)
            {
                $subject[$key] = mb_str_replace($search, $replace, $value, $count);
            }
        }
        return $subject;
    }
}

Source

2014-04-23 14:51:54

Here is my implementation, based off Alain's answer:

/**
 * Replace all occurrences of the search string with the replacement string. Multibyte safe.
 *
 * @param string|array $search The value being searched for, otherwise known as the needle. An array may be used to designate multiple needles.
 * @param string|array $replace The replacement value that replaces found search values. An array may be used to designate multiple replacements.
 * @param string|array $subject The string or array being searched and replaced on, otherwise known as the haystack.
 *        If subject is an array, then the search and replace is performed with every entry of subject, and the return value is an array as well.
 * @param string $encoding The encoding parameter is the character encoding. If it is omitted, the internal character encoding value will be used.
 * @param int $count If passed, this will be set to the number of replacements performed.
 * @return array|string
 */
public static function mbReplace($search, $replace, $subject, $encoding = 'auto', &$count = 0) {
    if (!is_array($subject)) {
        $searches = is_array($search) ? array_values($search) : [$search];
        $replacements = is_array($replace) ? array_values($replace) : [$replace];
        $replacements = array_pad($replacements, count($searches), '');
        foreach ($searches as $key => $search) {
            // An empty needle matches at offset 0 on PHP 8 and would loop forever.
            if ($search === '') {
                continue;
            }
            $replace = $replacements[$key];
            $search_len = mb_strlen($search, $encoding);

            // Collect the pieces between matches, then join them with the replacement.
            $sb = [];
            while (($offset = mb_strpos($subject, $search, 0, $encoding)) !== false) {
                $sb[] = mb_substr($subject, 0, $offset, $encoding);
                $subject = mb_substr($subject, $offset + $search_len, null, $encoding);
                ++$count;
            }
            $sb[] = $subject;
            $subject = implode($replace, $sb);
        }
    } else {
        // Apply the replacement to every element of an array subject.
        foreach ($subject as $key => $value) {
            $subject[$key] = self::mbReplace($search, $replace, $value, $encoding, $count);
        }
    }
    return $subject;
}

His version doesn't accept a character encoding, although I suppose you could set it via mb_regex_encoding.

My unit tests all pass:
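A sketch of what setting the encoding via mb_regex_encoding could look like around an mb_split()-based replacement. The wrapper name `mb_split_replace` and its `$encoding` parameter are assumptions for illustration; it handles a single string needle only and restores the previous regex encoding afterwards.

```php
<?php
// Hypothetical wrapper (not from the thread): run an mb_split()-based
// replacement under a chosen regex encoding, then restore the old one.
function mb_split_replace(string $search, string $replace, string $subject, string $encoding = 'UTF-8'): string
{
    if ($search === '') {
        return $subject; // nothing to search for
    }
    $previous = mb_regex_encoding();  // remember the current regex encoding
    mb_regex_encoding($encoding);     // mb_split() matches using this encoding
    try {
        $parts = mb_split(preg_quote($search), $subject);
    } finally {
        mb_regex_encoding($previous); // always restore the previous setting
    }
    // mb_split() returns false on failure; leave the subject untouched in that case.
    return $parts === false ? $subject : implode($replace, $parts);
}

echo mb_split_replace('é', 'e', 'café au lait, café noir'), "\n"; // prints "cafe au lait, cafe noir"
```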

function testMbReplace() {
    $this->assertSame('bbb', Str::mbReplace('a', 'b', 'aaa', 'auto', $count1));
    $this->assertSame(3, $count1);
    $this->assertSame('ccc', Str::mbReplace(['a', 'b'], ['b', 'c'], 'aaa', 'auto', $count2));
    $this->assertSame(6, $count2);
    $this->assertSame("\xbf\x5c\x27", Str::mbReplace("\x27", "\x5c\x27", "\xbf\x27", 'iso-8859-1'));
    $this->assertSame("\xbf\x27", Str::mbReplace("\x27", "\x5c\x27", "\xbf\x27", 'gbk'));
}

Source

2015-02-24 21:24:44 mpen
