Which is more efficient: strpos or preg_match?

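Following on from this question: Pattern for check single occurrency into preg_match_all

I understand that my pattern must contain only one word per cycle because, in the case reported in that question, I must find "Microsoft" and "Microsoft Exchange", and I can't modify my regexp because these two possibilities are given dynamically from a database!

So my question is: which is the better solution, over 200 preg_match calls or the same number of strpos calls, to check whether a string contains these words?

I tried to write possible code for both solutions: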

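$array = array(/* 200+ values */);
foreach ($array as $word)
{
    // Escape the word so that regex metacharacters are matched literally.
    $pattern = '<\b(?:' . preg_quote($word) . ')\b>i';
    // $matches[0][0] exists only when there is at least one match.
    if (preg_match_all($pattern, $text, $matches)) {
        $fields['skill'][] = $matches[0][0];
    }
}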

The other approach is:

$array = array(/* 200+ values */);
foreach ($array as $word)
{
    // strpos(haystack, needle) returns the match offset, or false if not found.
    if (strpos($text, $word) !== false)
    {
        $fields['skill'][] = $word;
    }
}
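
One difference between the two approaches is worth keeping in mind: the regex carries the `i` flag and therefore matches case-insensitively, while strpos is case-sensitive. A minimal sketch of a case-insensitive substring check with stripos follows. The `$text` and `$words` values are made up for illustration, and, unlike `\b`, this check does not enforce word boundaries.

```php
<?php
// Hypothetical sample data, not taken from the question.
$text  = 'Experience with Microsoft Exchange and PHP';
$words = array('microsoft', 'microsoft exchange', 'python');

$fields = array('skill' => array());
foreach ($words as $word) {
    // stripos() returns the offset of the first match (possibly 0) or false,
    // so the result is compared with !== false.
    if (stripos($text, $word) !== false) {
        $fields['skill'][] = $word;
    }
}

print_r($fields['skill']);
```

With this sample data both "microsoft" and "microsoft exchange" are reported, because each word is tested independently.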

2017-03-16 Filippo1980

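Regex-based functions are slower than most other string functions. By the way, your test can also be done with a single regex if you build it like `$pattern='<\b(?:'.$word1.'|'.$word2.'|'.$word3.'|'.$word4.')\b>i';`. How many words you can use at once depends on how long a regex may be. I created a test regex 12004 characters long, and that did not seem to be the maximum. – JustOnUnderMillions

`strpos()` is usually 3-20 times faster than preg_match, because preg_match is mainly meant for probing the format of a string and retrieving parts of it according to a regular expression. –

strpos is much faster than preg_match; here is a benchmark: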

$array = array();
// Store the test words as strings so strpos() treats them as text, not char codes.
for ($i = 0; $i < 1000; $i++) $array[] = (string) $i;
$nbloop = 10000;
$text = <<<EOD
I understand that my pattern must contain only a word per cycle because, in the case reported in that question, I must find "microsoft" and "microsoft exchange" and I can't modify my regexp because these two possibilities are given dinamically from a database!

So my question is: which is the better solution between over 200 preg_match and the same numbers of str_pos to check if a subset of char contains these words?
EOD;

// Regex variant: one preg_match_all call per word.
$start = microtime(true);
for ($i = 0; $i < $nbloop; $i++) {
    foreach ($array as $word) {
        $pattern = '<\b(?:' . $word . ')\b>i';
        if (preg_match_all($pattern, $text, $matches)) {
            $fields['skill'][] = $matches[0][0];
        }
    }
}
echo "Elapse regex: ", microtime(true) - $start, "\n";


// strpos variant: haystack first, needle second, compared with !== false.
// The timings quoted with this answer were measured with the original call
// strpos($word, $text) > -1, which swaps haystack and needle.
$start = microtime(true);
for ($i = 0; $i < $nbloop; $i++) {
    foreach ($array as $word) {
        if (strpos($text, $word) !== false) {
            $fields['skill'][] = $word;
        }
    }
}
echo "Elapse strpos: ", microtime(true) - $start, "\n";

Output:

Elapse regex: 7.9924139976501 
Elapse strpos: 0.62015008926392

That is about 13 times faster.

2017-03-16 17:01:04 Toto

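Thank you very much for your answer! – Filippo1980

Regex functions are slower than most other string functions.

Your test can also be done with a single regex if you build it like $pattern='<\b(?:'.$word1.'|'.$word2.'|'.$word3.'|'.$word4.')\b>i'; how many words you can use at once depends on how long a regex may be. I created a test regex 12004 characters long, and that did not seem to be the maximum.

Regex version (single call):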

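$array = array(/* 200+ values */);

// One alternation built from all words, each escaped for literal matching.
$pattern = '<\b(?:' . implode('|', array_map('preg_quote', $array)) . ')\b>i';
preg_match_all($pattern, $text, $matches);
//$fields['skill'][] = $matches[0][0];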
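
A combined pattern reports at most one alternative per position in the text, so overlapping entries such as "microsoft" and "microsoft exchange" cannot both be returned for the same occurrence. The sketch below, using made-up sample data and `/` as the delimiter, sorts the longest entries first so that the longer phrase is the one reported. It is an illustration under those assumptions, not a fix for the overlap itself.

```php
<?php
// Hypothetical sample data for illustration.
$text  = 'Admin for Microsoft Exchange and Linux';
$words = array('microsoft', 'microsoft exchange', 'linux');

// Longest entries first, so 'microsoft exchange' is tried before 'microsoft'.
usort($words, function ($a, $b) {
    return strlen($b) - strlen($a);
});

// Escape regex metacharacters and the '/' delimiter in every word.
$quoted = array_map(function ($w) {
    return preg_quote($w, '/');
}, $words);

$pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
preg_match_all($pattern, $text, $matches);

print_r($matches[0]);
```

Here `$matches[0]` contains "Microsoft Exchange" and "Linux"; the shorter "Microsoft" is not listed separately, which is the limitation raised in the comments on this answer.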

strpos version (multiple calls):

$array = array(/* 200+ values */);
foreach ($array as $word){
if(strpos($text, $word)!==false)// strpos(haystack, needle); compare with !== false rather than > -1
{
    $fields['skill'][] = $word;
}
}
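
The strict comparison matters because strpos returns an integer offset on success and false on failure, and offset 0 is itself falsy in a loose check. A small sketch with made-up input shows the return values:

```php
<?php
// Hypothetical sample text for illustration.
$text = 'Microsoft Exchange administrator';

// Match at the very start of the string: the offset is 0.
$atStart = strpos($text, 'Microsoft');
// No match at all: the result is false.
$missing = strpos($text, 'Oracle');

var_dump($atStart);            // int(0)
var_dump($missing);            // bool(false)
var_dump($atStart !== false);  // bool(true)
var_dump($missing !== false);  // bool(false)

// A loose truthiness test would wrongly treat the match at offset 0 as "not found".
var_dump((bool) $atStart);     // bool(false)
```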

If you are looking for simple words, strpos will also match Hello inside HelloWorld, so if you only want real whole words you can do this:

$arrayOfWords = explode(' ', $string);
// and now you can check array against array
$array = array(/* 200+ values */);
foreach ($array as $word){
if(in_array($word, $arrayOfWords))
{
    $fields['skill'][] = $word;
}
}
// you can also make this faster if you array_flip $arrayOfWords
// and then check with isset (faster than in_array)
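
The array_flip idea can be sketched as follows. The sample `$string` and `$words` are assumptions made for illustration; the comparison is case-sensitive, so the sample text is already lower case.

```php
<?php
// Hypothetical sample data for illustration.
$string = 'php developer with microsoft exchange experience';
$words  = array('microsoft', 'exchange', 'microsoft exchange', 'java');

// Flip once: every word of the text becomes an array key.
$lookup = array_flip(explode(' ', $string));

$fields = array('skill' => array());
foreach ($words as $word) {
    // isset() on a key is a hash lookup, whereas in_array() scans the array.
    if (isset($lookup[$word])) {
        $fields['skill'][] = $word;
    }
}

print_r($fields['skill']);
```

With this input only "microsoft" and "exchange" are found; the two-word entry never appears as a single key after splitting on spaces.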

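Matching word combinations ("microsoft exchange") cannot be done this way, because after splitting on spaces a combination never appears as a single element of the word array.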
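
One possible way to keep plain substring search and still handle multi-word entries is to check the characters around each hit by hand. The helper below is a hypothetical sketch, not part of the original answer: the name `containsWholePhrase` is invented, it assumes ASCII text, and it treats only letters and digits as word characters (so, unlike `\b`, an underscore counts as a boundary).

```php
<?php
// Hypothetical helper: true if $phrase occurs in $text bounded by non-alphanumerics.
function containsWholePhrase($text, $phrase)
{
    $len = strlen($phrase);
    if ($len === 0) {
        return false;
    }
    $offset = 0;
    // Walk through every case-insensitive occurrence of the phrase.
    while (($pos = stripos($text, $phrase, $offset)) !== false) {
        // Use a space as the neighbour when the hit touches either end of the text.
        $before = $pos > 0 ? $text[$pos - 1] : ' ';
        $after  = $pos + $len < strlen($text) ? $text[$pos + $len] : ' ';
        // Accept the hit only if neither neighbour is a letter or digit.
        if (!ctype_alnum($before) && !ctype_alnum($after)) {
            return true;
        }
        $offset = $pos + 1;
    }
    return false;
}

var_dump(containsWholePhrase('Uses Microsoft Exchange daily', 'microsoft exchange'));
var_dump(containsWholePhrase('HelloWorld', 'Hello'));
```

The first call finds the two-word phrase, while the second rejects Hello because it is immediately followed by a letter.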

2017-03-16 16:31:28 JustOnUnderMillions

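Thanks for your answer, but there is a problem with your regex... As I said, if I look for "Microsoft" and "microsoft exchange" in the same phrase, your solution will find only one result! – Filippo1980

@Filippo1980 Sure, but the accepted answer will only get "microsoft exchange" if you look for "microsoft exchange" separately rather than "microsoft"; my answer is aimed more at _what is faster_. And _my regexp_ is the same as _your regexp_, it just looks for several words at once ;-) And your question is really about performance rather than the results you want. -> _So my question is: ..._ – JustOnUnderMillions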
