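PHP extra whitespace not being removed

I'm counting the words in an article and removing common words such as "and" or "the". I'm using PHP's preg_replace to remove them, and once that's done I do a quick cleanup of the extra whitespace: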
$search_body = preg_replace('/\s+/',' ',$search_body);
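A minimal sketch of why this cleanup can miss some "spaces". It assumes (not confirmed in the question) that the stubborn blanks are UTF-8 non-breaking spaces, bytes C2 A0, as often found in text copied from web pages. Without the `u` modifier PCRE works on bytes and `\s` only covers ASCII whitespace, so those two bytes pass through untouched. Listing U+00A0 explicitly in a UTF-8 (`/u`) pattern collapses it as well.

```php
<?php
// Assumed sample: "one" and "two" separated by a UTF-8 non-breaking space.
$text = "one\xC2\xA0two three";

// Byte-oriented pattern: C2 A0 is not ASCII whitespace, so nothing changes here.
$ascii = preg_replace('/\s+/', ' ', $text);

// UTF-8 aware pattern with U+00A0 listed explicitly next to \s.
$unicode = preg_replace('/[\s\x{00A0}]+/u', ' ', $text);

var_dump($ascii === $text);
var_dump($unicode === 'one two three');
```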
But I have some very stubborn whitespace that won't go away. I've already tried:
if($word == "" OR $word == " "){
//chop its head off
}
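but the if statement doesn't see $word as just a space. I also tried printing it to the screen to see its raw data, and it still only shows blank space.
Here is the full set of regular expressions I'm using: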
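Printing a token directly cannot distinguish a normal space from other blank-looking characters, but printing its bytes can. A small debugging sketch follows; `describe_token()` is a hypothetical helper, and the non-breaking space sample is an assumption about what the stubborn blanks might be. It also shows that neither `== " "` nor the default `trim()` treats C2 A0 as whitespace.

```php
<?php
// Hypothetical debugging helper: shows a token with its length and raw bytes,
// so characters that render as blank become visible.
function describe_token(string $token): string
{
    // bin2hex() writes each byte as two hex digits: " " becomes "20".
    return sprintf('[%s] %d byte(s): %s', $token, strlen($token), bin2hex($token));
}

// Assumed samples: an ordinary space and a UTF-8 non-breaking space (C2 A0).
$plain = " ";
$nbsp  = "\xC2\xA0";

echo describe_token($plain), PHP_EOL;
echo describe_token($nbsp), PHP_EOL;

// Neither check treats C2 A0 as a space.
var_dump($nbsp == " ");          // different strings, so false
var_dump(trim($nbsp) === $nbsp); // default trim() list is " \t\n\r\0\x0B"
```

In the counting loop, calling `describe_token($value)` in place of `$value` would reveal whether the blank entries are one byte (`20`) or something longer.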
$pattern = array(
'/\"\;/',
'/[0-9]/',
'/\,/',
'/\./',
'/\!/',
'/\@/',
'/\#/',
'/\$/',
'/\%/',
'/\^/',
'/\&/',
'/\*/',
'/\(/',
'/\)/',
'/\_/',
'/\"/',
'/\'/',
'/\:/',
'/\;/',
'/\?/',
'/\`/',
'/\~/',
'/\[/',
'/\]/',
'/\{/',
'/\}/',
'/\|/',
'/\+/',
'/\=/',
'/\-/',
'/–/',
'/°/',
'/\bthe\b/',
'/\band\b/',
'/\bthat\b/',
'/\bhave\b/',
'/\bfor\b/',
'/\bnot\b/',
'/\bwith\b/',
'/\byou\b/',
'/\bthis\b/',
'/\bbut\b/',
'/\bhis\b/',
'/\bfrom\b/',
'/\bthey\b/',
'/\bsay\b/',
'/\bher\b/',
'/\bshe\b/',
'/\bwill\b/',
'/\bone\b/',
'/\ball\b/',
'/\bwould\b/',
'/\bthere\b/',
'/\btheir\b/',
'/\bwhat\b/',
'/\bout\b/',
'/\babout\b/',
'/\bwho\b/',
'/\bget\b/',
'/\bwhich\b/',
'/\bwhen\b/',
'/\bmake\b/',
'/\bcan\b/',
'/\blike\b/',
'/\btime\b/',
'/\bjust\b/',
'/\bhim\b/',
'/\bknow\b/',
'/\btake\b/',
'/\bpeople\b/',
'/\binto\b/',
'/\byear\b/',
'/\byour\b/',
'/\bgood\b/',
'/\bsome\b/',
'/\bcould\b/',
'/\bthem\b/',
'/\bsee\b/',
'/\bother\b/',
'/\bthan\b/',
'/\bthen\b/',
'/\bnow\b/',
'/\blook\b/',
'/\bonly\b/',
'/\bcome\b/',
'/\bits\b/', //it's?
'/\bover\b/',
'/\bthink\b/',
'/\balso\b/',
'/\bback\b/',
'/\bafter\b/',
'/\buse\b/',
'/\btwo\b/',
'/\bhow\b/',
'/\bour\b/',
'/\bwork\b/',
'/\bfirst\b/',
'/\bwell\b/',
'/\bway\b/',
'/\beven\b/',
'/\bnew\b/',
'/\bwant\b/',
'/\bbecause\b/',
'/\bany\b/',
'/\bthese\b/',
'/\bgive\b/',
'/\bday\b/',
'/\bmost\b/',
'/\bare\b/',
'/\bwas\b/',
'/\<\w+\>/', '/\<\/\w+\>/',
'/\b\w{1}\b/', //1 letter word
'/\b\w{2}\b/', //2 letter word
'/\//',
'/\</',
'/\>/'
);
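The long pattern list can be condensed. The sketch below keeps the same intent (drop digits, punctuation, common words and one- or two-letter words) but uses one character class for the symbols and one alternation for the stopwords. The `clean_text()` name and the short stopword list are illustrative only; the `u` modifier is added on the assumption that the input is UTF-8.

```php
<?php
// Illustrative subset of the stopwords; the full list would go here.
$stopwords = ['the', 'and', 'that', 'have', 'for', 'not', 'with', 'you'];

$patterns = [
    // Digits and punctuation in a single character class (\- and \/ are escaped).
    '/[0-9,.!@#$%^&*()_"\':;?`~\[\]{}|+=\-–°\/<>]/u',
    // All stopwords in one alternation; preg_quote() escapes regex metacharacters.
    '/\b(?:' . implode('|', array_map('preg_quote', $stopwords)) . ')\b/u',
    // Words of one or two characters, replacing the two separate patterns.
    '/\b\w{1,2}\b/u',
];

// Hypothetical helper: strip tags, lowercase, then apply the patterns in order.
function clean_text(string $text, array $patterns): string
{
    $text = strtolower(strip_tags($text));
    return preg_replace($patterns, ' ', $text);
}

echo clean_text('<p>The cat, and 42 dogs!</p>', $patterns), PHP_EOL;
```

Order still matters: punctuation is removed first, so "it's" becomes "it s" and the stray "s" is then dropped by the short-word pattern, as in the original list.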
$search_body = strip_tags($body);
$search_body = strtolower($search_body);
$search_body = preg_replace($pattern, ' ', $search_body);
$search_body = preg_replace('/\s+/',' ',$search_body);
$search_body = explode(" ", $search_body);
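`explode(" ", ...)` splits only on the exact byte 20 and keeps empty strings produced by leading or trailing separators. A sketch of an alternative using `preg_split()` with `PREG_SPLIT_NO_EMPTY`; `split_words()` is a hypothetical name, and including U+00A0 in the separator rests on the assumption that non-breaking spaces are the stubborn blanks.

```php
<?php
// Split on runs of whitespace (including U+00A0) and drop empty pieces.
function split_words(string $text): array
{
    // /u: treat the subject as UTF-8, so \x{00A0} means the code point U+00A0.
    // PREG_SPLIT_NO_EMPTY: no "" entries from leading/trailing separators.
    $words = preg_split('/[\s\x{00A0}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure, e.g. when the input is not valid UTF-8.
    return $words === false ? [] : $words;
}

// explode() leaves the non-breaking space glued to "cat" and yields empty strings.
print_r(explode(' ', " cat\xC2\xA0 dogs "));
print_r(split_words(" cat\xC2\xA0 dogs "));
```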
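The blank values show up after the explode. The sample text I'm using is too long to post here, but I copied and pasted this article to test it, and it shows a count of 32 blanks even after using trim(), and that doesn't include the whitespace before or after other words.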
Here's a js.fiddle of the raw data that is being handled by php.
Running it through htmlspecialchars() doesn't show anything either.
Here's the code that counts all the values and puts them into an array:
$inhere = array();
$body_hold = array();
foreach($search_body as $value){
$value = trim($value);
if(in_array($value, $inhere) && $value != ""){
$key = array_search($value, $inhere);
$body_hold[$key]['count'] = $body_hold[$key]['count']+1;
}elseif($value != ""){
$inhere[] = $value;
$body_hold[] = array(
'count' => 1,
'word' => $value
);
}
}
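The counting loop does an `in_array()` and an `array_search()` per word, which scans the whole list each time. A sketch of the same step with built-ins, offered as an alternative rather than a fix for the whitespace problem: `array_count_values()` maps each word to its frequency and `arsort()` orders by count while keeping the words as keys. `count_words()` is a hypothetical name; the arrow function needs PHP 7.4 or later.

```php
<?php
// Count word frequencies, highest first.
function count_words(array $words): array
{
    // Drop empty tokens in case the split step left any behind.
    $words = array_filter($words, fn(string $w): bool => $w !== '');
    $counts = array_count_values($words);
    arsort($counts); // sort by count, descending; keys (words) are preserved
    return $counts;
}

$counts = count_words(['cat', 'dog', 'cat', '', 'bird', 'cat', 'dog']);
foreach ($counts as $word => $count) {
    echo 'Count: ', $count, ' Word: ', htmlspecialchars((string) $word), '<br>', PHP_EOL;
}
```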
rsort($body_hold);
And a basic foreach to view the values:
foreach($body_hold as $value){
$count = $value['count'];
$word = trim($value['word']);
echo "Count: ".$count;
echo " Word: ".$word;
echo '<br>';
}
Here's a PHP example of what it's returning
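Could these actually be '\r\n', for example? – Ohgodwhy 2014-09-18 16:24:28
Is that expected? All of the replacements end up in the explode? – exussum 2014-09-18 16:31:18
@exussum Right, but my question is why the regex doesn't catch and remove it, and why my if/else statement can't catch it either. – 2014-09-18 16:33:18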