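Cleaning up a word array
I have a function that strips the HTML out of a page, puts the remaining words into an array, and then runs array_count_values on it. I am trying to report the number of occurrences of each word. The array output is very messy. I have tried to clean it up and I am getting nowhere. I want to remove the phone numbers, and for some reason phrases are being pushed together into a single key. The first element also appears to be empty, but checking it with isset() or empty() does not seem to unset it.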
$body = $this->get_response($domain);
$body = preg_replace('/<body(.*?)>/i', '<body>', $body);
$body = preg_replace('#</body>#i', '</body>', $body);
$openTag = '<body>';
$start = strpos($body, $openTag);
$start += strlen($openTag);
$closeTag = '</body>';
$end = strpos($body, $closeTag);
// Return if cannot cut-out the body
if ($end <= $start || $start === false || $end === false) {
    $this->setValue('');
    return;
}
$body = substr($body, $start, $end - $start);
$body = preg_replace(array(
    '@<script[^>]*?>.*?</script>@si', // Strip out javascript
    '@<style[^>]*?>.*?</style>@siU', // Strip style tags properly
    '@<![\s\S]*?--[ \t\n\r]*>@', // Strip multi-line comments including CDATA
    '/style=([\"\']??)([^\">]*?)\\1/siU',// Strip inline style attribute
), '', $body);
$body = strip_tags($body);
$body = array_filter(explode(' ', $body), create_function('$str', 'return strlen($str) > 2;'));
$body = array_map('trim', $body);
$words = $body;
$i = 0;
$words = array_count_values($words);
foreach($words as $word){
    if (empty($word)) unset($words[$i]);
    $i++;
}
echo "<pre>";
print_r($words);
echo "</pre>";
Output:
Array
(
[] => 28
[333.444.5555] => 1
[facebook] => 2
[twitter] => 2
[linkedin] => 2
[youtube
googleplus] => 1
[About
History
Our] => 1
[Mission
Who] => 1
[This
That
Other] => 1
[Us
English
FA
Football] => 1
[Media
Pay] => 2
[Per] => 4
[Think
Fast] => 2
[Marketing
Design] => 1
[Consulting
Case] => 2
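A possible explanation, based only on the code and output shown: explode(' ', $body) splits on the space character alone, so words separated by newlines or tabs stay glued together in one token, and tokens made up purely of whitespace pass the strlen() > 2 check before trim() reduces them to an empty string (the [] key). The foreach loop also walks over the counts, which are never empty, and unset($words[$i]) targets a numeric index while the keys are the words themselves. A minimal sketch of one way around this follows. The helper name count_words and the rule "drop any token containing a digit" are assumptions made for illustration, not part of the original code.

```php
<?php
// Hypothetical helper (not in the original code): counts the words in
// text that has already been run through strip_tags().
function count_words(string $text): array
{
    // Split on any run of whitespace (spaces, tabs, newlines) and
    // discard empty pieces, so no empty key can appear later.
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Keep tokens longer than 2 characters that contain no digits.
    // Assumption: phone numbers such as 333.444.5555 are the only
    // digit-bearing tokens that should be dropped.
    $tokens = array_filter($tokens, function ($token) {
        return strlen($token) > 2 && !preg_match('/\d/', $token);
    });

    // Keys of the result are the words, values are their counts.
    return array_count_values($tokens);
}

$sample = "333.444.5555 facebook twitter\nyoutube\ngoogleplus facebook\tAbout\nHistory";
print_r(count_words($sample));
```

With this approach the cleanup happens before array_count_values, so there is nothing left to unset afterwards. If digits inside ordinary words need to be kept, the digit check would have to be replaced with a narrower phone-number pattern.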
That did it. Awesome. Thanks! – madphp