2011-07-01 195 views
-1

PHP optimizing performance

I have the following code, but it is too slow.

<?php 
class Ngram { 

const SAMPLE_DIRECTORY = "samples/"; 
const GENERATED_DIRECTORY = "languages/"; 
const SOURCE_EXTENSION = ".txt"; 
const GENERATED_EXTENSION = ".lng"; 
const N_GRAM_MIN_LENGTH = 1; 
const N_GRAM_MAX_LENGTH = 6; 

public function __construct() { 
    mb_internal_encoding('UTF-8'); 
    $this->generateNGram(); 
} 

private function getFilePath($path = null) { 
    $filesPath = array(); 
    $excludes = array('.', '..'); 
    if ($path === null) { 
     $path = rtrim(self::SAMPLE_DIRECTORY, DIRECTORY_SEPARATOR . '/'); 
    } 
    $files = array_diff(scandir($path), $excludes); 
    foreach ($files as $file) { 
     $fullPath = $path . DIRECTORY_SEPARATOR . $file; 
     if (is_dir($fullPath)) { 
      // The original called an undefined fetchdir(); recurse into the subdirectory instead. 
      $filesPath = array_merge($filesPath, $this->getFilePath($fullPath)); 
     } else if (preg_match('/' . preg_quote(self::SOURCE_EXTENSION, '/') . '$/', $file)) { 
      // Keep only files ending in the source extension. 
      $filesPath[] = $fullPath; 
     } 
    } 
    return $filesPath; 
} 
protected function removeUniCharCategories($string){ 
    //Replace punctuation(' " # % & ! . : , ? ¿) become space " " 
    //Example : 'You&me', become 'You Me'. 
    $string = preg_replace("/\p{Po}/u", " ", $string); 
    //-------------------------------------------------- 
    $string = preg_replace("/[^\p{Ll}|\p{Lm}|\p{Lo}|\p{Lt}|\p{Lu}|\p{Zs}]/u", "", $string); 
    $string = trim($string); 
    $string = mb_strtolower($string,'UTF-8'); 
    return $string; 
} 
private function generateNGram() { 
    $files = $this->getFilePath(); 
    foreach($files as $file) { 
     $file_content = file_get_contents($file); 
     $file_content = $this->removeUniCharCategories($file_content); 
     $words = explode(" ", $file_content); 
     $tokens = array(); 
     foreach ($words as $word) { 
      $word = "_" . $word . "_"; 
      $length = mb_strlen($word, 'UTF-8'); 
      for ($i = self::N_GRAM_MIN_LENGTH, $min = min(self::N_GRAM_MAX_LENGTH, $length); $i <= $min; $i++) { 
       for ($j = 0, $li = $length - $i; $j <= $li; $j++) { 
        $token = mb_substr($word, $j, $i, 'UTF-8'); 
        if (trim($token, "_")) { 
         $tokens[] = $token; 
        } 
       } 
      } 
     } 
     unset($word); 
     $tokens = array_count_values($tokens); 
     arsort($tokens); 
     $ngrams = array_slice(array_keys($tokens), 0); 
     file_put_contents(self::GENERATED_DIRECTORY . str_replace(self::SOURCE_EXTENSION, self::GENERATED_EXTENSION, basename($file)), implode(PHP_EOL, $ngrams)); 
    } 
    unset($file); 
} 
} 
$ii = new Ngram(); 
?> 

How can I make it faster? Thanks.

+0

Code Review (http://codereview.stackexchange.com/) might be a better place to post this question... – Xaerxess

+0

Thanks :) Sorry for posting in the wrong place. – Ahmad

Answers

-1

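PHP's foreach {} is much slower than for {} (up to 16 times). Try replacing those in the generateNGram() function.

Also, you could move the code from generateNGram() into your constructor. That would avoid a useless function call.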
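For reference, the rewrite suggested here could look roughly like the sketch below. It assumes PHP 7+ with the mbstring extension; `extractNgramsWithFor` and its parameters are hypothetical names that do not exist in the original class. It only shows the mechanical change from foreach to an indexed for loop and makes no claim about whether that is actually faster.

```php
<?php
// Hypothetical standalone version of the word loop from generateNGram(),
// with the outer foreach replaced by an indexed for loop.
function extractNgramsWithFor(array $words, int $minLen = 1, int $maxLen = 6): array
{
    $tokens = array();
    // Re-index so that $words[0..n-1] is safe even after array_diff() or similar.
    $words = array_values($words);
    for ($w = 0, $count = count($words); $w < $count; $w++) {
        // Pad the word with underscores to mark its start and end.
        $word = '_' . $words[$w] . '_';
        $length = mb_strlen($word, 'UTF-8');
        for ($i = $minLen, $max = min($maxLen, $length); $i <= $max; $i++) {
            for ($j = 0, $last = $length - $i; $j <= $last; $j++) {
                $token = mb_substr($word, $j, $i, 'UTF-8');
                // Skip tokens that consist only of padding underscores.
                if (trim($token, '_') !== '') {
                    $tokens[] = $token;
                }
            }
        }
    }
    return $tokens;
}
```

Calling `extractNgramsWithFor(explode(' ', $file_content))` would then stand in for the word loop, with `array_count_values()` and `arsort()` applied to the result as before.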

+0

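"PHP's foreach {} is much slower than for {} (up to 16 times)." Source needed. "Also, you could move the code from generateNGram() into your constructor. That would avoid a useless function call." That saving is negligible, and putting too much into a constructor is a very bad habit. – KingCrunch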

+0

I agree about the constructor, but as far as I know foreach is not a good choice unless you want to walk a multidimensional array while keeping the keys: foreach ($this as $id => $that) {} –

+0

OK, since you didn't, I searched myself and found something that says exactly the opposite of what you claim: http://www.phpbench.com/ (you need to scroll down a bit). – KingCrunch
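The disagreement can also be checked locally instead of relying on a third-party page. The following is a minimal timing sketch, assuming PHP 7+; `sumWithFor`, `sumWithForeach` and `bestTime` are hypothetical helper names, and the array size is an arbitrary choice. Results will vary with PHP version and hardware, so treat the numbers as a rough indication only.

```php
<?php
// Hypothetical helper: sum a 0-indexed array with an indexed for loop.
function sumWithFor(array $values): int
{
    $sum = 0;
    for ($i = 0, $n = count($values); $i < $n; $i++) {
        $sum += $values[$i];
    }
    return $sum;
}

// Hypothetical helper: sum the same array with foreach.
function sumWithForeach(array $values): int
{
    $sum = 0;
    foreach ($values as $value) {
        $sum += $value;
    }
    return $sum;
}

// Run the callable several times and return the best wall-clock time in seconds.
function bestTime(callable $fn, array $values, int $runs = 5): float
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = microtime(true);
        $fn($values);
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

$values = range(1, 200000);
printf("for:     %.6f s\n", bestTime('sumWithFor', $values));
printf("foreach: %.6f s\n", bestTime('sumWithForeach', $values));
```

Taking the best of several runs reduces noise from other processes. For the n-gram script itself, timing the whole generateNGram() call before and after a change is a more meaningful test than a loop-only benchmark.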