PHP performance optimization

-1

I have the following code, but it is too slow.

<?php 
class Ngram { 

const SAMPLE_DIRECTORY = "samples/"; 
const GENERATED_DIRECTORY = "languages/"; 
const SOURCE_EXTENSION = ".txt"; 
const GENERATED_EXTENSION = ".lng"; 
const N_GRAM_MIN_LENGTH = 1; // integers, since both values are used as loop bounds 
const N_GRAM_MAX_LENGTH = 6; 

public function __construct() { 
    mb_internal_encoding('UTF-8'); 
    $this->generateNGram(); 
} 

private function getFilePath() { 
    $filesPath = array(); // initialised so an empty directory yields an empty array 
    $excludes = array('.', '..'); 
    $path = rtrim(self::SAMPLE_DIRECTORY, DIRECTORY_SEPARATOR . '/'); 
    $files = array_diff(scandir($path), $excludes); 
    foreach ($files as $file) { 
     $fullPath = $path . DIRECTORY_SEPARATOR . $file; 
     // Subdirectories are skipped: the original called fetchdir(), which is not defined anywhere. 
     if (is_dir($fullPath)) 
      continue; 
     // Keep only files whose name ends in SOURCE_EXTENSION. 
     if (substr($file, -strlen(self::SOURCE_EXTENSION)) !== self::SOURCE_EXTENSION) 
      continue; 
     $filesPath[] = $fullPath; 
    } 
    return $filesPath; 
} 
protected function removeUniCharCategories($string){ 
    // Replace "other punctuation" (' " # % & ! . : , ? ¿) with a space, 
    // e.g. 'You&me' becomes 'You me' (lowercased further down). 
    $string = preg_replace("/\p{Po}/u", " ", $string); 
    // Drop everything that is not a letter or a space separator. 
    // The original character class contained literal "|" characters, which let pipes through. 
    $string = preg_replace("/[^\p{L}\p{Zs}]/u", "", $string); 
    $string = trim($string); 
    $string = mb_strtolower($string,'UTF-8'); 
    return $string; 
} 
private function generateNGram() { 
    $files = $this->getFilePath(); 
    foreach($files as $file) { 
     $file_content = file_get_contents($file); // the FILE_TEXT flag had no effect here, so it is dropped 
     $file_content = $this->removeUniCharCategories($file_content); 
     $words = explode(" ", $file_content); 
     $tokens = array(); 
     foreach ($words as $word) { 
      $word = "_" . $word . "_"; 
      $length = mb_strlen($word, 'UTF-8'); 
      for ($i = self::N_GRAM_MIN_LENGTH, $min = min(self::N_GRAM_MAX_LENGTH, $length); $i <= $min; $i++) { 
       for ($j = 0, $li = $length - $i; $j <= $li; $j++) { 
        $token = mb_substr($word, $j, $i, 'UTF-8'); 
        if (trim($token, "_")) { 
         $tokens[] = $token; 
        } 
       } 
      } 
     } 
     unset($word); 
     $tokens = array_count_values($tokens); 
     arsort($tokens); 
     $ngrams = array_keys($tokens); // n-grams ordered by descending frequency 
     file_put_contents(self::GENERATED_DIRECTORY . str_replace(self::SOURCE_EXTENSION, self::GENERATED_EXTENSION, basename($file)), implode(PHP_EOL, $ngrams)); 
    } 
    unset($file); 
} 
} 
$ii = new Ngram(); 
?>

How can I make it faster? Thanks.

Source

2011-07-01 Ahmad

Code Review (http://codereview.stackexchange.com/) might be a better place to post this question... – Xaerxess

Thanks :) Sorry for posting in the wrong place. – Ahmad

-1

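PHP's foreach {} is much slower than for {} (up to 16 times). Try replacing those in the generateNGram() function.

Also, you could copy your code from the generateNGram() function into your constructor. That would avoid a useless function call.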
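A claim like the one about foreach and for is easy to check by measurement. The following is only a minimal sketch: the function names (sumWithForeach, sumWithFor, timeCall) and the array size are illustrative assumptions, not part of the original code, and the timings depend heavily on the PHP version and the machine.

```php
<?php
// Hypothetical micro-benchmark: sum the same array with foreach and with for.
// All names and sizes here are illustrative assumptions.

function sumWithForeach(array $values): int {
    $sum = 0;
    foreach ($values as $value) {
        $sum += $value;
    }
    return $sum;
}

function sumWithFor(array $values): int {
    $sum = 0;
    $count = count($values); // computed once, outside the loop condition
    for ($i = 0; $i < $count; $i++) {
        $sum += $values[$i];
    }
    return $sum;
}

// Returns the elapsed wall-clock time in seconds for a single call.
function timeCall(callable $fn, array $values): float {
    $start = microtime(true);
    $fn($values);
    return microtime(true) - $start;
}

$values = range(1, 200000);
printf("foreach: %.5f s\n", timeCall('sumWithForeach', $values));
printf("for:     %.5f s\n", timeCall('sumWithFor', $values));
```

A single run is noisy, so it is worth repeating the measurement a few times before drawing any conclusion.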

Source

2011-07-01 13:39:53

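"PHP's foreach {} is much slower than for {} (up to 16 times)." "Also, you could copy your code from the generateNGram() function into your constructor. That would avoid a useless function call." That is negligible, and having too much in the constructor is a very bad habit. – KingCrunch

I agree about the constructor, but as far as I know foreach is not a good choice unless you want to walk a multidimensional array while keeping the IDs: foreach ($this as $id => $that) {} –

OK, since you didn't, I searched myself and found something that says exactly the opposite of what you claim: http://www.phpbench.com/ (you need to scroll down a bit). – KingCrunch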

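A quick Google search for 'how to profile php' leads to this Stack Overflow question: Simplest way to profile a PHP script, which gives a very short answer to your question.

I won't list everything, but you can find useful information here: http://www.php.net/apd http://www.xdebug.org/docs/profiler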
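Before setting up a full profiler, a rough picture can be had by timing each stage by hand. This is a sketch under stated assumptions: timeStage() is a hypothetical helper, the input is a generated stand-in for a sample file, and the n-gram loop is a simplified copy of the one in the question. It needs the mbstring extension, as the original code does.

```php
<?php
// Hypothetical helper: run a callable and add its wall-clock duration
// to $timings under the given label.
function timeStage(string $label, callable $stage, array &$timings) {
    $start = microtime(true);
    $result = $stage();
    $timings[$label] = ($timings[$label] ?? 0.0) + (microtime(true) - $start);
    return $result;
}

$timings = array();
$text = str_repeat("some sample text ", 1000); // stand-in for a sample file

$words = timeStage('explode', function () use ($text) {
    return explode(" ", trim($text));
}, $timings);

$tokens = timeStage('ngrams', function () use ($words) {
    $tokens = array();
    foreach ($words as $word) {
        $word = "_" . $word . "_";
        $length = mb_strlen($word, 'UTF-8');
        for ($i = 1, $max = min(6, $length); $i <= $max; $i++) {
            for ($j = 0; $j <= $length - $i; $j++) {
                $tokens[] = mb_substr($word, $j, $i, 'UTF-8');
            }
        }
    }
    return $tokens;
}, $timings);

arsort($timings); // slowest stage first
foreach ($timings as $label => $seconds) {
    printf("%-8s %.5f s\n", $label, $seconds);
}
```

The stage that dominates the output is the one worth optimizing first. A profiler such as Xdebug gives the same kind of breakdown per function without editing the code.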

Source

2011-07-01 13:41:52 illEatYourPuppies

You can't expect us to profile your code, especially when it is this long and you haven't attached any description. – illEatYourPuppies
