PHP performance optimization: I have the following code, but it is too slow.
<?php
class Ngram {
    const SAMPLE_DIRECTORY = "samples/";
    const GENERATED_DIRECTORY = "languages/";
    const SOURCE_EXTENSION = ".txt";
    const GENERATED_EXTENSION = ".lng";
    // Minimum and maximum n-gram length, in characters.
    const N_GRAM_MIN_LENGTH = 1;
    const N_GRAM_MAX_LENGTH = 6;

    public function __construct() {
        mb_internal_encoding('UTF-8');
        $this->generateNGram();
    }
    // Returns the paths of all sample files found directly in SAMPLE_DIRECTORY.
    private function getFilePath() {
        $filesPath = array();
        $excludes = array('.', '..');
        $path = rtrim(self::SAMPLE_DIRECTORY, DIRECTORY_SEPARATOR . '/');
        $files = array_diff(scandir($path), $excludes);
        foreach ($files as $file) {
            $fullPath = $path . DIRECTORY_SEPARATOR . $file;
            if (is_dir($fullPath)) {
                // Subdirectories are skipped: fetchdir() and $callback are not defined anywhere in this code.
                continue;
            }
            if (!preg_match('/' . preg_quote(self::SOURCE_EXTENSION, '/') . '$/', $file)) {
                continue;
            }
            $filesPath[] = $fullPath;
        }
        return $filesPath;
    }
    protected function removeUniCharCategories($string) {
        // Replace "other punctuation" (' " # % & ! . : , ? ¿) with a space,
        // e.g. 'You&me' becomes 'You me'.
        $string = preg_replace("/\p{Po}/u", " ", $string);
        // Remove everything that is not a letter or a space separator.
        // \p{L} covers Ll, Lm, Lo, Lt and Lu; a "|" inside a character class is a literal pipe, not alternation.
        $string = preg_replace("/[^\p{L}\p{Zs}]/u", "", $string);
        $string = trim($string);
        $string = mb_strtolower($string, 'UTF-8');
        return $string;
    }
    private function generateNGram() {
        $files = $this->getFilePath();
        foreach ($files as $file) {
            $file_content = file_get_contents($file);
            $file_content = $this->removeUniCharCategories($file_content);
            $words = explode(" ", $file_content);
            $tokens = array();
            foreach ($words as $word) {
                $word = "_" . $word . "_";
                $length = mb_strlen($word, 'UTF-8');
                for ($i = self::N_GRAM_MIN_LENGTH, $min = min(self::N_GRAM_MAX_LENGTH, $length); $i <= $min; $i++) {
                    for ($j = 0, $li = $length - $i; $j <= $li; $j++) {
                        $token = mb_substr($word, $j, $i, 'UTF-8');
                        // Skip tokens that consist only of padding underscores.
                        if (trim($token, "_") !== '') {
                            $tokens[] = $token;
                        }
                    }
                }
            }
            // Count each n-gram and sort by frequency, most frequent first.
            $tokens = array_count_values($tokens);
            arsort($tokens);
            $ngrams = array_keys($tokens);
            file_put_contents(
                self::GENERATED_DIRECTORY . str_replace(self::SOURCE_EXTENSION, self::GENERATED_EXTENSION, basename($file)),
                implode(PHP_EOL, $ngrams)
            );
        }
    }
}

$ii = new Ngram();
?>
How can I make it faster? Thanks.
Code Review (http://codereview.stackexchange.com/) might be a better place to post this question... – Xaerxess
Thanks :) Sorry for posting in the wrong place – Ahmad
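One direction that might help, offered as a minimal sketch and not a measured fix: the inner loop calls mb_substr() once per n-gram, and each call has to walk the UTF-8 string again. The sketch below splits each word into characters once and increments counters directly, so no large intermediate token array is built. The function name countWordNgrams and its parameters are hypothetical and not part of the code above. It assumes the input is valid UTF-8 and that the minimum length is at least 1.

```php
<?php
// Hypothetical helper (not in the original post): adds the n-grams of one word
// to $counts. Assumes $word is valid UTF-8 and $min >= 1.
function countWordNgrams(string $word, int $min, int $max, array &$counts): void
{
    // Split the padded word into characters once instead of calling mb_substr() per n-gram.
    $chars = preg_split('//u', '_' . $word . '_', -1, PREG_SPLIT_NO_EMPTY);
    $length = count($chars);
    for ($start = 0; $start < $length; $start++) {
        $token = '';
        $limit = min($max, $length - $start);
        for ($n = 1; $n <= $limit; $n++) {
            $token .= $chars[$start + $n - 1]; // grow the n-gram by one character
            if ($n < $min || trim($token, '_') === '') {
                continue; // shorter than the minimum, or padding only
            }
            $counts[$token] = ($counts[$token] ?? 0) + 1;
        }
    }
}

// Usage: count all n-grams of length 1 to 6 in a small cleaned sample.
$counts = array();
foreach (explode(' ', 'ab ab') as $word) {
    if ($word !== '') {
        countWordNgrams($word, 1, 6, $counts);
    }
}
arsort($counts); // most frequent n-grams first
```

This should produce the same counts as the array_count_values() approach, although n-grams with equal counts may come out in a different order. Whether it is actually faster on the real sample files would need to be profiled.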