2012-03-26 63 views
2

嗨,我正在寫一個PHP類來實現Rabin-Karp算法。我有重新哈希部分的問題。此代碼不包含匹配部分字符。由於再次哈希的問題,我不得不停止匹配哈希碼。有人請幫我弄明白。用PHP實現Rabin-Karp算法

<?php 
class RabinKarp 
{ 
    /** 
    * @var String 
    */ 
    private $pattern ; 
    private $patternHash ; 
    private $text ; 
    private $previousHash ; 

    /** 
    * @var Integer 
    */ 
    private $radix ; 
    private $prime ; 
    private $position ; 

    /** 
    * Constructor 
    * 
    * @param String $pattern - The pattern 
    * 
    */ 
    public function __construct($pattern) 
    { 
     $this->pattern = $pattern; 
     $this->radix = 256; 
     $this->prime = 100007; 
     $this->previousHash = ""; 
     $this->position = 0; 
     $this->patternHash = $this->generateHash($pattern); 
    } 

    private function generateHash($key) 
    { 
     $charArray = str_split($key); 
     $hash = 0; 
     foreach($charArray as $char) 
     { 
      $hash = ($this->radix * $hash + ord($char)) % $this->prime; 
     } 

     return $hash; 
    } 

    public function search($character) 
    { 
     $this->text .= $character; 
     if(strlen($this->text) < strlen($this->pattern)) 
     { 
      return false; 
     } 
     else 
     { 
      $txtHash = 0; 
      echo $this->previousHash . "<br/>"; 
      if(empty($this->previousHash)) 
      { 
       $txtHash = $this->generateHash($this->text); 
       $this->previousHash = $txtHash; 
       $this->position = 0; 
      } 
      else 
      { 
       // The issue is here 
       $charArray = str_split($this->text); 
       $txtHash = (($txtHash + $this->prime) - $this->radix * strlen($this->pattern) * ord($charArray[$this->position]) % $this->prime) % $this->prime; 
       $txtHash = ($txtHash * $this->radix + ord($character)) % $this->prime; 

       $this->previousHash = $txtHash; 
      } 

      if($txtHash == $this->patternHash) 
      { 
       echo "Hash Match found"; 
      } 
     } 

    } 

} 

$x = new RabinKarp("ABC"); 
$x->search("Z"); 
$x->search("A"); 
$x->search("B"); 
$x->search("C"); 
?> 

謝謝。

+0

此算法是否適用於多字節字符串值?如果是這樣,你應該使用'mb_'函數來分割長度。 – 2012-03-26 12:15:35

+0

這個算法寫入串行傳輸工作。比方說,你一次獲得一個角色。一旦我收到這個角色,我就開始搜索。上面的代碼適合我。唯一的問題是重新哈希部分。實際上重新哈希值是不正確的。 – 2012-03-26 12:21:45

+0

@PrasadRajapaksha嗨Prasad先生..請你張貼固定代碼。我正在尋找rabin karp算法實現多重模式的php。請...提前致謝 – Ayuktia 2016-06-13 17:46:59

回答

1

值由您要移除字符(c用於短促)促成了散列是

ord(c) * radix^(length(pattern)-1) 

由於特徵促成ord(c)當它首先進入匹配窗口,散列 - 因此其也對於輸入匹配窗口的length(pattern)-1個字符中的每個字符,乘以radix直到c最後離開它。

但你減去ord(c) * radix * length(pattern)

$charArray = str_split($this->text); 
$txtHash = (($txtHash + $this->prime) 
      - $this->radix * strlen($this->pattern) 
      * ord($charArray[$this->position]) % $this->prime) 
      % $this->prime; 
$txtHash = ($txtHash * $this->radix + ord($character)) % $this->prime; 

此外,在計算您使用的變量$txtHash,您已設置爲0,這應該是$this->previousHash,你必須增加的文本位置。

原則,

$charArray = str_split($this->text); 
$txtHash = (($this->previousHash + $this->prime) 
      - pow($this->radix, strlen($this->pattern)-1) 
      * ord($charArray[$this->position]) % $this->prime) 
      % $this->prime; 
$txtHash = ($txtHash * $this->radix + ord($character)) % $this->prime; 

$this->previousHash = $txtHash; 
$this->position += 1; 

是你必須做的事情。

但除非模式很短,pow($this->radix,strlen($this->pattern)-1)會溢出,所以你有一個模塊化的冪函數

function mod_pow($base,$exponent,$modulus) 
{ 
    $aux = 1; 
    while($exponent > 0) { 
     if ($exponent % 2 == 1) { 
      $aux = ($aux * $base) % $modulus; 
     } 
     $base = ($base * $base) % $modulus; 
     $exponent = $exponent/2; 
    } 
    return $aux; 
} 

更換pow($this-radix, strlen($this->pattern)-1)(這可能仍然溢出如果$modulus,即$this->prime這裏,過大)。代碼的相關行成爲

$txtHash = (($this->previousHash + $this->prime) 
      - mod_pow($this->radix, strlen($this->pattern)-1, $this->prime) 
      * ord($charArray[$this->position]) % $this->prime) 
      % $this->prime; 

然後你有一個潛在的巨大效率

$this->text .= $character; 
... 
    $charArray = str_split($this->text); 

如果字符串變長,拼接和分割可能需要大量的時間(不知道PHP如何實現字符串和字符串操作,但它們可能不是固定的時間)。您應該只保留字符串的相關部分,即在重新計算散列值後刪除第一個字符。

+0

非常感謝您的回答。但它仍然說哈希不匹配。可能是我誤解了。我很感激你是否可以用你的答案修改搜索功能。在我的例子中,ABC與模式匹配。 – 2012-03-26 15:46:20

+0

發現了另外一個錯誤,我在答案中添加了正確的代碼。 – 2012-03-26 20:56:32

+0

丹尼爾。非常感謝你的幫助。 – 2012-03-27 01:26:53