2010-12-07 52 views
8

我遇到過在OCR識別的文本中匹配字符串的問題,並找到它的位置,考慮到可能會出現任意錯誤,缺失或額外的容錯字符。結果應該是最佳匹配位置,可能(不一定)匹配子字符串的長度。如何在模糊匹配的字符串中找到子字符串的位置

例如:

String: 9912, 1.What is your name? 
Substring: 1. What is your name? 
Tolerance: 1 
Result: match on character 7 

String: Where is our caat if any? 
Substring: your cat 
Tolerance: 2 
Result: match on character 10 

String: Tolerance is t0o h1gh. 
Substring: Tolerance is too high; 
Tolerance: 1 
Result: no match 

我試圖適應萊文施泰因的算法,但它並不適用於子正常工作,不回位。

Delphi中的算法將是首選,但任何實現或僞邏輯都可以。

回答

8

下面是一個有效的遞歸實現,但可能不夠快。最糟糕的情況是無法找到匹配項,並且「What」中的最後一個字符在Where中的每個索引處都匹配。在這種情況下,該算法將對Where中的每個字符進行Length(What)-1 + Tolerance比較,以及每個Tolerance的一個遞歸調用。既然公差和什麼是常量的長度,我會說算法是O(n)。它的性能會隨着「What」和「Where」的長度而線性降低。

function BrouteFindFirst(What, Where:string; Tolerance:Integer; out AtIndex, OfLength:Integer):Boolean; 
    var i:Integer; 
     aLen:Integer; 
     WhatLen, WhereLen:Integer; 

    function BrouteCompare(wherePos, whatPos, Tolerance:Integer; out Len:Integer):Boolean; 
    var aLen:Integer; 
     aRecursiveLen:Integer; 
    begin 
     // Skip perfect match characters 
     aLen := 0; 
     while (whatPos <= WhatLen) and (wherePos <= WhereLen) and (What[whatPos] = Where[wherePos]) do 
     begin 
     Inc(aLen); 
     Inc(wherePos); 
     Inc(whatPos); 
     end; 
     // Did we find a match? 
     if (whatPos > WhatLen) then 
     begin 
      Result := True; 
      Len := aLen; 
     end 
     else if Tolerance = 0 then 
     Result := False // No match and no more "wild cards" 
     else 
     begin 
      // We'll make an recursive call to BrouteCompare, allowing for some tolerance in the string 
      // matching algorithm. 
      Dec(Tolerance); // use up one "wildcard" 
      Inc(whatPos); // consider the current char matched 
      if BrouteCompare(wherePos, whatPos, Tolerance, aRecursiveLen) then 
      begin 
       Len := aLen + aRecursiveLen; 
       Result := True; 
      end 
      else if BrouteCompare(wherePos + 1, whatPos, Tolerance, aRecursiveLen) then 
      begin 
       Len := aLen + aRecursiveLen; 
       Result := True; 
      end 
      else 
      Result := False; // no luck! 
     end; 
    end; 

    begin 

    WhatLen := Length(What); 
    WhereLen := Length(Where); 

    for i:=1 to Length(Where) do 
    begin 
     if BrouteCompare(i, 1, Tolerance, aLen) then 
     begin 
     AtIndex := i; 
     OfLength := aLen; 
     Result := True; 
     Exit; 
     end; 
    end; 

    // No match found! 
    Result := False; 

    end; 

我用下面的代碼來測試功能:

procedure TForm18.Button1Click(Sender: TObject); 
var AtIndex, OfLength:Integer; 
begin 
    if BrouteFindFirst(Edit2.Text, Edit1.Text, ComboBox1.ItemIndex, AtIndex, OfLength) then 
    Label3.Caption := 'Found @' + IntToStr(AtIndex) + ', of length ' + IntToStr(OfLength) 
    else 
    Label3.Caption := 'Not found'; 
end; 

對於情況:

​​

它示出了字符9匹配,長度爲6的對於其他兩個例子給出了預期的結果。

+0

您的解決方案正是我所尋找的,謝謝。 – too 2010-12-07 12:42:40

相關問題