2014-02-16 70 views
0

我需要使用遞歸找到兩個字符串的編輯距離,即我給函數兩個參數(每個都是不同的sring)。此功能將找到將s1更改爲s2所需的最少更改量。這是我到目前爲止:找到兩個字符串的遞歸編輯距離

def edit_distance(s1,s2): 
    split1 = list(s1) 
    split2 = list(s2) 
    count = 0 
    pos = 0 

    if split1[pos] == split2[pos]: 
     pos += 1 
    else: 
     pos +=1 
     count += 1 
     edit_distance(s1, s2) 

return count #This should be the minimum amount required to match the two strings 
+1

請解釋你的代碼應該做什麼,因爲我說不清,看起來非常破碎。此外,字符串距離至少有兩種常見定義,一種是隻考慮字符替換(漢明距離),另一種是考慮替換和刪除/插入的定義。 (Levenshtein距離)最後,請問一個具體問題,並解釋到底發生了什麼/沒有發生,以及您到目前爲止嘗試本地化問題。 – Nabla

+1

您是否在搜索Levenshtain算法的遞歸python實現? http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python – erthalion

+0

它應該返回將s1轉換爲s2所需的最小數量的單個字符更改。所以如果's1 ='a''和's2 = b'它應該會返回1 – Amon

回答

1

我註釋了你的代碼,以顯示你的代碼流。我希望你現在明白爲什麼你會得到這個錯誤:

def edit_distance(s1,s2): 
    split1 = list(s1) # Split strings into characters 
    split2 = list(s2) 
    count = 0   # This variable is local, it is not shared through calls to the function! 
    pos = 0    # Same 

    if split1[pos] == split2[pos]: # pos is always 0 here! 
     pos += 1      # pos is incremented anyway, in if but also in else ! 
    else: 
     pos +=1      # See above 
     count += 1     # count is incremented, now it is 1 
     edit_distance(s1, s2)  # recursive call, but with the identical arguments as before! The next function call will do exactly the same as this one, resulting in infinite recursion! 

return count # Wrong indentation here 

你的功能沒有做你想做的。如果你正在談論漢明距離,這仍然不是很清楚,假設兩個字符串的長度都相等,這裏是一個示例實現:

# Notice that pos is passed between calls and initially set to zero 
def hamming(s1, s2, pos=0):  
    # Are we after the last character already? 
    if pos < len(s1): 
     # Return one if the current position differs and add the result for the following positions (starting at pos+1) to that 
     return (s1[pos] != s2[pos]) + hamming(s1, s2, pos+1) 
    else: 
     # If the end is already reached, the remaining distance is 0 
     return 0