Levenshtein distance loop in Python

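I have a set of reference words (correctly spelled) and a word entered by the user. I want to compare the input word against the reference list using Levenshtein distance and return the reference word with the lowest cost. The reference list is sorted by frequency, so the most frequent words appear at the top. If two words are at the same distance, the more frequent one should be returned. "NWORDS" is my reference list sorted by frequency. "candidate" is the word entered by the user.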

Code:

for word in NWORDS:  # iterate over all words in ref
    i = jf.levenshtein_distance(candidate, word)  # compute distance for each word with user input

    # don't know what to do here
    return word  # function returns word from ref list with lowest dist and highest frequency of occurrence

2014-02-16 Hypothetical Ninja

Edit distance is not the way to solve this problem. http://norvig.com/spell-correct.html

See also this question: http://stackoverflow.com/questions/2294915/what-algorithm-gives-suggestions-in-a-spell-checker – Krumelur

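This is only one part of the whole problem. I have already implemented Peter Norvig's approach, but I need to use Levenshtein to improve efficiency. Also, my data contains very few English words.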

You could approach this as follows:

match = None  # best match word so far
dist = None  # best match distance so far
for word in NWORDS:  # iterate over all words in ref
    i = jf.levenshtein_distance(candidate, word)  # compute distance for each word with user input
    # strict < keeps the earlier (more frequent) word on ties;
    # use <= instead if NWORDS lists the lowest frequency first
    if dist is None or i < dist:
        match, dist = word, i
return match  # function returns word from ref list with lowest dist and highest frequency of occurrence
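
For anyone who wants to try this outside the original function, here is a minimal self-contained sketch of the same idea. It is not the asker's actual code: `best_match`, the pure-Python `levenshtein` helper and the sample `NWORDS` list are all made up for illustration, and the helper only stands in for `jf.levenshtein_distance` so the snippet runs without any third-party package. It relies on the built-in `min()` returning the first minimal item, which gives the same tie-breaking as the strict `<` comparison in the loop.

```python
def levenshtein(a, b):
    """Plain dynamic-programming edit distance.

    Hypothetical stand-in for jf.levenshtein_distance, so that the
    example has no third-party dependency.
    """
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def best_match(candidate, nwords, distance=levenshtein):
    """Return the closest word; earlier (more frequent) words win ties."""
    # min() returns the first item with the smallest key, so a tie goes to
    # whichever word appears earlier in the frequency-sorted list.
    return min(nwords, key=lambda word: distance(candidate, word), default=None)


# Hypothetical reference list, most frequent first.
NWORDS = ["the", "then", "than", "them"]
print(best_match("thn", NWORDS))  # "the", "then" and "than" all tie at distance 1
```

To use the library function instead, pass it in as `distance=jf.levenshtein_distance`. Note that either version still computes one distance per reference word, so the cost grows linearly with the size of `NWORDS`.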

2014-02-16 10:07:58 jonrsharpe
