Compare two strings and return the most similar one

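I have to write a function that takes a string as an argument, compares it with the other strings in a list, and returns the most similar string together with the number of differences.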

def func(word):    # called as func("LUMB")
    lst = ["JIBM", "NUNE", "NUMB"]
should return:
("NUMB", 1)

I have tried:

def f(word): 
    lst=["JIBM", "NUNE", "NUMB"] 
    for i in lst: 
     d=k(word, lst) 
     return differences 
     for n in d: 
      print min(sum(n))

where:

def k(word1, word2): 
    L=[] 
    for w in range(len(word1)): 
     if word1[w] != word2[w]: 
      L.append(1) 
     else: 
      L.append(0) 
    return L

which gives me a list such as [1, 0, 0, 0] if word1 = "NUMB" and word2 = "LUMB".


2011-12-15 Linus Svendsson

Have you seen [Text difference algorithm](http://stackoverflow.com/questions/145607/text-difference-algorithm) and [Good Python modules for fuzzy string comparison](http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison)? – Chris 2011-12-15 11:21:26

Many answers are available at this link: http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison – 2011-12-15 11:32:09

It looks like Shawn Chin has provided the best solution, but if you are prevented from using non-built-in modules, get_close_matches from difflib might help:

import difflib 
difflib.get_close_matches("LUMB", ["JIBM", "NUNE", "NUMB"], 1)

The number of differences can be obtained by calling the get_opcodes method of SequenceMatcher and working with its return value.
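As a rough sketch of that idea, the snippet below picks the best match with get_close_matches and then walks the opcodes to count the characters that differ. The helper name closest_with_opcodes and the counting rule (the longer side of every non-"equal" opcode) are assumptions made for illustration; this count is not guaranteed to equal the Levenshtein distance for every pair of strings.

```python
import difflib


def closest_with_opcodes(word, candidates):
    """Hypothetical helper: return (best_match, difference_count) or None."""
    # get_close_matches uses its default cutoff of 0.6, so it may find nothing.
    matches = difflib.get_close_matches(word, candidates, n=1)
    if not matches:
        return None
    best = matches[0]

    matcher = difflib.SequenceMatcher(None, word, best)
    differences = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Assumed counting rule: every non-equal span contributes the
        # length of its longer side (replace, insert or delete).
        if tag != "equal":
            differences += max(i2 - i1, j2 - j1)
    return best, differences


print(closest_with_opcodes("LUMB", ["JIBM", "NUNE", "NUMB"]))
```

If none of the candidates reaches the similarity cutoff, the helper returns None, so callers may want to check for that case.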


2011-12-15 11:20:31

Use pylevenshtein to compute the Levenshtein distance:

>>> from Levenshtein import distance 
>>> from operator import itemgetter 
>>> lst = ["JIBM", "NUNE", "NUMB"] 
>>> min([(x, distance("LUMB", x)) for x in lst], key=itemgetter(1)) 
('NUMB', 1)

or, as a function:

from Levenshtein import distance 
from operator import itemgetter 
def closest(word, lst): 
    return min([(x, distance(word, x)) for x in lst], key=itemgetter(1)) 

print(closest("LUMB", ["JIBM", "NUNE", "NUMB"]))

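P.S. If you want to avoid additional dependencies, you can implement your own function to compute the distance. For example, several versions are presented on wikibooks, each with its own pros and cons.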
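For illustration, here is one possible dependency-free version: a minimal sketch of the usual dynamic-programming approach that keeps only two rows of the table. It is not taken from wikibooks, and the names levenshtein and closest_pure are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep b as the shorter string so each row stays as small as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


def closest_pure(word, candidates):
    """Return (candidate, distance) for the candidate nearest to word."""
    pairs = ((c, levenshtein(word, c)) for c in candidates)
    return min(pairs, key=lambda pair: pair[1])


print(closest_pure("LUMB", ["JIBM", "NUNE", "NUMB"]))
```

A pure-Python loop like this is likely to be much slower than a compiled module on long strings or large candidate lists.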

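However, if performance is a concern, consider sticking with the purpose-built modules. Besides pylevenshtein, there are also python-levenshtein and nltk.metrics.distance (if you happen to be using NLTK already).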


2011-12-15 11:26:40
