2011-04-05 59 views
6

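Calculating Levenshtein distance with lists of words

First of all, I should say that I am new to Python. I am trying to calculate the Levenshtein distance for several lists of words. So far I have managed to write the code for a single pair of words, but I am having trouble doing the same for lists. I simply have two lists with the words one below the other, like this:

carlos
stiv
peter

I want to use the Levenshtein distance as a similarity measure. Could somebody tell me how to load the lists and then use the function to calculate the distance?

I would appreciate it!

Here is my code, which handles just two strings: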

#!/usr/bin/env python
# -*- coding: utf-8 -*-

def lev_dist(source, target):
    if source == target:
        return 0

    # words = open('test_file.txt', 'r').read().split()

    # Prepare the matrix: dist[i][j] is the distance between the first
    # i characters of source and the first j characters of target
    slen, tlen = len(source), len(target)
    dist = [[0 for j in range(tlen+1)] for i in range(slen+1)]
    for i in range(slen+1):
        dist[i][0] = i
    for j in range(tlen+1):
        dist[0][j] = j

    # Fill in the distances
    for i in range(slen):
        for j in range(tlen):
            cost = 0 if source[i] == target[j] else 1
            dist[i+1][j+1] = min(
                dist[i][j+1] + 1,   # deletion
                dist[i+1][j] + 1,   # insertion
                dist[i][j] + cost   # substitution
            )
    return dist[-1][-1]

if __name__ == '__main__':
    import sys
    if len(sys.argv) != 3:
        print('Usage: You have to enter a source_word and a target_word')
        sys.exit(-1)
    source, target = sys.argv[1], sys.argv[2]
    print(lev_dist(source, target))
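
One way to sanity-check a function like this is to run it on pairs whose distance is well known, such as "kitten" and "sitting". The sketch below is not the asker's code: it assumes Python 3 and uses a hypothetical name, `lev_dist_two_rows`, for a variant of the same recurrence that keeps only two rows of the matrix at a time instead of the whole table.

```python
def lev_dist_two_rows(source, target):
    """Levenshtein distance that stores two rows instead of a full matrix."""
    if source == target:
        return 0
    # previous[j] is the distance between the characters of source seen so
    # far and the first j characters of target. Before any character of
    # source is consumed, that is simply j insertions.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


if __name__ == '__main__':
    for a, b in [('kitten', 'sitting'), ('flaw', 'lawn'), ('', 'abc')]:
        print(a, b, lev_dist_two_rows(a, b))
```

The three pairs should give 3, 2 and 3 respectively. Both versions do the same amount of work per cell, so the two-row form only matters when the words are long enough for the full matrix to be a memory concern.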
+1

So what do you want to do? Compute the distance for every pair in the list? – 2011-04-05 11:29:41

+1

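Step 1. Add code to read your list (or is it two lists?). Step 2. Add a loop that iterates over your list (or is it two lists?). Step 3. Post the new code so that we can comment on it. The code you posted is fine, but you need to write those two parts as well. – 2011-04-05 11:30:19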

+0

Thanks for the quick answers. Larsmans: I want to calculate the distance from each word in one list to every word in the second list. S.Lott: There are two lists! – 2011-04-05 12:18:45

Answers

7

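I finally got the code working with some help from a friend :) You can calculate the Levenshtein distance against every word in the second list by changing the last line of the script so that it compares list1[0] with list2[i], that is, the first word in list1 with each word in list2.

Thanks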

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import codecs

def lev_dist(source, target):
    if source == target:
        return 0

    # Prepare a matrix
    slen, tlen = len(source), len(target)
    dist = [[0 for j in range(tlen+1)] for i in range(slen+1)]
    for i in range(slen+1):
        dist[i][0] = i
    for j in range(tlen+1):
        dist[0][j] = j

    # Counting distance, here is my function
    for i in range(slen):
        for j in range(tlen):
            cost = 0 if source[i] == target[j] else 1
            dist[i+1][j+1] = min(
                dist[i][j+1] + 1,   # deletion
                dist[i+1][j] + 1,   # insertion
                dist[i][j] + cost   # substitution
            )
    return dist[-1][-1]

# Load words from a file into a list, one word per line
def loadWords(path):
    words = []  # create an empty list to hold the file contents
    with codecs.open(path, "r", "utf-8") as file_contents:  # open the file
        for line in file_contents:  # loop over the lines in the file
            line = line.strip()     # strip the line break and any extra spaces
            words.append(line)      # append the word to the list
    return words

if __name__ == '__main__':
    import sys
    if len(sys.argv) != 3:
        print('Usage: You have to enter a source_file and a target_file')
        sys.exit(-1)
    source, target = sys.argv[1], sys.argv[2]

    # Create two lists, one per file, by calling loadWords() on each file
    list1 = loadWords(source)
    list2 = loadWords(target)

    # Compare the first word of list1 with every word of list2.
    # To compare the files line by line instead, call
    # lev_dist(list1[i], list2[i]); both files then need the same number of lines.
    for i in range(len(list2)):  # loop over indices of list2, not over lines
        print(lev_dist(list1[0], list2[i]))
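
The asker's stated goal was the distance from each word in one list to every word in the other, not just from the first word. A minimal sketch of that, assuming Python 3 and in-memory lists rather than files: the first list is the sample from the question, while the second list and the helper names `all_distances` and `closest` are made up for illustration.

```python
from itertools import product


def lev_dist(source, target):
    """Levenshtein distance (same recurrence as above, two rows at a time)."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def all_distances(words_a, words_b):
    """Map every (a, b) pair, a from words_a and b from words_b, to its distance."""
    return {(a, b): lev_dist(a, b) for a, b in product(words_a, words_b)}


def closest(word, candidates):
    """Return the candidate with the smallest distance to word."""
    return min(candidates, key=lambda candidate: lev_dist(word, candidate))


if __name__ == '__main__':
    list1 = ['carlos', 'stiv', 'peter']  # sample words from the question
    list2 = ['carlo', 'steve', 'petra']  # hypothetical second list
    for (a, b), distance in all_distances(list1, list2).items():
        print(a, b, distance)
    for word in list1:
        print(word, '->', closest(word, list2))
```

`itertools.product` yields every combination, so two lists of n and m words cost n × m distance computations. That is fine for short lists; for large vocabularies a dedicated library would likely be a better fit.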
5
+7

Good advice sometimes, but reinventing the wheel is also the best way to learn how it works... – grifaton 2011-04-05 12:09:24

+2

Yes, I know about the module, I just want to learn Python myself! – 2011-04-05 12:26:09

+6

This doesn't work for lists! – ingrid 2016-11-14 22:24:40