在字符串中匹配案例詞python

我正在嘗試編寫一個函數來檢查一個字是否在字符串中，或者這個字與字符串中的每個字都有相同的字符len(word)-1。在字符串中匹配案例詞python

例如：

word: match string: There is a match -> True 
word: matck string: There is a match -> True

輸出需要爲這兩個例子是真實的，因爲matck-1=matc和match-1=matc

到目前爲止我寫了下面的代碼：

for idx, f in enumerate(files): 
    for word in words: 
     if term in f: 
      numOfWord[idx] += 1 
     else: 
      file_words = f.split() 
      for f_word in file_words: 
       if word[:-1] == file_word[:-1]: 
        numOfWords[idx] += 1

但它不是好，因爲我有一個非常大的單詞列表和非常大的長文件目錄，所以運行時間不現實。

來源

2017-08-02 טל סעדי

等待，所以你基本上只關心第一個'正1'字符長度N'的'一個字？ 'match'和'latch'不匹配，因爲off-by-one不在最後？ – poke

您可以使用Levenshtein距離檢查

def minimumEditDistance(s1,s2): 
if len(s1) > len(s2): 
    s1,s2 = s2,s1 
distances = range(len(s1) + 1) 
for index2,char2 in enumerate(s2): 
    newDistances = [index2+1] 
    for index1,char1 in enumerate(s1): 
     if char1 == char2: 
      newDistances.append(distances[index1]) 
     else: 
      newDistances.append(1 + min((distances[index1], 
             distances[index1+1], 
             newDistances[-1]))) 
    distances = newDistances 
return distances[-1] 


print(minimumEditDistance("kitten","sitting")) 
print(minimumEditDistance("rosettacode","raisethysword"))

https://rosettacode.org/wiki/Levenshtein_distance#Python

來源

2017-08-02 08:14:11 Ragnarok

你確定你的代碼的性能會比問題中的代碼好嗎？ –

這只是算法的一個例子，他可以按照自己喜歡的方式實現它。您還可以使用VP樹加快算法http://www.logarithmic.net/pfh/blog/01164790008 並獲得它在O（nlogn） – Ragnarok

在字符串中匹配案例詞python

回答

相關問題