2017-04-20 67 views
0

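Remove elements from a list of strings

This is my first time asking a question here and I'm new to this, so I'll do my best. I have a list of phrases and I want to eliminate all the phrases that are similar to one another, for example: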

array = ["A very long string saying some things", 
     "Another long string saying some things", 
     "extremely large string saying some things", 
     "something different", 
     "this is a test"] 

I would like this result:

array2 = ["A very long string saying some things",
          "something different",
          "this is a test"]

I have this:

for i in range(len(array)):
    swich = True
    for j in range(len(array2)):
        if (fuzz.ratio(array[i], array2[j]) >= 80) and (swich == True):
            swich = False
            pass
        if (fuzz.ratio(array[i], array2[j]) >= 80) and (swich == False):
            array2.pop(j)

But it gives me a list IndexError ...

fuzz.ratio compares two strings and returns a value between 0 and 100; the larger the value, the more similar the strings.

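What I want to do is compare the lists element by element: the first time two similar strings are found, just flip the switch, and from that point on, every time another similar string is found, pop that element from array2. I'm completely open to any suggestions.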

+2

Please give the exact error traceback... which list raises the IndexError? – rassar

Answers

0

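The error you get is caused by modifying the list while you are iterating over it. (Don't add, remove, or replace elements of the iterable you are currently iterating over!) range(len(array2)) is evaluated once, when the list has length N, but after array2.pop(j) the length is no longer N but N-1. When the loop later tries to access the N-th element, you get an IndexError because the list is now shorter.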
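To make the mechanism concrete, here is a minimal sketch that reproduces the same failure with a list of integers instead of strings. The function name pop_while_indexing and the "pop even numbers" condition are made up for illustration; they are not from the question.

```python
def pop_while_indexing(items):
    """Pop elements while looping over a range computed from the original length."""
    items = list(items)  # work on a copy so the caller's list is untouched
    try:
        for j in range(len(items)):  # the range is fixed here, before any pop
            if items[j] % 2 == 0:
                items.pop(j)         # the list shrinks, the range does not
    except IndexError:
        return "IndexError", items
    return "ok", items


def keep_odd(items):
    """Safer alternative: build a new list instead of mutating during the loop."""
    return [x for x in items if x % 2 != 0]


print(pop_while_indexing([1, 2, 3, 4]))
print(keep_odd([1, 2, 3, 4]))
```

With [1, 2, 3, 4] the loop still expects four positions after two elements have been popped, so the access at index 3 fails. Note that the mutating loop also skips over the element that slides into the freed slot, which is a second reason to prefer building a new list.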

A quick guess at another approach:

# Assumption: fuzz is the module from the fuzzywuzzy package, as in the question.
from fuzzywuzzy import fuzz

original = ["A very long string saying some things",
            "Another long string saying some things",
            "extremely large string saying some things",
            "something different",
            "this is a test"]

filtered = []

for original_string in original:
    include = True
    # Compare only against strings that have already been kept.
    for filtered_string in filtered:
        if fuzz.ratio(original_string, filtered_string) >= 80:
            include = False
            break
    if include:
        filtered.append(original_string)

Note the "for string in array" style of loop, which is more "Pythonic": it needs neither an integer index variable nor range.
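For readers who want to try the keep-first idea without installing anything, here is a rough sketch using only the standard library. It swaps in difflib.SequenceMatcher as a stand-in for fuzz.ratio, so the scores may not match fuzz.ratio exactly; the names similarity and drop_similar and the default threshold of 80 are assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (stand-in for fuzz.ratio)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def drop_similar(strings, threshold=80):
    """Keep each string only if it is not too similar to one already kept."""
    kept = []
    for candidate in strings:
        # The input list is never modified; results go into a new list.
        if all(similarity(candidate, k) < threshold for k in kept):
            kept.append(candidate)
    return kept


print(drop_similar(["hello world", "hello world!", "zzzz"]))
```

Because the input list is only read and the result is built separately, there is no index to go stale, whatever threshold is chosen.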

0

How about using a different library to condense the code and reduce the number of loops?

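import difflib

def remove_similar_words(word_list):
    for elem in word_list:
        # Matches are sorted best-first, so elem itself normally comes first.
        first_pass = difflib.get_close_matches(elem, word_list)
        if len(first_pass) > 1:
            # Drop the weakest match, then restart the scan on the shorter list
            # instead of continuing to iterate over a list that just changed.
            word_list.remove(first_pass[-1])
            return remove_similar_words(word_list)
    return word_list


phrases = ["A very long string saying some things",
           "Another long string saying some things",
           "extremely large string saying some things",
           "something different",
           "this is a test"]

remove_similar_words(phrases)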

['A very long string saying some things', 
'something different', 
'this is a test']
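The behaviour above depends on two documented defaults of difflib.get_close_matches: it returns at most n=3 matches, and it only returns candidates whose similarity ratio is at least cutoff=0.6, sorted with the best match first. That cutoff is looser than the threshold of 80 used with fuzz.ratio in the question. A small sketch with made-up words shows the effect of the cutoff:

```python
import difflib

words = ["apple", "appel", "zzzzz"]

# Default cutoff (0.6): the word itself plus its near-duplicate are returned.
loose = difflib.get_close_matches("apple", words)

# Stricter cutoff: only the exact match survives.
strict = difflib.get_close_matches("apple", words, cutoff=0.9)

print(loose)
print(strict)
```

Passing a higher cutoff to get_close_matches inside remove_similar_words would make the filtering more conservative, at the cost of keeping more near-duplicates.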