Pruning similar strings by length

How can I remove elements from a Python list of strings based on similarity and length containment (if a string X is found in another, longer string Y, X must be removed)?

IN:  [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)] 
OUT: [('this is string that stays', 0), ('i am safe', 3)]

2011-08-31 SomeOne

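Given your example, 'this is string' is **not** in 'this is a string'. Is that because every word of the first string appears in the second? If this isn't a typo, please specify what you mean by one string being **in** another string. –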

Corrected the example. – SomeOne

OK. Do you have any ideas? Give us something to discuss. –

Here you go:

l = [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]
survivors = set(s for s, _ in l)
for s1, _ in l:
    # drop s1 if it occurs inside a different (hence longer) surviving string
    if any(s1 != s2 and s1 in s2 for s2 in survivors):
        survivors.discard(s1)

survivors is what you want, but it doesn't keep the numbers from the input tuples; changing that is left as an exercise for the reader :-P
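A minimal sketch of that exercise, assuming the numbers should simply be carried over and the input order kept: run the same loop, then filter the original tuples against survivors (the name result is only illustrative).

```python
l = [('this is string that stays', 0), ('this is string', 1),
     ('string that stays', 2), ('i am safe', 3)]

survivors = set(s for s, _ in l)
for s1, _ in l:
    # drop s1 if it occurs inside a different (hence longer) surviving string
    if any(s1 != s2 and s1 in s2 for s2 in survivors):
        survivors.discard(s1)

# filter the original tuples so the numbers and the input order are preserved
result = [(s, n) for s, n in l if s in survivors]
print(result)
```

This prints [('this is string that stays', 0), ('i am safe', 3)].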

2011-08-31 07:15:45

Thanks. I think I solved it myself: https://pzt.me/13gn – SomeOne

Try this:

IN = [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]
OUT = []

def check_item(liste, item2check):
    # True if item2check occurs inside some strictly longer string in liste
    for item, _ in liste:
        if item2check in item and len(item2check) < len(item):
            return True
    return False

for item, rank in IN:
    if not check_item(IN, item):
        OUT.append((item, rank))

# or, as a list comprehension:
OUT = [(item, rank) for item, rank in IN if not check_item(IN, item)]
print(OUT)

>>> [('this is string that stays', 0), ('i am safe', 3)]
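Since a string can only be a proper substring of a strictly longer string, one possible refinement (a sketch, not part of this answer; prune_by_length is a hypothetical name) processes the strings longest first and compares each one only against the strings already kept, instead of against every other entry as check_item does.

```python
IN = [('this is string that stays', 0), ('this is string', 1),
      ('string that stays', 2), ('i am safe', 3)]

def prune_by_length(pairs):
    """Drop each (string, number) pair whose string occurs in a strictly longer string."""
    # Indices ordered longest string first; sorted() stays stable with reverse=True,
    # so strings of equal length keep their input order.
    order = sorted(range(len(pairs)), key=lambda i: len(pairs[i][0]), reverse=True)
    kept = []  # indices of the pairs that survive
    for i in order:
        s = pairs[i][0]
        # Only kept strings need checking: any longer string containing s was either
        # kept itself or was pruned because it sits inside a longer kept string.
        if not any(s in pairs[k][0] and len(s) < len(pairs[k][0]) for k in kept):
            kept.append(i)
    # Return the survivors in their original input order.
    return [pairs[i] for i in sorted(kept)]

print(prune_by_length(IN))
```

This prints [('this is string that stays', 0), ('i am safe', 3)]. The worst case is still O(N*N) comparisons, but pruned strings are never used as comparison targets again, and identical strings are kept, matching the strict length check in check_item.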

2011-08-31 07:20:05

If you don't mind O(N*N) time:

>>> s=[('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)] 
>>> s=[i[0] for i in s] 
>>> result = [s[i] for i in range(len(s)) if not any(s[i] in s[j] for j in range(len(s)) if j != i)]
>>> result 
['this is string that stays', 'i am safe']

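If you care about efficiency, I suggest splitting each string into a sequence of words (or even characters) and building a tree data structure such as a trie (http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=usingTries), which allows fast lookup of every subsequence.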
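A minimal sketch of that idea, assuming containment is measured in whole words rather than characters (so, unlike Python's in operator, 'his' would not count as being inside 'this is'). TrieNode, build_suffix_trie and prune_with_trie are hypothetical names. Every suffix of each string's word list is inserted, so every contiguous run of words becomes a path from the root, and each node records which strings pass through it.

```python
IN = [('this is string that stays', 0), ('this is string', 1),
      ('string that stays', 2), ('i am safe', 3)]

class TrieNode:
    __slots__ = ('children', 'owners')

    def __init__(self):
        self.children = {}   # word -> TrieNode
        self.owners = set()  # indices of strings with a word run passing through here

def build_suffix_trie(word_lists):
    """Insert every suffix of every word list, so each contiguous run of words is a root path."""
    root = TrieNode()
    for idx, words in enumerate(word_lists):
        for start in range(len(words)):
            node = root
            for w in words[start:]:
                child = node.children.get(w)
                if child is None:
                    child = node.children[w] = TrieNode()
                child.owners.add(idx)
                node = child
    return root

def prune_with_trie(pairs):
    """Drop pairs whose words appear, contiguously, inside a string with more words."""
    word_lists = [s.split() for s, _ in pairs]
    root = build_suffix_trie(word_lists)
    result = []
    for idx, pair in enumerate(pairs):
        words = word_lists[idx]
        node = root
        for w in words:
            node = node.children[w]  # always exists: this string's own words were inserted
        # node.owners holds every string that contains this run of words
        if not any(len(word_lists[o]) > len(words) for o in node.owners):
            result.append(pair)
    return result

print(prune_with_trie(IN))
```

Each lookup walks only the words of the string being checked, at the cost of storing every suffix, which needs memory roughly quadratic in each string's word count.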

2011-08-31 07:22:40

All the other answers provide good solutions. I just want to add a note on your attempt:

for i in range(0, len(d)):
    for j in range(1, len(d)):
        if d[j][0] in d[i][0] and len(d[i][0]) > len(d[j][0]):
            del d[j]

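This fails with "list index out of range" because you delete items from the list while iterating over it. Here is one way to avoid the problem: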

d = [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]

# a set, so an index contained in several longer strings is recorded only once
to_be_removed = set()
for i in range(len(d)):
    for j in range(len(d)):
        if i != j and d[j][0] in d[i][0] and len(d[i][0]) > len(d[j][0]):
            to_be_removed.add(j)
# delete from the highest index down, so earlier deletions don't shift later ones
for n in sorted(to_be_removed, reverse=True):
    del d[n]

print(d)
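Another common way to sidestep the index problem (a sketch, not this answer's method) is to avoid deleting inside the loop altogether: decide which entries to keep from the unchanged list, then replace the list's contents in one step with slice assignment.

```python
d = [('this is string that stays', 0), ('this is string', 1),
     ('string that stays', 2), ('i am safe', 3)]

# Decide what to keep while d is still unchanged...
keep = [(s, n) for s, n in d
        if not any(s in other and len(s) < len(other) for other, _ in d)]
# ...then swap the contents in one step; slice assignment modifies the same
# list object, so any other reference to d sees the pruned result too.
d[:] = keep
print(d)
```

This prints [('this is string that stays', 0), ('i am safe', 3)].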

2011-08-31 07:36:11 MarcoS
