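Pruning similar strings by length
How can I remove elements from a Python list of strings based on similarity and length (if a string X is found in another, longer string Y, X must be removed)?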
IN: [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]
OUT: [('this is string that stays', 0), ('i am safe', 3)]
Here you go:
l = [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]
survivors = set(s for s, _ in l)
for s1, _ in l:
    if any(s1 != s2 and s1 in s2 for s2 in survivors):
        survivors.discard(s1)
survivors
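survivors is what you want, but it does not keep the numbers from the input tuples. Changing that is left as an exercise for the reader :-P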
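One way to do that exercise, sketched as a hypothetical helper named prune_keep_numbers (the name is mine, not the answerer's): run the same survivor-set loop, then filter the original tuples against the set so both the numbers and the input order come back.

```python
from typing import List, Tuple


def prune_keep_numbers(l: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # hypothetical helper: same survivor-set loop as in the answer above
    survivors = set(s for s, _ in l)
    for s1, _ in l:
        if any(s1 != s2 and s1 in s2 for s2 in survivors):
            survivors.discard(s1)
    # filter the original tuples so the numbers and the input order are kept
    return [(s, n) for s, n in l if s in survivors]


print(prune_keep_numbers([('this is string that stays', 0), ('this is string', 1),
                          ('string that stays', 2), ('i am safe', 3)]))
```

Discarding while looping is safe here because a string is only dropped when a distinct string containing it is still in the set, and the longest strings in any containment chain are never dropped.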
Thanks. I think I solved it myself: https://pzt.me/13gn – SomeOne
Try this:
IN = [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]
OUT = []

def check_item(liste, item2check):
    # True if item2check occurs inside a strictly longer string of liste
    for item, _ in liste:
        if item2check in item and len(item2check) < len(item):
            return True
    return False

for item, rank in IN:
    if not check_item(IN, item):
        OUT.append((item, rank))

# or as a list comprehension:
OUT = [(item, rank) for item, rank in IN if not check_item(IN, item)]
print(OUT)
# [('this is string that stays', 0), ('i am safe', 3)]
Or, if you don't mind O(N*N) running time:
>>> s = [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]
>>> s = [i[0] for i in s]
>>> result = [s[i] for i in range(len(s)) if not any(s[i] in s[j] for j in range(len(s)) if j != i)]
>>> result
['this is string that stays', 'i am safe']
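If the quadratic scan becomes a problem, a hedged variant (my own, with a hypothetical function name) is to visit the strings from longest to shortest and compare each one only against the strings already kept. Because substring containment is transitive, anything contained in a discarded string is also contained in some kept, longer one. It is still O(N*N) in the worst case, but it usually does far fewer comparisons when many strings get pruned.

```python
from typing import List, Set, Tuple


def prune_longest_first(items: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # hypothetical helper: visit indices from longest to shortest string
    # (sorted() stays stable with reverse=True, so ties keep input order)
    order = sorted(range(len(items)), key=lambda k: len(items[k][0]), reverse=True)
    kept_texts: List[str] = []
    kept_indices: Set[int] = set()
    for k in order:
        text = items[k][0]
        # only a strictly longer survivor may "contain" this string
        if not any(len(t) > len(text) and text in t for t in kept_texts):
            kept_texts.append(text)
            kept_indices.add(k)
    # return the survivors in their original input order
    return [item for k, item in enumerate(items) if k in kept_indices]
```

Like check_item, this keeps exact duplicates, since neither copy is strictly longer than the other.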
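If you care about efficiency, I suggest splitting each string into a sequence of words (or even characters) and building a tree data structure such as a trie (http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=usingTries), which allows fast lookup of every subsequence.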
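A minimal sketch of that idea at the word level, under two assumptions of mine: "X in Y" means X's words appear as a contiguous run of Y's words, and "longer" is measured in words. Every word-suffix of every string is inserted into the trie, so X occurs inside Y exactly when X is a prefix of one of Y's suffixes. Each node records which strings pass through it. The names TrieNode and prune_by_word_trie are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # indices of input strings that have a word-suffix passing through this node
        self.owners: Set[int] = set()


def prune_by_word_trie(items: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # assumption: containment is checked on whole words, not characters
    words = [text.split() for text, _ in items]

    # build a trie of every word-suffix of every string
    root = TrieNode()
    for idx, seq in enumerate(words):
        for start in range(len(seq)):
            node = root
            for w in seq[start:]:
                node = node.children.setdefault(w, TrieNode())
                node.owners.add(idx)

    result = []
    for seq, item in zip(words, items):
        # walk the trie along seq; reaching its end means seq starts some suffix
        found: Optional[TrieNode] = root
        for w in seq:
            found = found.children.get(w)
            if found is None:
                break
        # every owner of the final node contains seq; only longer ones count
        contained = found is not None and any(
            len(words[o]) > len(seq) for o in found.owners
        )
        if not contained:
            result.append(item)
    return result
```

This simple suffix trie costs space quadratic in the number of words per string. Unlike the plain in operator, it keeps 'this is string' when the only candidate is 'this is strings'.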
All the other answers give good solutions. I just want to comment on your attempt:
for i in range(0, len(d)):
    for j in range(1, len(d)):
        if d[j][0] in d[i][0] and len(d[i][0]) > len(d[j][0]):
            del d[j]
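It fails with "list index out of range" because you delete from the list while iterating over it: range(len(d)) is computed once, so after a deletion the loop still visits indices that no longer exist. Here is one way to avoid the problem: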
d = [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]
# a set, so an index contained in several strings is recorded only once
to_be_removed = set()
for i in range(len(d)):
    for j in range(len(d)):
        if i != j and d[j][0] in d[i][0] and len(d[i][0]) > len(d[j][0]):
            to_be_removed.add(j)
# delete in ascending index order; each earlier deletion shifts later indices left by one
for m, n in enumerate(sorted(to_be_removed)):
    del d[n - m]
print(d)
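An alternative to the n - m offset, sketched as a hypothetical helper prune_in_place: collect the indices the same way, then delete them from highest to lowest. Removing a higher index never shifts a lower one, so no offset bookkeeping is needed.

```python
from typing import List, Tuple


def prune_in_place(d: List[Tuple[str, int]]) -> None:
    # hypothetical helper: indices of strings contained in a strictly longer one
    to_be_removed = {
        j
        for i in range(len(d))
        for j in range(len(d))
        if i != j and d[j][0] in d[i][0] and len(d[i][0]) > len(d[j][0])
    }
    # delete from the highest index down so the pending (lower) indices stay valid
    for n in sorted(to_be_removed, reverse=True):
        del d[n]
```

With d = [('xyz', 0), ('b', 1), ('ab', 2), ('y', 3)], the indices are found in the order 3, 1. Deleting them in that unsorted order with the n - m offset would remove the wrong element, while the descending order leaves [('xyz', 0), ('ab', 2)].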
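In your example, "this is string" is supposed to be "in" "this is a string". Is that because every word of the first string appears in the second? If this is not a typo, please specify what you mean by one string being "in" another. –
Corrected the example. – SomeOne
OK. Do you have any ideas? Give us something to discuss. –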