How do I find repeated individual strings in a list?

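I tried this solution: Find a repeating pattern in a list of strings. But it does not work for my list format. I get no error, but it gives the wrong answer. I used the same code, with a dictionary that stores all the lists by key: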

common_suffix = os.path.commonprefix([listDict[::-1] for items in listDict])[::-1] 
stripped_titles = [items[:-len(common_suffix)] for items in listDict] 
print len(stripped_titles)

The answer I get is 0.

Here are the lists. I created them from the data in a CSV file:

list1 = ['a1', 'b2', 'c4', 'y7', 'u5'] 
list2 = ['b4', 't5', 'g1'] 
list3 = ['b2', 'c4', 'f6', 'a1'] 
list4 = ['b2', 'a1'] 
list5 = ['r4', 'c4', 'a1', 'b2']

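Here I want to find the repeated strings. Say I take the three elements a1, b2, c4 (from list1); I want to find out how many of the lists contain all three of these strings. Order does not matter. The string items just have to be present in the other lists. In this case a1, b2, c4 are present in list1, list3 and list5. So I want the answer to be "the first three elements of list1 are repeated in 3 lists".

How can I do the following: I want to be able to choose a number, say k = 3, so that the first three elements of every list are tested for the repeating pattern, and the output would be,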

a1, b2, c4 = 3

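and so on.

Source

2014-07-27 Gworld

What error are you getting? – APerson

Show your code. And use 'print' to see the values in the variables, to check whether each part of your script does what you expect. – furas

How do you get 'b4, t5, g1 = 5'? I don't see them in any of the other lists. – Gabriel

If I understand what you are after (counting every list in which all of the items appear, including the first list), you can do this: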

lists = [['a1', 'b2', 'c4', 'y7', 'u5'],
         ['b4', 't5', 'g1'],
         ['b2', 'c4', 'f6', 'a1'],
         ['b2', 'a1'],
         ['r4', 'c4', 'a1', 'b2']]

k = 3
tgt = tuple(lists[0][0:k])
di = {tgt: 1}
for li in lists[1:]:
    if all(e in li for e in tgt):
        di[tgt] += 1

print(di)

Prints:

{('a1', 'b2', 'c4'): 3}

If you want to reduce that to a single line:

di={tgt:sum(all(e in li for e in tgt) for li in lists)}

Then, if you want your exact output:

>>> print('{}={}'.format(', '.join(tgt), di[tgt]))
a1, b2, c4=3
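The question also asks for the same test on the first k items of every list, not just the first one. A minimal sketch of that, assuming lists shorter than k should simply be skipped (the question does not say what should happen to them); `count_prefix_matches` is a made-up name for illustration:

```python
def count_prefix_matches(lists, k):
    """For each list with at least k items, count the lists that contain
    all of its first k items (the list itself included)."""
    # Sets give order-independent membership tests.
    sets = [set(li) for li in lists]
    counts = {}
    for li in lists:
        if len(li) < k:
            continue  # assumption: lists shorter than k are skipped
        tgt = tuple(li[:k])
        if tgt in counts:
            continue  # this exact prefix was already counted
        # set(tgt) <= s is True when every item of tgt is in s
        counts[tgt] = sum(set(tgt) <= s for s in sets)
    return counts


lists = [['a1', 'b2', 'c4', 'y7', 'u5'],
         ['b4', 't5', 'g1'],
         ['b2', 'c4', 'f6', 'a1'],
         ['b2', 'a1'],
         ['r4', 'c4', 'a1', 'b2']]

for tgt, n in count_prefix_matches(lists, 3).items():
    print('{} = {}'.format(', '.join(tgt), n))
```

With these lists the entry for a1, b2, c4 comes out as 3, and list4 produces no entry because it has only two items.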

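Source

2014-07-27 00:29:46 dawg

It should be that all three must be present together. – Gworld

@Gworld: Fixed, if I understand what you want. – dawg

This is very good. It does exactly what I wanted... thank you very much – Gworld

The way I understand the question, you are asking the following: given a set of strings, return the number of lists in which those strings appear.

So for list4 and k=2 we have 'b2' and 'a1'. Both appear in list1, so that gives one list. Neither appears in list2, so it does not add 1. Both appear in list3 and in list5, which gives a total of 3 lists (not counting the original list4). Note, however, that my solution also counts a list if it contains only one of 'b2' or 'a1'.

First a solution for a list of lists, then one for a dictionary of lists.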

list1 = ['a1', 'b2', 'c4', 'y7', 'u5'] 
list2 = ['b4', 't5', 'g1'] 
list3 = ['b2', 'c4', 'f6', 'a1'] 
list4 = ['b2', 'a1'] 
list5 = ['r4', 'c4', 'a1', 'b2']

Here I pack your lists into a list.

lall = [list1, list2, list3, list4, list5]

def cmatch1(lall, l0, k):
    """lall: a list of lists
    l0: the list to check
    k: number of entries from beginning of l0 to look for
    """
    m = 0
    for li in lall:
        # don't check against the supplied list
        if li is l0:
            continue
        # loop over first k entries in supplied list
        for cn in l0[:k]:
            # if any of them appear in another list, increment and break out
            if cn in li:
                m += 1
                break
    return m

If instead you have a dictionary of lists,

dl = {'list1':list1, 'list2':list2, 'list3':list3, 'list4':list4, 'list5':list5} 

def cmatch2(dl, key, k):
    """dl: a dict of lists
    key: key of list to check
    k: number of entries from beginning of dl[key] to look for
    """
    m = 0
    for ki, li in dl.items():
        # don't check against the supplied list
        if ki == key:
            continue
        # loop over first k entries in supplied list
        for cn in dl[key][:k]:
            # if any of them appear in another list, increment and break out
            if cn in li:
                m += 1
                break
    return m

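If you then want to check the first three items of each list against all the other lists, you can call the function above once per list, using the second argument to say which list to check. I am not sure what you want to happen when k=3 and list4 does not have k entries, but you can check for that in the outer loop over the lists.

So, if you do this,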

>>> cmatch2(dl, 'list1', 3) 
3

>>> cmatch2(dl, 'list2', 3)
0

>>> cmatch2(dl, 'list3', 2)
3
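Since the functions above count a list as soon as any one of the k entries is found, here is a rough sketch of a stricter variant that requires all k entries, called once per key in an outer loop. `cmatch_all` is a hypothetical name, and returning None for a list with fewer than k entries is only one possible choice, since the question does not specify that case:

```python
def cmatch_all(dl, key, k):
    """Count the other lists in dl that contain every one of the
    first k entries of dl[key] (hypothetical strict variant)."""
    target = dl[key][:k]
    if len(target) < k:
        # assumption: report "not enough entries" instead of counting
        return None
    return sum(
        all(cn in li for cn in target)  # True only if every entry is present
        for ki, li in dl.items()
        if ki != key                    # don't check against the supplied list
    )


dl = {'list1': ['a1', 'b2', 'c4', 'y7', 'u5'],
      'list2': ['b4', 't5', 'g1'],
      'list3': ['b2', 'c4', 'f6', 'a1'],
      'list4': ['b2', 'a1'],
      'list5': ['r4', 'c4', 'a1', 'b2']}

# outer loop: test the first k entries of every list against the others
for key in sorted(dl):
    print(key, cmatch_all(dl, key, 3))
```

With this data, cmatch_all(dl, 'list1', 3) gives 2 rather than 3, because list4 has 'a1' and 'b2' but not 'c4', and cmatch_all(dl, 'list4', 3) gives None.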

Source

2014-07-27 00:17:44 Gabriel

I would use a Counter:

from collections import Counter

def find_common(k, l):
    c = Counter()
    for ele in l:
        # count each of the first k items of every list
        c.update(ele[:k])
    most_com = c.most_common()
    # keep the items tied for the highest count, then append that count
    final = [x for x in c if c[x] == most_com[0][1]] + [most_com[0][1]]
    return final

print(find_common(3, lists))
['a1', 'b2', 'c4', 3]

Source

2014-07-27 01:44:03

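If the order of the items does not matter, take a look at PyFIM, which implements a number of algorithms for frequent item set mining. These methods traditionally sort the data items.

There is also a version of this problem called "frequent subsequence mining", which does take the order into account.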
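To give an idea of what frequent item set mining computes, here is a brute-force sketch using only the standard library. It is not PyFIM's API and not one of the efficient algorithms; it simply enumerates every k-item combination in each list and counts how many lists contain it. The name `frequent_itemsets` and the `min_support` parameter are made up for this illustration:

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(lists, k, min_support=2):
    """Return {itemset: number of lists containing it} for every k-item
    set found in at least min_support lists (brute force)."""
    support = Counter()
    for li in lists:
        # Sort the distinct items so each item set has one canonical tuple.
        for combo in combinations(sorted(set(li)), k):
            support[combo] += 1
    return {items: n for items, n in support.items() if n >= min_support}


lists = [['a1', 'b2', 'c4', 'y7', 'u5'],
         ['b4', 't5', 'g1'],
         ['b2', 'c4', 'f6', 'a1'],
         ['b2', 'a1'],
         ['r4', 'c4', 'a1', 'b2']]

print(frequent_itemsets(lists, 3))
```

For the lists in the question this prints {('a1', 'b2', 'c4'): 3}. The number of combinations grows quickly with the list length, so this is only reasonable for small inputs.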

Source

2014-07-27 09:12:28
