Print the count and indices of each group of duplicate strings in a large list

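I'm trying to find the duplicate strings in a list of ~100,000 strings, count how many times each one occurs, and print each one together with the indices where it occurs. So far I've come up with this: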

list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400', '04/Sep/2016:00:00:15 -0400', '04/Sep/2016:00:00:19 -0400', '04/Sep/2016:00:00:20 -0400', '04/Sep/2016:00:00:23 -0400', '04/Sep/2016:00:00:25 -0400', '04/Sep/2016:00:00:26 -0400'] 

for i in list_b:
    if i in list_b:
        # Pseudocode: amount_of_duplicates and index_of_duplicates are the parts still missing
        print(i + " Amount of duplicates: " + amount_of_duplicates + " Index of duplicates: " + index_of_duplicates)

The output should look like this:

"04/Sep/2016:00:00:06 -0400 Amount of duplicates: 2 Index of duplicates: 3,4" 
"04/Sep/2016:00:00:08 -0400 Amount of duplicates: 3 Index of duplicates: 5,6,7"

Source

2016-11-14 Daniel Plas Rivera

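What if you create a dictionary with the timestamp as the key and the number of times you've encountered it (the 'count') as the value? Creating the dictionary would be an O(n) operation and printing would also be O(n), so overall it's about O(n) anyway. If that's your thing, I'll write some code. –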

Are you OK with using third-party libraries? – trendsetter37
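The dictionary-of-counts idea from the first comment can be sketched with the standard library's collections.Counter. This is only a minimal sketch: the shortened 'timestamps' list and the 'duplicate_counts' name are assumptions made for illustration, and a count alone does not record where each duplicate occurs.

```python
from collections import Counter

# Hypothetical sample data, shortened from the question's list_b.
timestamps = [
    '04/Sep/2016:00:00:05 -0400',
    '04/Sep/2016:00:00:06 -0400',
    '04/Sep/2016:00:00:06 -0400',
    '04/Sep/2016:00:00:08 -0400',
    '04/Sep/2016:00:00:08 -0400',
    '04/Sep/2016:00:00:08 -0400',
]

# One O(n) pass: map each timestamp to the number of times it occurs.
counts = Counter(timestamps)

# Keep only the timestamps seen more than once.
duplicate_counts = {ts: n for ts, n in counts.items() if n > 1}

for ts, n in duplicate_counts.items():
    print("%s Amount of duplicates: %d" % (ts, n))
```

To also report the indices, the dictionary values would need to be lists of positions rather than plain counts.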

from collections import defaultdict 

list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400', 
      '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400', 
      '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400', 
      '04/Sep/2016:00:00:15 -0400', '04/Sep/2016:00:00:19 -0400', '04/Sep/2016:00:00:20 -0400', 
      '04/Sep/2016:00:00:23 -0400', '04/Sep/2016:00:00:25 -0400', '04/Sep/2016:00:00:26 -0400'] 

indices_dict = defaultdict(list) 

for index, value in enumerate(list_b): 
    indices_dict[value].append(index) 

for value, index_list in indices_dict.items(): 
    num_duplicates = len(index_list) 
    if num_duplicates > 1: 
        print("%s Amount of duplicates: %s, Indices of duplicates: %s" %
              (value, num_duplicates, index_list))

Source

2016-11-14 21:51:55

I know these aren't nested for loops, but if n == len(timestamps), wouldn't this be O(n^2) anyway? – trendsetter37

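@trendsetter37 No, why do you think that? The first loop iterates over 'n' items, the second loop iterates over fewer, and nothing else processes many items. Lookup and insertion in a dictionary are roughly O(1). I tested the code and it runs almost instantly. –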

Oops, yes, you're right. indices_dict is a collapsed version of list_b. – trendsetter37
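The point that the dictionary is a collapsed version of the list can be checked directly. The following is a rough sketch, with a hypothetical 'build_indices' helper and a small made-up 'sample' list: the dictionary never has more keys than the list has items, and every position is stored exactly once, so the second loop cannot do more than n iterations.

```python
from collections import defaultdict

def build_indices(items):
    """Map each distinct item to the list of positions where it occurs."""
    indices = defaultdict(list)
    for index, value in enumerate(items):  # exactly len(items) iterations
        indices[value].append(index)
    return indices

# Hypothetical sample data: 'a' and 'b' each occur twice.
sample = ["a", "a", "b", "c", "b"]
indices = build_indices(sample)

# No more keys than input items, so the second loop is at most n iterations.
assert len(indices) <= len(sample)
# Every position is stored exactly once across all the index lists.
assert sum(len(v) for v in indices.values()) == len(sample)

print(dict(indices))
```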

list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400', 
      '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400', 
      '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400', 
      '04/Sep/2016:00:00:15 -0400', '04/Sep/2016:00:00:19 -0400', '04/Sep/2016:00:00:20 -0400', 
      '04/Sep/2016:00:00:23 -0400', '04/Sep/2016:00:00:25 -0400', '04/Sep/2016:00:00:26 -0400'] 

duplicates = set() 

for i in list_b: 
    indices = [pos for pos, s in enumerate(list_b) if s == i] 
    if len(indices) > 1: 
        duplicates.add("%s Amount of duplicates: %d Index of duplicates: %s" % (i, len(indices), indices))

for dup in duplicates: 
    print(dup)

Source

2016-11-14 21:50:58

There's a lot of redundant work here. 'for i in list_b: count = list_b.count(i)' is O(n^2), and then the list comprehension adds more. –

@AlexHall Fair enough. Updated. –

'for i in list_b: indices = [...]' is still O(n^2). –
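To make the O(n^2) claim concrete, here is a small sketch that counts how many equality tests the nested scan performs. The 'count_comparisons' helper is hypothetical and only mirrors the shape of the loop in the answer; it does not reproduce the answer's output.

```python
def count_comparisons(items):
    """Count the `s == i` tests performed by the nested scan."""
    comparisons = 0
    for i in items:                      # outer loop: n iterations
        for pos, s in enumerate(items):  # inner scan: n iterations each
            comparisons += 1             # stands in for one `s == i` test
    return comparisons

for n in (10, 100, 1000):
    items = [str(k) for k in range(n)]
    print(n, count_comparisons(items))
```

The count grows as n * n, so a list of 100,000 strings would need 10^10 comparisons, whether or not it contains any duplicates.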

This should do it:

mylist = ["a", "a", "b", "c", "b"] 

for index, item in enumerate(mylist): 
    rep_time = mylist.count(item) 
    print(item, " Amount of duplicates: ", rep_time, "| Index of duplicates: ", index)

Tested on Python 3 and it works fine.

Source

2016-11-14 22:08:57 Ayoub

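It prints correct information, but not in the way the OP wants. For example, it prints "Amount of duplicates: 2" twice, once for each occurrence of "a", and it prints all the indices in order, one index per line. –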

OK, so you want to avoid counting the repeated items again? – Ayoub

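I'm not sure what you mean. The OP gave some sample output, e.g. '04/Sep/2016:00:00:06 -0400 Amount of duplicates: 2 Index of duplicates: 3,4'. Note how it puts both indices ('3,4') on one line. Your script doesn't output in that format. –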
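One way to get one line per duplicated item, with its indices joined by commas as in the OP's sample, is sketched below. The 'duplicate_lines' helper is hypothetical, and the sketch assumes Python 3.7 or later, where dictionaries keep insertion order, so lines come out in first-seen order.

```python
def duplicate_lines(items):
    """Return one formatted line per item that occurs more than once."""
    positions = {}  # item -> list of indices, in first-seen order
    for index, item in enumerate(items):
        positions.setdefault(item, []).append(index)

    lines = []
    for item, found in positions.items():
        if len(found) > 1:
            joined = ",".join(str(i) for i in found)
            lines.append("%s Amount of duplicates: %d Index of duplicates: %s"
                         % (item, len(found), joined))
    return lines

mylist = ["a", "a", "b", "c", "b"]
for line in duplicate_lines(mylist):
    print(line)
```

For 'mylist' this prints one line for "a" with indices 0,1 and one line for "b" with indices 2,4, and nothing for "c".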

list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400', '04/Sep/2016:00:00:15 -0400', '04/Sep/2016:00:00:19 -0400', '04/Sep/2016:00:00:20 -0400', '04/Sep/2016:00:00:23 -0400', '04/Sep/2016:00:00:25 -0400', '04/Sep/2016:00:00:26 -0400'] 
results = {}
for i in range(len(list_b)):
    if list_b[i] not in results:
        # First occurrence: store the string, its total count, and this index
        results[list_b[i]] = {'string': list_b[i], 'count': list_b.count(list_b[i]), 'index': [i]}
    else:
        results[list_b[i]]['index'].append(i)
for result in results:
    if len(results[result]['index']) > 1:
        print(results[result]['string'], 'Amount of duplicates:', results[result]['count'], 'Index of Duplicates:', ",".join(map(str, results[result]['index'])))

Output

04/Sep/2016:00:00:06 -0400 Amount of duplicates: 2 Index of Duplicates: 3,4 
04/Sep/2016:00:00:08 -0400 Amount of duplicates: 3 Index of Duplicates: 5,6,7

Source

2016-11-14 22:13:46 repzero

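When all the strings are unique, this is O(n^2), because it will execute 'list_b.count' (an O(n) operation) n times. To demonstrate, set 'list_b = list(map(str, range(100000)))' and then add 'if i % ... == 0: print(i)' at the start of the loop: it prints slowly. –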
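A possible way around the worst case described in this comment is to skip 'list_b.count' and derive the count from the collected indices instead. The sketch below is an assumption about how the answer could be adapted, not the answerer's code; 'collect_duplicates' is a hypothetical name.

```python
def collect_duplicates(items):
    """Group indices per string without calling items.count()."""
    results = {}
    for i, s in enumerate(items):
        if s not in results:
            results[s] = {'string': s, 'index': [i]}
        else:
            results[s]['index'].append(i)
    # Derive each count from the collected indices and drop unique strings.
    return {s: dict(r, count=len(r['index']))
            for s, r in results.items() if len(r['index']) > 1}

# The all-unique worst case described in the comment above.
unique = list(map(str, range(100000)))
print(len(collect_duplicates(unique)))

print(collect_duplicates(["x", "y", "x"]))
```

Each string is touched once in the loop and once in the final filter, so the all-unique input no longer triggers n separate O(n) scans.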
