2016-11-14 14 views
0

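Print the count and indices of each group of duplicate strings from a large list

I'm trying to find duplicate strings in a list of about 100,000 strings, count how many times each one occurs, record the indices where they occur, and print the result. So far I've come up with this: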

list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400', '04/Sep/2016:00:00:15 -0400', '04/Sep/2016:00:00:19 -0400', '04/Sep/2016:00:00:20 -0400', '04/Sep/2016:00:00:23 -0400', '04/Sep/2016:00:00:25 -0400', '04/Sep/2016:00:00:26 -0400'] 

for i in list_b: 
    if(i in list_b): 
     print(i + " Amount of duplicates: " + amount of duplicates + " Index of duplicates: " + index of duplicate) 

The output should look like this:

"04/Sep/2016:00:00:06 -0400 Amount of duplicates: 2 Index of duplicates: 3,4" 
"04/Sep/2016:00:00:08 -0400 Amount of duplicates: 3 Index of duplicates: 5,6,7" 

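What if you create a dictionary with the 'timestamp' as the key and the 'count' of how many times you've encountered it as the value? Building the dictionary is an `O(n)` operation and printing is also an `O(n)` operation, so overall it's roughly `O(n)`. If that's what you're after, I'll write some code. –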
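
The dictionary-of-counts idea in the comment above can be sketched with `collections.Counter` from the standard library. This is only an illustration: the short `timestamps` list is made-up sample data, and the counts alone do not give the indices the question asks for.

```python
from collections import Counter

# Hypothetical sample data, shortened from the question's list_b.
timestamps = [
    '04/Sep/2016:00:00:05 -0400',
    '04/Sep/2016:00:00:06 -0400',
    '04/Sep/2016:00:00:06 -0400',
    '04/Sep/2016:00:00:08 -0400',
    '04/Sep/2016:00:00:08 -0400',
    '04/Sep/2016:00:00:08 -0400',
]

# One pass over the list: each timestamp maps to how often it was seen.
counts = Counter(timestamps)

# Keep only the timestamps that occur more than once.
duplicate_counts = {ts: n for ts, n in counts.items() if n > 1}

for ts, n in duplicate_counts.items():
    print("%s Amount of duplicates: %d" % (ts, n))
```

Building the `Counter` is a single pass over the list, and the filtering step is a single pass over the distinct timestamps, so the whole thing stays roughly linear in the list length.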

Are you OK with using third-party libraries? – trendsetter37

Answers

1
from collections import defaultdict 

list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400', 
      '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400', 
      '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400', 
      '04/Sep/2016:00:00:15 -0400', '04/Sep/2016:00:00:19 -0400', '04/Sep/2016:00:00:20 -0400', 
      '04/Sep/2016:00:00:23 -0400', '04/Sep/2016:00:00:25 -0400', '04/Sep/2016:00:00:26 -0400'] 

indices_dict = defaultdict(list) 

for index, value in enumerate(list_b): 
    indices_dict[value].append(index) 

for value, index_list in indices_dict.items(): 
    num_duplicates = len(index_list) 
    if num_duplicates > 1: 
        print("%s Amount of duplicates: %s, Indices of duplicates: %s" %
              (value, num_duplicates, index_list))

I know these aren't nested for loops, but if `n == len(timestamps)`, wouldn't this be `O(n^2)` anyway? – trendsetter37

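@trendsetter37 No, why do you think so? The first loop iterates over `n` items, the second loop iterates over fewer, and nothing else in the code processes many items. Lookups and insertions in a dictionary are roughly O(1). I tested the code and it runs almost instantly. –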

Oops, yes, you're right. `indices_dict` is a collapsed version of `list_b`. – trendsetter37

0
list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400', 
      '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400', 
      '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400', 
      '04/Sep/2016:00:00:15 -0400', '04/Sep/2016:00:00:19 -0400', '04/Sep/2016:00:00:20 -0400', 
      '04/Sep/2016:00:00:23 -0400', '04/Sep/2016:00:00:25 -0400', '04/Sep/2016:00:00:26 -0400'] 

duplicates = set() 

for i in list_b: 
    indices = [pos for pos, s in enumerate(list_b) if s == i] 
    if len(indices) > 1: 
        duplicates.add("%s Amount of duplicates: %d Index of duplicates: %s" % (i, len(indices), indices))

for dup in duplicates: 
    print(dup) 

There's a lot of redundant work here. `for i in list_b: count = list_b.count(i)` is O(n^2), and then the list comprehension adds more. –

@AlexHall Fair enough. Updated. –

`for i in list_b: indices = [...]` is still O(n^2). –

0

This should do it:

mylist = ["a", "a", "b", "c", "b"] 

for index, item in enumerate(mylist): 
    rep_time = mylist.count(item) 
    print(item, " Amount of duplicates: ", rep_time, "| Index of duplicates: ", index) 

Tested on Python 3, and it works fine.

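It prints correct information, but not in the way the OP wants. For example, it prints "Amount of duplicates: 2" twice, once for each occurrence of "a", and it prints all the indices in order, one index per line. –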

OK, so you want to avoid counting repeated items? – Ayoub

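I'm not sure what you mean. The OP gave some sample output, e.g. `04/Sep/2016:00:00:06 -0400 Amount of duplicates: 2 Index of duplicates: 3,4`. Note how it puts both indices (`3,4`) on one line. Your script doesn't produce output in that format. –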
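
For reference, here is one way to get each duplicated value on a single line with its indices joined by commas, as in the OP's sample output. This is a sketch rather than any answerer's code, and the helper name `format_duplicates` is made up for illustration.

```python
from collections import defaultdict


def format_duplicates(items):
    """Return one report line per value that occurs more than once."""
    # Group the positions of every value in a single pass.
    positions = defaultdict(list)
    for index, item in enumerate(items):
        positions[item].append(index)

    lines = []
    for item, indices in positions.items():
        if len(indices) > 1:
            # Join the indices so they all appear on the same line.
            joined = ",".join(str(i) for i in indices)
            lines.append("%s Amount of duplicates: %d Index of duplicates: %s"
                         % (item, len(indices), joined))
    return lines


mylist = ["a", "a", "b", "c", "b"]
for line in format_duplicates(mylist):
    print(line)
```

Each duplicated value is reported once, and the count is simply the length of its index list, so no separate call to `list.count` is needed.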

0
list_b = ['04/Sep/2016:00:00:03 -0400', '04/Sep/2016:00:00:04 -0400', '04/Sep/2016:00:00:05 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:06 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:08 -0400', '04/Sep/2016:00:00:11 -0400', '04/Sep/2016:00:00:15 -0400', '04/Sep/2016:00:00:19 -0400', '04/Sep/2016:00:00:20 -0400', '04/Sep/2016:00:00:23 -0400', '04/Sep/2016:00:00:25 -0400', '04/Sep/2016:00:00:26 -0400'] 
results = {}
for i in range(len(list_b)):
    if list_b[i] not in results:
        results[list_b[i]] = {'string': list_b[i], 'count': list_b.count(list_b[i]), 'index': [i]}
    else:
        results[list_b[i]]['index'].append(i)
for result in results:
    if len(results[result]['index']) > 1:
        print(results[result]['string'], 'Amount of duplicates:', results[result]['count'], 'Index of Duplicates:', ",".join(map(str, results[result]['index'])))

Output

04/Sep/2016:00:00:06 -0400 Amount of duplicates: 2 Index of Duplicates: 3,4 
04/Sep/2016:00:00:08 -0400 Amount of duplicates: 3 Index of Duplicates: 5,6,7 

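When all the strings are unique, this is O(n^2), because it runs `list_b.count` (an O(n) operation) n times. To demonstrate, set `list_b = list(map(str, range(100000)))` and then add a periodic progress print at the start of the loop (something like `if i % 1000 == 0: print(i)`): it prints slowly. –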
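
A possible way around the problem raised in the comment above, sketched under the assumption that the count always equals the number of recorded indices: drop the `list_b.count` call and use the length of the index list instead. The function name `find_duplicates` is hypothetical.

```python
def find_duplicates(strings):
    """Map each duplicated string to the list of indices where it occurs."""
    index_lists = {}
    for i, s in enumerate(strings):
        # setdefault creates the list on first sight; no list.count() needed.
        index_lists.setdefault(s, []).append(i)
    # Keep only the strings that were seen more than once.
    return {s: idx for s, idx in index_lists.items() if len(idx) > 1}


# The all-unique case from the comment: a single pass, nothing reported.
unique = list(map(str, range(100000)))
print(find_duplicates(unique))

sample = ["x", "y", "x", "z", "x"]
for s, idx in find_duplicates(sample).items():
    print(s, "Amount of duplicates:", len(idx),
          "Index of Duplicates:", ",".join(map(str, idx)))
```

With this change the all-unique input costs one dictionary insertion per string instead of one full scan of the list per string.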