2013-10-01 48 views
0

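Django - Dedupe a list and keep the duplicates

I want to deduplicate the list below, but also keep a list of the duplicates so they can be shown on screen. The data is pulled from a CSV file, so it will be huge, and the screen should show the user what was added and what was not (the "dupes").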

[ 
    ['first_name', 'last_name', 'email'], 
    ['Danny', 'Lastnme', '[email protected]'], 
    ['Sally', 'Surname', '[email protected]'], 
    ['Sally', 'Surname', '[email protected]'],  # <-- Dupe
    ['Sally', 'Surname', '[email protected]'],  # <-- Dupe
    ['Chris', 'Lastnam', '[email protected]'], 
    ['Larry', 'Seconds', '[email protected]'], 
    ['Barry', 'Barrins', '[email protected]'], 
    ['Glenn', 'Melting', '[email protected]'], 
    ['Glenn', 'Melting', '[email protected]'],  # <-- Dupe
] 

The end result would be two lists: one with the clean, unique records and one with the duplicates.

Unique:

[ 
    ['first_name', 'last_name', 'email'], 
    ['Danny', 'Lastnme', '[email protected]'], 
    ['Sally', 'Surname', '[email protected]'], 
    ['Chris', 'Lastnam', '[email protected]'], 
    ['Larry', 'Seconds', '[email protected]'], 
    ['Barry', 'Barrins', '[email protected]'], 
    ['Glenn', 'Melting', '[email protected]'], 
] 

Dupes:

[ 
    ['Sally', 'Surname', '[email protected]'], 
    ['Sally', 'Surname', '[email protected]'], 
    ['Glenn', 'Melting', '[email protected]'], 
] 
+0

Do you need to keep track of how many duplicates there are, or is it enough to say "Sally Surname was found multiple times"? –

+0

@TimPietzcker I just updated the question. What would be the ideal way to get back a unique list and a dupes list, a la: {'unique': [], 'dupes': []}? Cheers. –

Answers

1

You can copy and paste this code to get back a dictionary with the dupes and the uniques:

a = [ 
['first_name', 'last_name', 'email'], 
['Danny', 'Lastnme', '[email protected]'], 
['Sally', 'Surname', '[email protected]'], 
['Sally', 'Surname', '[email protected]'], 
['Sally', 'Surname', '[email protected]'], 
['Chris', 'Lastnam', '[email protected]'], 
['Larry', 'Seconds', '[email protected]'], 
['Barry', 'Barrins', '[email protected]'], 
['Glenn', 'Melting', '[email protected]'], 
['Glenn', 'Melting', '[email protected]'], 
] 

result = {} 

b = [tuple(x) for x in a[1:]] 
all_uniques = set(b) 
result['unique'] = [list(x) for x in list(all_uniques)] 

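# To show which records have duplicates, use Mr E's solution: 

from collections import Counter 

t = Counter(b) 
dupes = [] 

for k, v in t.items():  # items() works on both Python 2 and 3
    if v > 1: 
        # add one separate record per extra copy (v - 1 of them)
        dupes.extend(list(k) for _ in range(v - 1)) 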

result['dupes'] = dupes 

print(result) 
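
Because a set does not preserve order, the 'unique' list above can come back in a different order from the input, and the header row is dropped. A minimal order-preserving sketch, assuming the first row is a header and that the first occurrence of each record counts as the unique one (the function name split_dupes is hypothetical, not part of the original answer):

```python
def split_dupes(rows):
    """Split rows into first occurrences and repeats, keeping input order."""
    if not rows:
        return {'unique': [], 'dupes': []}
    header, records = rows[0], rows[1:]
    seen = set()
    result = {'unique': [header], 'dupes': []}
    for record in records:
        key = tuple(record)  # lists are unhashable, so use a tuple as the set key
        if key in seen:
            result['dupes'].append(record)   # a repeat of an earlier record
        else:
            seen.add(key)
            result['unique'].append(record)  # first time this record is seen
    return result
```

This makes a single pass over the data, which may matter if the CSV is as large as the question suggests.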
+0

Perfect, thanks Ewan. –

+0

You're welcome @DannyBos - always happy to help :-) – Ewan

0

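You can get the frequencies with a Counter:

from collections import Counter 

# `data` is the list from the question; data[1:] skips the header row
t = Counter(tuple(x) for x in data[1:]) 

# records that occur exactly once
uniques = [list(k) for k, v in t.items() if v == 1] 
# one separate entry per extra copy of each repeated record
dupes = [list(k) for k, v in t.items() for _ in range(v - 1)] 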
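
If it is enough to report how often each repeated record occurs (the option raised in the comments on the question), the same Counter can be used directly. A small sketch, assuming `data` has a header row first; count_repeats is a hypothetical helper name:

```python
from collections import Counter

def count_repeats(data):
    """Return {record_tuple: occurrences} for records that appear more than once."""
    counts = Counter(tuple(row) for row in data[1:])  # skip the header row
    return {record: n for record, n in counts.items() if n > 1}
```

The keys are tuples because lists cannot be used as dictionary keys; convert back with list(record) where a list is needed.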
+0

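These are all great, thanks mr-e and @tim-pietzcker. Just one question about the dupes: how do I (forgive me if it's in plain sight) make it list all the duplicates, even when a duplicate is itself duplicated? Currently the dataset above shows only two dupes where there are three. See what I mean? –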

+0

See the updated answer – YXD

+0

Thanks Mr E, that does the trick too. Really appreciate it. –

1

Try this. It is the simplest way.

name_list = [ 
    ['first_name', 'last_name', 'email'], 
    ['Danny', 'Lastnme', '[email protected]'], 
    ['Sally', 'Surname', '[email protected]'], 
    ['Sally', 'Surname', '[email protected]'], 
    ['Sally', 'Surname', '[email protected]'], 
    ['Chris', 'Lastnam', '[email protected]'], 
    ['Larry', 'Seconds', '[email protected]'], 
    ['Barry', 'Barrins', '[email protected]'], 
    ['Glenn', 'Melting', '[email protected]'], 
    ['Glenn', 'Melting', '[email protected]'], 
] 
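# Skip the header row; sorting places identical records next to each other
sorted_name_list = sorted(name_list[1:]) 
last_record = None 
Unique = [] 
Dupes = [] 
for record in sorted_name_list: 
    if last_record != record: 
        Unique.append(record)  # first record of a run
    else: 
        Dupes.append(record)   # same as the previous record
    last_record = record       # update on every iteration
print(Unique) 
print(Dupes)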
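
The same sort-then-compare idea can also be written with itertools.groupby, which groups adjacent equal records once the list is sorted. A sketch under the same assumptions (header in the first row; results follow sorted order, not input order); split_sorted is a hypothetical name:

```python
from itertools import groupby

def split_sorted(name_list):
    """Return (unique, dupes) using sort + groupby; the header row is skipped."""
    unique, dupes = [], []
    for _, group in groupby(sorted(name_list[1:])):
        copies = list(group)       # all identical records in this run
        unique.append(copies[0])   # keep the first copy as the unique record
        dupes.extend(copies[1:])   # every further copy is a duplicate
    return unique, dupes
```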