Counting duplicate rows in a CSV with Python

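I imagine this is simple for a decent Python developer, but I'm still learning! Given a CSV containing duplicate emails, I want to iterate over it and write out how many times each email occurs, for example: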

infile.csv

COLUMN 0 
[email protected] 
[email protected] 
[email protected] 
[email protected]

outfile.csv

COLUMN 0     COLUMN 1 
[email protected]   2 
[email protected]  1 
[email protected]  1

So far I can remove the duplicates with:

import csv

f = csv.reader(open('infile.csv', 'rb'))
writer = csv.writer(open('outfile.csv', 'wb'))
emails = set()

for row in f:
    if row[0] not in emails:
        writer.writerow(row)
        emails.add(row[0])

But I can't work out how to write the count into a new column.

Source

2012-08-28 rebelbass

Use defaultdict, which is available in Python 2.6:

from collections import defaultdict

# count all the emails before we write anything out
# (f and writer are the csv reader and writer opened in the question)
emails = defaultdict(int)
for row in f:
    emails[row[0]] += 1

# now write one (email, count) row per distinct email
for row in emails.items():
    writer.writerow(row)

Source

2012-08-28 02:07:47

Nice - I feel like I learned something! – rebelbass

Yes - this is the better answer for OLDER Python. +1 – dawg

Try Counter. It is designed for exactly this use:

from collections import Counter

# f is the csv reader from the question
emails = Counter()
for row in f:
    emails[row[0]] += 1

print emails

Prints:

Counter({'[email protected]': 2, '[email protected]': 1, '[email protected]': 1, 'COLUMN 0': 1})

It is easy to get any other data structure from the Counter:

print set(emails.elements()) 
# set(['[email protected]', 'COLUMN 0', '[email protected]', '[email protected]'])

Note that I didn't skip the header or write out the CSV - that's easy to do.
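
For what it's worth, the header skipping and CSV writing mentioned above could look roughly like this. It is only a sketch and assumes Python 3 (the snippets in this thread are Python 2); the function name count_first_column and the output header names email and count are made up for illustration.

```python
import csv
from collections import Counter


def count_first_column(in_path, out_path, has_header=True):
    """Count each first-column value and write (value, count) rows to out_path."""
    with open(in_path, newline="") as src:
        reader = csv.reader(src)
        if has_header:
            next(reader, None)  # discard the header row so it is not counted
        # csv.reader yields blank lines as empty lists, so skip those.
        counts = Counter(row[0] for row in reader if row)

    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["email", "count"])  # hypothetical header names
        # On Python 3.7+ a Counter keeps keys in first-seen order.
        writer.writerows(counts.items())
    return counts
```

Calling count_first_column('infile.csv', 'outfile.csv') should then give one row per distinct email with its count in the second column.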

Source

2012-08-28 01:24:43 dawg

Be sure to note that Counter is only available in Python 2.7+ – jdi

Unfortunately I'm using 2.6! – rebelbass

It is very easy to [backport](http://code.activestate.com/recipes/576611/) to 2.6 – dawg

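For Python 2.6, you could try something like a pigeonhole sort: http://en.m.wikipedia.org/wiki/Pigeonhole_sort

It was more or less made for exactly this kind of problem.

For the actual setup, use a dictionary to hold the data and then iterate over it, rather than trying to write the information out as you go.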
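
A minimal sketch of that dictionary idea, using only a plain dict so it should not depend on Counter being available. The function name count_values and the sample addresses are hypothetical stand-ins for the first CSV column.

```python
def count_values(values):
    """Tally occurrences with a plain dict (no Counter or defaultdict needed)."""
    counts = {}
    for value in values:
        # dict.get supplies 0 the first time a value is seen
        counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical sample data standing in for the first CSV column.
sample = ["a@example.com", "b@example.com", "a@example.com", "c@example.com"]
counts = count_values(sample)

# Iterate over the finished tally only after all the counting is done.
for email in sorted(counts):
    print("%s,%d" % (email, counts[email]))
```

In the real script the loop at the end would call writer.writerow([email, counts[email]]) instead of printing.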

Source

2012-08-28 02:08:06
