2012-01-10 73 views
4

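Parse a CSV file and aggregate values

I want to parse a CSV file and aggregate the values. The city column has repeated values (sample):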

CITY,AMOUNT 
London,20 
Tokyo,45 
London,55 
New York,25 

After parsing, the result should look like this:

CITY, AMOUNT 
London,75 
Tokyo,45 
New York,25 

I have written the following code to extract the unique city names:

import csv

def main():
    contrib_data = list(csv.DictReader(open('contributions.csv','rU')))
    combined = []
    for row in contrib_data:
        # collect each distinct value of the column once
        if row['OFFICE'] not in combined:
            combined.append(row['OFFICE'])

How do I go about summing the values?

+1

Hint: use a dictionary instead of a list, with the city as the key and sum(amount) as the value. – 2012-01-10 08:03:25
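The hint above can be sketched as follows. This is only a minimal illustration, not the commenter's own code: the helper name `sum_by_city` is hypothetical, and the sample rows from the question are held in an in-memory string (via `io.StringIO`) instead of being read from a file.

```python
import csv
import io

# Sample data from the question, kept in memory so the sketch is self-contained.
SAMPLE = "CITY,AMOUNT\nLondon,20\nTokyo,45\nLondon,55\nNew York,25\n"

def sum_by_city(csv_file):
    """Return a dict mapping each city to the sum of its amounts (hypothetical helper)."""
    totals = {}
    for row in csv.DictReader(csv_file):
        city = row["CITY"]
        # dict.get supplies 0 the first time a city is seen.
        totals[city] = totals.get(city, 0) + int(row["AMOUNT"])
    return totals

totals = sum_by_city(io.StringIO(SAMPLE))
print(totals)
```

Using `dict.get` with a default avoids the membership test that a plain `totals[city] += ...` would otherwise need.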

Answers

6

Tested on Python 3.2.2:

import csv 
from collections import defaultdict 
reader = csv.DictReader(open('test.csv', newline='')) 
cities = defaultdict(int) 
for row in reader: 
    cities[row["CITY"]] += int(row["AMOUNT"]) 

writer = csv.writer(open('out.csv', 'w', newline = '')) 
writer.writerow(["CITY", "AMOUNT"]) 
writer.writerows([city, cities[city]] for city in cities) 

Result:

CITY,AMOUNT 
New York,25 
London,75 
Tokyo,45 

As for your additional request:

import csv 
from collections import defaultdict 

def default_factory(): 
    return [0, None, None, 0] 

reader = csv.DictReader(open('test.csv', newline='')) 
cities = defaultdict(default_factory) 
for row in reader: 
    amount = int(row["AMOUNT"]) 
    cities[row["CITY"]][0] += amount 
    cur_max = cities[row["CITY"]][1]  # avoid shadowing the built-in max()
    cities[row["CITY"]][1] = amount if cur_max is None else max(amount, cur_max)
    cur_min = cities[row["CITY"]][2]  # avoid shadowing the built-in min()
    cities[row["CITY"]][2] = amount if cur_min is None else min(amount, cur_min)
    cities[row["CITY"]][3] += 1 
for city in cities: 
    cities[city][3] = cities[city][0]/cities[city][3] # calculate mean 

writer = csv.writer(open('out.csv', 'w', newline = '')) 
writer.writerow(["CITY", "AMOUNT", "max", "min", "mean"]) 
writer.writerows([city] + cities[city] for city in cities) 

This gives you:

CITY,AMOUNT,max,min,mean 
New York,25,25,25,25.0 
London,75,55,20,37.5 
Tokyo,45,45,45,45.0 

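Note that under Python 2 you need to add the line from __future__ import division at the top to get the correct result, because there / between two integers discards the fractional part of the mean.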
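To make the division caveat concrete, here is a small sketch using London's figures from the output above (total 75 over 2 rows). It runs under Python 3, where `//` is used to show the floor result that Python 2 would give for `/` on two integers without the `__future__` import.

```python
total, count = 75, 2  # London's total and row count from the sample data

mean_true = total / count    # true division, the Python 3 behaviour of /
mean_floor = total // count  # floor division, what Python 2's / does for two ints

print(mean_true, mean_floor)
```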

+0

Tested it on Python 2.7 and it works fine. I'm wondering why a normal dict doesn't work and why I have to use defaultdict()? – jwesonga 2012-01-10 08:35:29

+1

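`defaultdict(int)` lets you use a key that has not been defined yet and automatically gives it the value `0`. So you can do `cities["Bolton"] += 10`, and it will either create a new key `"Bolton"` with the value `10` or, if the key already exists, add `10` to it. If you do this with a normal `dict`, you get a `KeyError` for every key that is not there yet. – 2012-01-10 08:39:20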
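The difference described in this comment can be seen in a few lines. This is a small sketch using the "Bolton" key from the comment; the variable names `plain` and `missing_key` are made up for illustration.

```python
from collections import defaultdict

plain = {}
try:
    plain["Bolton"] += 10  # has to read plain["Bolton"] first, which does not exist
except KeyError as exc:
    missing_key = exc.args[0]  # the key that could not be found

cities = defaultdict(int)
cities["Bolton"] += 10  # missing key starts at int() == 0, then 10 is added
cities["Bolton"] += 10  # key exists now, so 10 is added to the stored value

print(missing_key, cities["Bolton"])
```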

+0

Thanks for the feedback. Is it possible to compute the MAX, MIN and MEAN values in the same loop? – jwesonga 2012-01-10 09:03:29

0

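A dict with the city as the key and the running AMOUNT as the value might do the trick. Something like the following.

This assumes you read one row at a time, with city holding the current city and amount holding the current amount: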

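import csv

main_dict = {}

with open('contributions.csv') as f:
    for row in csv.DictReader(f):
        city = row['CITY']
        amount = int(row['AMOUNT'])  # CSV fields are read as strings
        if city in main_dict:
            main_dict[city] = main_dict[city] + amount
        else:
            main_dict[city] = amount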

At the end of the loop, main_dict will hold the total for each city.

+0

When I try it this way, I always get a KeyError. – jwesonga 2012-01-10 08:34:00

+0

It shouldn't. Can you show the relevant part of your code here, where you get the KeyError? – Siddharth 2012-01-12 07:09:47
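The thread does not say what caused the KeyError. One way to narrow it down is to compare the key being used against the column names the reader actually found. The sketch below assumes the sample header from the question and, purely as an illustration, looks up the OFFICE column used in the question's snippet; that this mismatch is the real cause is a guess, not something the thread confirms.

```python
import csv
import io

# Sample data with the header shown in the question, kept in memory.
SAMPLE = "CITY,AMOUNT\nLondon,20\nTokyo,45\n"

reader = csv.DictReader(io.StringIO(SAMPLE))
headers = reader.fieldnames  # column names taken from the first row
wanted = "OFFICE"            # column name used in the question's snippet

if wanted not in headers:
    message = "column %r not found; available columns: %r" % (wanted, headers)
else:
    message = "column %r is present" % wanted

print(message)
```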