Count the frequency of items in a list

I want to count the number of accidents in each region for each year. How can I do this in Python?

file.csv

Region,Year 
1,2003 
1,2003 
2,2008 
2,2007 
2,2007 
3,2004 
1,2004 
1,2004 
1,2004

I tried using Counter, but it only works on a single column. For example, region 1 has 2 accidents in 2003, so the result should be:

Region,Year,freq
1,2003,2
1,2003,2
2,2008,1
2,2007,2
2,2007,2
3,2004,1
1,2004,3
1,2004,3
1,2004,3

I tried to do it this way, but it does not seem to be correct.

import pandas
from collections import Counter

data = pandas.read_csv("file.csv")
# Counts how often each year occurs; Region is ignored, which is the problem
freq_year = Counter(data["Year"].values)
data["freq"] = data["Year"].apply(lambda x: freq_year[x])

I am thinking about using groupby. Do you know how to do this?

Source

2014-04-11 user3378649

There may be a better way, but I would first add a dummy column and then compute freq from that column, like this:

# Add a dummy column of ones, then sum it within each (Year, Region) group
df["freq"] = 1
df["freq"] = df.groupby(["Year", "Region"])["freq"].transform("sum")

This returns the following DataFrame:

   Region  Year  freq
0       1  2003     2
1       1  2003     2
2       2  2008     1
3       2  2007     2
4       2  2007     2
5       3  2004     1
6       1  2004     3
7       1  2004     3
8       1  2004     3
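As a possible variation on the approach above, the dummy column can be skipped by asking each group for its size directly. The following is only a minimal sketch, not the answerer's code: it assumes the sample data from the question, uses io.StringIO as a stand-in for file.csv, and introduces the name summary purely for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for file.csv.
csv_text = """Region,Year
1,2003
1,2003
2,2008
2,2007
2,2007
3,2004
1,2004
1,2004
1,2004
"""
df = pd.read_csv(io.StringIO(csv_text))

# transform("size") returns one value per original row:
# the number of rows in that row's (Region, Year) group.
df["freq"] = df.groupby(["Region", "Year"])["Year"].transform("size")
print(df)

# Alternative view: one row per (Region, Year) pair instead of one per accident.
summary = df.groupby(["Region", "Year"]).size().reset_index(name="freq")
print(summary)
```

The first result keeps one row per accident, as in the expected output of the question. The second is usually the more convenient shape for plotting or reporting.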

Source

2014-04-11 23:33:47 Blaszard

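Perfect! Thank you – user3378649

I want to plot this dataset, but I seem to have run into a problem. Could you please take a look at this question: http://stackoverflow.com/questions/23024439/how-to-customize-axes-in-3d-hist-python-matplotlib – user3378649

I don't know matplotlib well and have no experience with 3D plots. I hope you can get help there... – Blaszard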

Not a pandas solution, but it gets the job done:

import csv
from collections import Counter

inputs = []
with open('input.csv') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # Convert each row to a tuple so it can be used as a Counter key
        inputs.append(tuple(row))

# Skip the header row when counting
freqs = Counter(inputs[1:])
print(freqs)
# Counter({('1', '2004'): 3, ('1', '2003'): 2, ('2', '2007'): 2, ('2', '2008'): 1, ('3', '2004'): 1})

The key here is to turn each row into a tuple: tuples are hashable, so Counter treats identical rows as the same key.
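To get from the Counter to the per-row layout asked for in the question, each row's count can be looked up and appended to that row. This is a rough sketch under stated assumptions: the data is the sample from the question, io.StringIO replaces input.csv, and the name result is made up for the example.

```python
import csv
import io
from collections import Counter

# Sample data copied from the question; io.StringIO stands in for input.csv.
csv_text = """Region,Year
1,2003
1,2003
2,2008
2,2007
2,2007
3,2004
1,2004
1,2004
1,2004
"""

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)                   # ['Region', 'Year']
rows = [tuple(row) for row in reader]   # tuples are hashable, lists are not

# Count identical (Region, Year) tuples.
freqs = Counter(rows)

# Append each row's count to the row itself, keeping the original order.
result = [row + (freqs[row],) for row in rows]

print(",".join(header + ["freq"]))
for region, year, freq in result:
    print(f"{region},{year},{freq}")
```

The values stay strings here because csv.reader does not convert types; that is fine for counting, but they would need int() before any arithmetic.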

Source

2014-04-11 23:24:06 Hamatti
