Python/pandas: Group by date and ID and count records

I have a relatively large DataFrame in Python (~10^6 rows), structured like this:

Index,Date,City,State,ID,County,Age,A,B,C 
0,9/1/16,X,AL,360,BB County,29.0,negative,positive,positive 
1,9/1/16,X,AL,360,BB County,1.0,negative,negative,negative 
2,9/1/16,X,AL,360,BB County,10.0,negative,negative,negative 
3,9/1/16,X,AL,360,BB County,11.0,negative,negative,negative 
4,9/1/16,X,AR,718,LL County,67.0,negative,negative,negative 
5,9/1/16,X,AR,728,JJ County,3.0,negative,negative,negative 
6,9/1/16,X,AR,728,JJ County,8.0,negative,negative,negative 
7,9/1/16,X,AR,728,JJ County,8.0,negative,negative,negative 
8,9/1/16,X,AR,728,JJ County,14.0,negative,negative,negative 
9,9/1/16,X,AR,728,JJ County,5.0,negative,negative,negative 
...

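I am trying to group by date (day) and ID, and then compute 1) the total number of records for each day and ID, and 2) the total number of "positive" values in column "A" (for example) for each day and ID. In the end, I want a DataFrame that gives the number of positives and the total number of records for each day and ID, for example: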

Date,ID,Positive,Total 
9/1/16,360,10,20 
9/2/16,360,12,23 
9/2/16,718,2,43 
...

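I originally used a double for loop over every unique day and ID, but that takes too much time. I would appreciate help finding a better approach. Thanks in advance for any advice!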


2017-04-06 jtam

Look into `groupby` in the pandas documentation. –

I tried that, but I couldn't get it to do what I wanted. – jtam

I took the data you provided and created a small .csv file so that you can reproduce this. I also changed some values to test that it works:

Index,Date,City,State,ID,County,Age,A,B,C 
0,9/1/16,X,AL,360,BB County,29.0,negative,positive,positive 
1,9/1/16,X,AL,360,BB County,1.0,positive,negative,negative 
2,9/1/16,X,AL,360,BB County,10.0,positive,negative,negative 
3,9/1/16,X,AL,360,BB County,11.0,negative,negative,negative 
4,9/1/16,X,AR,718,LL County,67.0,negative,negative,negative 
5,9/2/16,X,AR,728,JJ County,3.0,negative,negative,negative 
6,9/2/16,X,AR,728,JJ County,8.0,positive,negative,negative 
7,9/2/16,X,AR,728,JJ County,8.0,negative,negative,negative 
8,9/3/16,X,AR,728,JJ County,14.0,negative,negative,negative 
9,9/3/16,X,AR,728,JJ County,5.0,negative,negative,negative

Once you read it in, here is how things look:

>>> import pandas as pd
>>> X = pd.read_csv('data.csv', header=0, index_col=None).drop('Index', axis=1)
>>> print(X)
     Date  City  State   ID     County   Age         A         B         C
0  9/1/16     X     AL  360  BB County  29.0  negative  positive  positive
1  9/1/16     X     AL  360  BB County   1.0  positive  negative  negative
2  9/1/16     X     AL  360  BB County  10.0  positive  negative  negative
3  9/1/16     X     AL  360  BB County  11.0  negative  negative  negative
4  9/1/16     X     AR  718  LL County  67.0  negative  negative  negative
5  9/2/16     X     AR  728  JJ County   3.0  negative  negative  negative
6  9/2/16     X     AR  728  JJ County   8.0  positive  negative  negative
7  9/2/16     X     AR  728  JJ County   8.0  negative  negative  negative
8  9/3/16     X     AR  728  JJ County  14.0  negative  negative  negative
9  9/3/16     X     AR  728  JJ County   5.0  negative  negative  negative

Here is a function to apply to each group in the groupby call:

def _ct_id_pos(grp):
    # for one group, return (rows where A is 'positive', total rows)
    return grp[grp.A == 'positive'].shape[0], grp.shape[0]

This will be a two-step process. With pandas, you can group by several columns and apply the function above.

# the following will have the tuple in one column
>>> X_prime = X.groupby(['Date', 'ID']).apply(_ct_id_pos).reset_index()
>>> print(X_prime)
     Date   ID       0
0  9/1/16  360  (2, 4)
1  9/1/16  718  (0, 1)
2  9/2/16  728  (1, 3)
3  9/3/16  728  (0, 2)

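Notice that the result of the groupby call gives us a new column containing embedded tuples, so the next step is to split those into their own columns and drop the embedded one: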

>>> X_prime[['Positive', 'Total']] = X_prime[0].apply(pd.Series)
>>> X_prime.drop([0], axis=1, inplace=True)
>>> print(X_prime)
     Date   ID  Positive  Total
0  9/1/16  360         2      4
1  9/1/16  718         0      1
2  9/2/16  728         1      3
3  9/3/16  728         0      2
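
As a possible alternative to the two-step apply approach above, the same counts can probably be produced in a single groupby pass, which may matter at ~10^6 rows. This is only a minimal sketch, assuming a pandas version that supports named aggregation (0.25 or later). The helper column name `is_pos` and the inline sample frame are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Small sample mirroring the CSV above (only the columns used here).
X = pd.DataFrame({
    "Date": ["9/1/16"] * 5 + ["9/2/16"] * 3 + ["9/3/16"] * 2,
    "ID": [360, 360, 360, 360, 718, 728, 728, 728, 728, 728],
    "A": ["negative", "positive", "positive", "negative", "negative",
          "negative", "positive", "negative", "negative", "negative"],
})

# Add a boolean helper column (hypothetical name `is_pos`) that is True
# where A is 'positive'. Summing it counts the positives in each group,
# and taking its size counts all rows in each group.
counts = (
    X.assign(is_pos=X["A"].eq("positive"))
     .groupby(["Date", "ID"])
     .agg(Positive=("is_pos", "sum"), Total=("is_pos", "size"))
     .reset_index()
)
print(counts)
```

On the sample data this should give the same Positive and Total values as the table above. Because it avoids calling a Python function once per group, it is likely to scale better, though that is worth timing on the real data.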


2017-04-06 20:21:47 Tgsmith61591

Thanks, Tgsmith61591. I'm not sure I understand all of it, but I'll try to work out how it works. In any case, it does what I need. Thanks again! – jtam

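Could you tell me what "shape[0]" is doing? I looked through the groupby documentation and didn't see it. I also played around with it and noticed that the function needs it in order to return the right data. But alas, I can't figure out exactly what it does. – jtam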

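Never mind, I figured it out! Shape just returns the dimensions, and in this case the first dimension (the row count) gives the number of "positives" and the "total". – jtam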
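
To illustrate the point in the comment above, here is a small sketch of what `shape[0]` returns for a single group. The one-column frame `grp` is made up for this example and stands in for one (Date, ID) group.

```python
import pandas as pd

# Hypothetical stand-in for one (Date, ID) group.
grp = pd.DataFrame({"A": ["negative", "positive", "positive", "negative"]})

# .shape is a (rows, columns) tuple, so .shape[0] is the row count.
total = grp.shape[0]

# Boolean indexing keeps only the 'positive' rows before counting them.
positive = grp[grp.A == "positive"].shape[0]

print(grp.shape, positive, total)  # (4, 1) 2 4
```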
