2016-11-30 207 views

Pandas cumulative count

I have a DataFrame like this:

0  04:10 obj1 
1  04:10 obj1 
2  04:11 obj1 
3  04:12 obj2 
4  04:12 obj2 
5  04:12 obj1 
6  04:13 obj2 
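For anyone who wants to run the answers, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the column names `time` and `object` and the index name `idx` are assumed from the desired-output table, and the times are kept as plain strings.

```python
import pandas as pd

# Sample rows copied from the question. Column names are assumed from the
# desired-output table; times are left as strings for simplicity.
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})
df.index.name = "idx"
print(df)
```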

I would like to get the cumulative count of every object, like this:

idx  time object obj1_count obj2_count 
0  04:10 obj1  1    0 
1  04:10 obj1  2    0 
2  04:11 obj1  3    0 
3  04:12 obj2  3    1 
4  04:12 obj2  3    2 
5  04:12 obj1  4    2 
6  04:13 obj2  4    3 

I tried playing with cumsum, but I am not sure that is the right approach. Any suggestions?

Answers


You can simply compare the column against the value of interest and call cumsum on the result:

In [12]: 
df['obj1_count'] = (df['object'] == 'obj1').cumsum() 
df['obj2_count'] = (df['object'] == 'obj2').cumsum() 
df 

Out[12]: 
     time object obj1_count obj2_count 
idx          
0 04:10 obj1   1   0 
1 04:10 obj1   2   0 
2 04:11 obj1   3   0 
3 04:12 obj2   3   1 
4 04:12 obj2   3   2 
5 04:12 obj1   4   2 
6 04:13 obj2   4   3 

Here the comparison produces a boolean Series:

In [13]: 
df['object'] == 'obj1' 

Out[13]: 
idx 
0  True 
1  True 
2  True 
3 False 
4 False 
5  True 
6 False 
Name: object, dtype: bool 

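When you call cumsum on this boolean Series, True values are converted to 1 and False values to 0, and they are summed cumulatively.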
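To see the boolean-to-integer cast in isolation, here is a small sketch. The variable names `mask`, `as_int` and `running` are illustrative only; the values mirror the boolean Series shown above.

```python
import pandas as pd

# The same True/False pattern as the comparison df['object'] == 'obj1'.
mask = pd.Series([True, True, True, False, False, True, False])

# Casting explicitly shows what cumsum sees: True becomes 1, False becomes 0.
as_int = mask.astype(int)

# cumsum on the boolean Series gives the same running total as on the ints.
running = mask.cumsum()
print(running.tolist())
```

The running total only increases on rows where the comparison was True, which is why it works as a per-object counter.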


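You can generalize this process by taking the cumsum of pd.get_dummies. This should work for any number of objects you want to count, without having to specify each one separately: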

# Get the cumulative counts. 
counts = pd.get_dummies(df['object']).cumsum() 

# Rename the count columns as appropriate. 
counts = counts.rename(columns=lambda col: col+'_count') 

# Join the counts to the original df. 
df = df.join(counts) 

Output:

time object obj1_count obj2_count 
0 04:10 obj1   1   0 
1 04:10 obj1   2   0 
2 04:11 obj1   3   0 
3 04:12 obj2   3   1 
4 04:12 obj2   3   2 
5 04:12 obj1   4   2 
6 04:13 obj2   4   3 
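For reference, the get_dummies approach can be run end to end as in the sketch below. Two details are my own assumptions rather than part of the answer: the explicit `.astype(int)` (so the result is an integer count regardless of whether the installed pandas returns boolean or integer dummy columns) and the use of `add_suffix` in place of `rename`.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

# One indicator column per distinct object, cast to int so the sums are integers.
dummies = pd.get_dummies(df["object"]).astype(int)

# Running total per object; add_suffix is an alternative to the rename call.
counts = dummies.cumsum().add_suffix("_count")

result = df.join(counts)
print(result)
```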

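You can omit the rename step if it is acceptable to use count as a prefix rather than a suffix, i.e. 'count_obj1' instead of 'obj1_count'. To do so, just pass the prefix argument to pd.get_dummies: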

counts = pd.get_dummies(df['object'], prefix='count').cumsum() 
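A quick check of the column names produced by the prefix variant, as a sketch on a standalone Series (the name `objects` is illustrative, and the `.astype(int)` cast is an added assumption to keep the counts integer across pandas versions):

```python
import pandas as pd

objects = pd.Series(["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"])

# prefix='count' is joined to each label with the default separator '_'.
counts = pd.get_dummies(objects, prefix="count").astype(int).cumsum()
print(list(counts.columns))
```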

Here is a way to do it using numpy:

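import numpy as np
import pandas as pd

# Unique labels (sorted) and, for each row, the position of its label in u.
u, iv = np.unique(
    df['object'].to_numpy(),
    return_inverse=True
)

# Compare each row's label position with every label position to get a
# one-hot boolean matrix, then take the running sum down the rows.
objcount = pd.DataFrame(
    (iv[:, None] == np.arange(len(u))).cumsum(0),
    df.index, u
)
# Attach the running counts; the new columns are named after the labels.
pd.concat([df, objcount], axis=1)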
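The intermediate arrays in the numpy version can be inspected with a sketch like the one below. It uses a plain array `objects` (an illustrative name) holding the same labels as the question, rather than the DataFrame column.

```python
import numpy as np

objects = np.array(["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"])

# u holds the sorted unique labels; iv maps every row to its label's position in u.
u, iv = np.unique(objects, return_inverse=True)

# Broadcasting a (7, 1) column against a (2,) row yields a (7, 2) one-hot matrix.
onehot = iv[:, None] == np.arange(len(u))

# Summing down axis 0 turns the one-hot matrix into per-label running counts.
counts = onehot.cumsum(axis=0)
print(u, iv, counts, sep="\n")
```

Each column of `counts` corresponds to one entry of `u`, so the first column is the running count of obj1 and the second that of obj2.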
