2017-09-08 176 views
1

Conditional running count in pandas

I am trying to create a conditional running sum in pandas based on two conditions.

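import pandas as pd

ID = [1, 1, 1, 2, 2, 3, 4]
after = ['A', 'B', 'B', 'A', 'A', 'B', 'A']
before = ['A', 'B', 'B', 'A', 'A', 'B', 'A']
# Build the frame from a dict so ID keeps an integer dtype
# (transposing a list of lists leaves every column as object).
df = pd.DataFrame({'ID': ID, 'before': before, 'after': after})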

The data looks like this:

   ID before after
0   1      A     A
1   1      B     B
2   1      B     B
3   2      A     A
4   2      A     A
5   3      B     B
6   4      A     A

I then want to see how long each ID has had a before value of B. My attempt:

df['time_on_b'] = (df.groupby('before')['ID'].cumcount()+1).where(df['before']=='B',0) 

This gives me:

   ID before after  time_on_b
0   1      A     A          0
1   1      B     B          1
2   1      B     B          2
3   2      A     A          0
4   2      A     A          0
5   3      B     B          3
6   4      A     A          0

The output I want is:

   ID before after  time_on_b
0   1      A     A          0
1   1      B     B          1
2   1      B     B          2
3   2      A     A          0
4   2      A     A          0
5   3      B     B          1
6   4      A     A          0

As you can see, I want time_on_b to reset whenever the ID changes, so that row 5 gets a value of 1 rather than 3.

Answers

4

It seems you need to group by ID, then use cumsum to count the occurrences of B:

cond = df.before == 'B' 
df['time_on_b'] = cond.groupby(df.ID).cumsum().where(cond, 0).astype(int) 
df 
#    ID before after  time_on_b
# 0   1      A     A          0
# 1   1      B     B          1
# 2   1      B     B          2
# 3   2      A     A          0
# 4   2      A     A          0
# 5   3      B     B          1
# 6   4      A     A          0
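
To make the difference concrete, here is a minimal, self-contained sketch that runs the question's attempt next to the per-ID cumsum. The variable names `attempt` and `per_id` are just illustrative, and the frame is rebuilt from a dict rather than the transposed list in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 4],
    "before": ["A", "B", "B", "A", "A", "B", "A"],
    "after": ["A", "B", "B", "A", "A", "B", "A"],
})

# Boolean mask: True where the row's `before` value is B.
cond = df["before"].eq("B")

# Question's attempt: grouping by `before` numbers every B row in the
# whole frame, so the count keeps growing across IDs.
attempt = (df.groupby("before").cumcount() + 1).where(cond, 0)

# Grouping the mask by ID instead restarts the running sum for each ID.
per_id = cond.astype(int).groupby(df["ID"]).cumsum().where(cond, 0)

print(attempt.tolist())  # [0, 1, 2, 0, 0, 3, 0]
print(per_id.tolist())   # [0, 1, 2, 0, 0, 1, 0]
```

The trailing `.where(cond, 0)` does not change anything for this data, but it matters when an A row follows B rows within the same ID: without it, that A row would carry the running total forward instead of showing 0.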
2

You can also use transform:

df.groupby('ID').before.transform(lambda x: x.eq('B').cumsum()) 

0 0 
1 1 
2 2 
3 0 
4 0 
5 1 
6 0 
Name: before, dtype: int32 

df.assign(time_on_b=df.groupby('ID').before.transform(lambda x: x.eq('B').cumsum())) 

   ID before after  time_on_b
0   1      A     A          0
1   1      B     B          1
2   1      B     B          2
3   2      A     A          0
4   2      A     A          0
5   3      B     B          1
6   4      A     A          0
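
For comparison, a variant that stays closer to the question's original attempt is to keep cumcount but group on both ID and before. This is only a sketch and not something either answer proposes. The names `is_b` and `via_transform` are illustrative, and the last line simply checks that it agrees with the transform approach on this sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 4],
    "before": ["A", "B", "B", "A", "A", "B", "A"],
    "after": ["A", "B", "B", "A", "A", "B", "A"],
})

is_b = df["before"].eq("B")

# Number the rows inside each (ID, before) pair starting at 1,
# then set every non-B row to 0.
df["time_on_b"] = (df.groupby(["ID", "before"]).cumcount() + 1).where(is_b, 0)

# The transform approach: cumulative count of B rows per ID.
via_transform = df.groupby("ID")["before"].transform(lambda x: x.eq("B").cumsum())

print(df["time_on_b"].tolist())  # [0, 1, 2, 0, 0, 1, 0]
print(df["time_on_b"].tolist() == via_transform.tolist())  # True
```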