2016-10-21 250 views
3
test = pd.DataFrame({'injury':['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'], 'crash_drinking':[1, 1, 1, 0, 0, 0, 1, 0, 1], 'crash_drugs':[0,0,0,1,1,0,0,1,1], 'driver_drinking':[1,1,0,0,0,0,0,1,0], 'driver_drugged':[0,0,0,0,1,0,0,1,0]}) 

    crash_drinking crash_drugs driver_drinking driver_drugged injury 
0    1   0    1    0  A 
1    1   0    1    0  B 
2    1   0    0    0  B 
3    0   1    0    0  A 
4    0   1    0    1  A 
5    0   0    0    0  C 
6    1   0    0    0  A 
7    0   1    1    1  B 
8    1   1    0    0  A 

複雜的過濾器我想我的輸出看起來像這樣(列名稱更改爲從上面的數據幀進行區分):大熊貓:通過GROUPBY

drinking crash drinking driver in crash drugged crash drugged driver in crash 
A    2      1     2       1 
B    2      1     1       0 

其中第一行,"injury" = 'A',並且以下過濾器已到位:

「飲酒崩潰」是計數其中crash_drinking = 1crash_drugs = 0;

「飲酒司機在墜毀」是其中crash_drinking = 1,crash_drugs = 0,driver_drinking = 1,driver_drugs is 0;

「迷藥撞車」 就是crash_drinking = 0crash_drugs = 1;

「撞車迷藥司機」 就是crash_drinking = 0crash_drugs = 1driver_drinking = 0,driver_drugs = 1

同爲B行,但也正是"injury" = 'B'.

現在我只是有一堆的.loc過濾器設置:

test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)] 
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)] 
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0) & (test['driver_drinking'] == 1) & (test['driver_drugged'] == 0)] 

等等

我寧願這樣做通過groupby或.apply(),因爲我認爲這會比遍歷所有這些查詢更快。但我不確定正確的語法。也許我應該在「傷病」專欄上做一個.groupby(),然後從那裏開始......?

+0

您的數據幀不匹配其表示(列名不同)的定義。 –

+0

您的意思是我所需輸出中的列與輸入不同?新列與原始列不同,它們是列的組合,所以我想區分它們。我可以改變他們,但我認爲如果他們是相同的會更混亂。 – ale19

+0

不,請查看第一行代碼以及之後打印的數據框。他們有不同的列名,這是令人困惑的。 –

回答

1
result = pd.DataFrame() 
result['drinking crash'] = (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0) 
result['drinking driver in crash'] = ((test['crash_drinking'] == 1) & (test['crash_drugs'] == 0) 
             & (test['driver_drinking'] == 1) & (test['driver_drugs'] == 0)) 
result['drugged crash'] = (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1) 
result['drugged driver in crash'] = ((test['crash_drinking'] == 0) & (test['crash_drugs'] == 1) 
            & (test['driver_drinking'] == 0) & (test['driver_drugs'] == 1)) 
result = result.astype(int) 
result['injury'] = test['injury'] 
result.groupby('injury').sum() 

resulting dataframe