3
test = pd.DataFrame({'injury':['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'], 'crash_drinking':[1, 1, 1, 0, 0, 0, 1, 0, 1], 'crash_drugs':[0,0,0,1,1,0,0,1,1], 'driver_drinking':[1,1,0,0,0,0,0,1,0], 'driver_drugged':[0,0,0,0,1,0,0,1,0]})
crash_drinking crash_drugs driver_drinking driver_drugged injury
0 1 0 1 0 A
1 1 0 1 0 B
2 1 0 0 0 B
3 0 1 0 0 A
4 0 1 0 1 A
5 0 0 0 0 C
6 1 0 0 0 A
7 0 1 1 1 B
8 1 1 0 0 A
複雜的過濾器我想我的輸出看起來像這樣(列名稱更改爲從上面的數據幀進行區分):大熊貓:通過GROUPBY
drinking crash drinking driver in crash drugged crash drugged driver in crash
A 2 1 2 1
B 2 1 1 0
其中第一行,"injury" = 'A'
,並且以下過濾器已到位:
「飲酒崩潰」是計數其中crash_drinking = 1
和crash_drugs = 0
;
「飲酒司機在墜毀」是其中crash_drinking = 1
,crash_drugs = 0
,driver_drinking = 1,
和driver_drugs is 0
;
「迷藥撞車」 就是crash_drinking = 0
和crash_drugs = 1;
「撞車迷藥司機」 就是crash_drinking = 0
,crash_drugs = 1
,driver_drinking = 0,
和driver_drugs = 1
。
同爲B行,但也正是"injury" = 'B'.
現在我只是有一堆的.loc過濾器設置:
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)]
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)]
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0) & (test['driver_drinking'] == 1) & (test['driver_drugged'] == 0)]
等等
我寧願這樣做通過groupby或.apply(),因爲我認爲這會比遍歷所有這些查詢更快。但我不確定正確的語法。也許我應該在「傷病」專欄上做一個.groupby(),然後從那裏開始......?
您的數據幀不匹配其表示(列名不同)的定義。 –
您的意思是我所需輸出中的列與輸入不同?新列與原始列不同,它們是列的組合,所以我想區分它們。我可以改變他們,但我認爲如果他們是相同的會更混亂。 – ale19
不,請查看第一行代碼以及之後打印的數據框。他們有不同的列名,這是令人困惑的。 –