2016-09-24 93 views

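Filtering a data frame in pandas

How do I filter or subset a specific group within a data frame (for example, the admitted females in the data frame below)? I want to summarize admission/rejection rates by gender. This data frame is small, but if it were much larger, say thousands of rows, indexing individual values would not be feasible.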

    Admit     Gender  Dept  Freq
0   Admitted  Male    A     512
1   Rejected  Male    A     313
2   Admitted  Female  A     89
3   Rejected  Female  A     19
4   Admitted  Male    B     353
5   Rejected  Male    B     207
6   Admitted  Female  B     17
7   Rejected  Female  B     8
8   Admitted  Male    C     120
9   Rejected  Male    C     205
10  Admitted  Female  C     202
11  Rejected  Female  C     391
12  Admitted  Male    D     138
13  Rejected  Male    D     279
14  Admitted  Female  D     131
15  Rejected  Female  D     244
16  Admitted  Male    E     53
17  Rejected  Male    E     138
18  Admitted  Female  E     94
19  Rejected  Female  E     299
20  Admitted  Male    F     22
21  Rejected  Male    F     351
22  Admitted  Female  F     24
23  Rejected  Female  F     317

Take a look at 'groupby'. – acushner

Ayhan, thank you for editing the question. –

Ami, if this is a duplicate, please point me to the original post. –

Answer

To filter the data, you can use the versatile query method of DataFrame.

# Test data: the first seven rows of the table in the question
import pandas as pd

df = pd.DataFrame({'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
                   'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
                   'Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                   'Freq': [512, 313, 89, 19, 353, 207, 17]})

# Keep only the rows where both conditions hold
df.query('Admit == "Admitted" and Gender == "Female"')

      Admit  Gender  Dept  Freq
2  Admitted  Female     A    89
6  Admitted  Female     B    17
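
The same filter can also be written with a boolean mask, which some people find easier to build up programmatically than a query string. This is only a sketch on the first seven rows of the question's table; the names mask and admitted_females are made up for illustration.

```python
import pandas as pd

# First seven rows of the table in the question
df = pd.DataFrame({'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected',
                             'Admitted', 'Rejected', 'Admitted'],
                   'Gender': ['Male', 'Male', 'Female', 'Female',
                              'Male', 'Male', 'Female'],
                   'Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                   'Freq': [512, 313, 89, 19, 353, 207, 17]})

# Combine the two element-wise comparisons with & (each side needs parentheses)
mask = (df['Admit'] == 'Admitted') & (df['Gender'] == 'Female')

# Indexing with the mask keeps only the rows where it is True
admitted_females = df[mask]
print(admitted_females)
```

Either form scales to large frames, since the comparison is applied to whole columns rather than to individual cells.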

To summarize the data, use groupby.

# Sum only the numeric Freq column within each (Admit, Gender) group
group = df.groupby(['Admit', 'Gender'])[['Freq']].sum()
print(group)

                 Freq
Admit    Gender      
Admitted Female   106
         Male     865
Rejected Female    19
         Male     520

You can filter the result by subsetting on the resulting MultiIndex.

group.loc[('Admitted', 'Female')] 

Freq 106 
Name: (Admitted, Female), dtype: int64
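
Since the question asks for admission/rejection rates by gender rather than raw counts, one possible follow-up is to spread the grouped counts into columns and divide each row by its total. The sketch below rebuilds the full 24-row table from the question; the variable names (depts, freq, counts, rates) are illustrative only.

```python
import pandas as pd

# Full table from the question, rebuilt column by column
depts = ['A', 'B', 'C', 'D', 'E', 'F']
freq = [512, 313, 89, 19, 353, 207, 17, 8, 120, 205, 202, 391,
        138, 279, 131, 244, 53, 138, 94, 299, 22, 351, 24, 317]
df = pd.DataFrame({
    'Admit': ['Admitted', 'Rejected'] * 12,
    'Gender': ['Male', 'Male', 'Female', 'Female'] * 6,
    'Dept': [d for d in depts for _ in range(4)],
    'Freq': freq,
})

# Total applicants per (Gender, Admit) pair, with Admit spread into columns
counts = df.groupby(['Gender', 'Admit'])['Freq'].sum().unstack('Admit')

# Divide each row by its row total to turn counts into rates
rates = counts.div(counts.sum(axis=1), axis=0)
print(rates.round(3))
```

On this table the admitted share comes out at roughly 30% for female applicants and 45% for male applicants. Adding 'Dept' to the groupby keys would give the same breakdown per department.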