2016-09-26 38 views
3

我是python和使用熊貓的新手。查詢熊貓df過濾列不是南的行

我想查詢一個數據幀並過濾其中一列不是NaN的行。

我曾嘗試:

a=dictionarydf.label.isnull() 

但填充了truefalse。 試過這種

dictionarydf.query(dictionarydf.label.isnull()) 

但給了我預期的錯誤

樣本數據:

 reference_word   all_matching_words label review 
0   account    fees - account NaN  N 
1   account   mobile - account NaN  N 
2   account   monthly - account NaN  N 
3 administration delivery - administration NaN  N 
4 administration  fund - administration NaN  N 
5   advisor    fees - advisor NaN  N 
6   advisor   optimum - advisor NaN  N 
7   advisor    sub - advisor NaN  N 
8    aichi   delivery - aichi NaN  N 
9    aichi    pref - aichi NaN  N 
10   airport    biz - airport travel  N 
11   airport    cfo - airport travel  N 
12   airport   cfomtg - airport travel  N 
13   airport   meeting - airport travel  N 
14   airport   summit - airport travel  N 
15   airport    taxi - airport travel  N 
16   airport   train - airport travel  N 
17   airport   transfer - airport travel  N 
18   airport    trip - airport travel  N 
19    ais    admin - ais NaN  N 
20    ais    alpine - ais NaN  N 
21    ais     fund - ais NaN  N 
22  allegiance  custody - allegiance NaN  N 
23  allegiance   fees - allegiance NaN  N 
24   alpha    late - alpha NaN  N 
25   alpha    meal - alpha NaN  N 
26   alpha    taxi - alpha NaN  N 
27   alpine    admin - alpine NaN  N 
28   alpine    ais - alpine NaN  N 
29   alpine    fund - alpine NaN  N 

我要過濾的數據,其中標籤不是NaN的

預期輸出:

 reference_word   all_matching_words label review 
0   airport    biz - airport travel  N 
1   airport    cfo - airport travel  N 
2   airport   cfomtg - airport travel  N 
3   airport   meeting - airport travel  N 
4   airport   summit - airport travel  N 
5   airport    taxi - airport travel  N 
6   airport   train - airport travel  N 
7   airport   transfer - airport travel  N 
8   airport    trip - airport travel  N 

回答

3

您可以使用dropna

df = df.dropna(subset=['label']) 

print (df) 
    reference_word all_matching_words label review 
10  airport  biz - airport travel  N 
11  airport  cfo - airport travel  N 
12  airport cfomtg - airport travel  N 
13  airport meeting - airport travel  N 
14  airport summit - airport travel  N 
15  airport  taxi - airport travel  N 
16  airport  train - airport travel  N 
17  airport transfer - airport travel  N 
18  airport  trip - airport travel  N 

另一個解決方案 - boolean indexingnotnull

df = df[df.label.notnull()] 

print (df) 
    reference_word all_matching_words label review 
10  airport  biz - airport travel  N 
11  airport  cfo - airport travel  N 
12  airport cfomtg - airport travel  N 
13  airport meeting - airport travel  N 
14  airport summit - airport travel  N 
15  airport  taxi - airport travel  N 
16  airport  train - airport travel  N 
17  airport transfer - airport travel  N 
18  airport  trip - airport travel  N 
+0

感謝您的快速回答:) @jezrael上解決了這個問題。我選擇布爾索引,因爲我不想刪除行,我也不需要創建一個重複的數據框。這兩個解決方案都很完美 – Dileep