Filter out groups that do not have enough rows meeting a condition

I have the following pandas DataFrame.

import pandas as pd 

# Initialize dataframe 
df1 = pd.DataFrame(columns=['bar', 'foo']) 
df1['bar'] = ['001', '001', '001', '001', '002', '002', '003', '003', '003'] 
df1['foo'] = [-4, -3, 2, 3, -3, -2, 0, 1, 2] 
print(df1)
   bar  foo
0  001   -4
1  001   -3
2  001    2
3  001    3
4  002   -3
5  002   -2
6  003    0
7  003    1
8  003    2

Consider the following threshold and parameters.

# Provide threshold and number of entries above and below threshold 
threshold = 0 
n_below = 2 
n_above = 2

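I want to create a DataFrame with certain values of bar filtered out. A value of bar should be filtered out if it does not have at least n_below values of foo less than threshold and at least n_above values of foo greater than threshold.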

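For the example above:

Group bar = 001 is not filtered out, because it has at least n_below = 2 entries of foo less than threshold = 0 and at least n_above = 2 entries of foo greater than threshold = 0.
Group bar = 002 is filtered out, because it does not have at least n_above = 2 entries of foo greater than threshold = 0.
Group bar = 003 is filtered out, because it does not have at least n_below = 2 entries of foo less than threshold = 0.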
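
The counts behind this reasoning can be checked directly. The following is only a sketch, not part of the original post: the `counts` table and its column names `below` and `above` are made up for illustration, and the comparisons are assumed to be strict, as in the description above.

```python
import pandas as pd

df1 = pd.DataFrame({
    'bar': ['001', '001', '001', '001', '002', '002', '003', '003', '003'],
    'foo': [-4, -3, 2, 3, -3, -2, 0, 1, 2],
})
threshold = 0

# For each bar group, count the foo values strictly below and strictly
# above the threshold. A value equal to the threshold counts as neither.
counts = pd.DataFrame({
    'below': (df1['foo'] < threshold).astype(int).groupby(df1['bar']).sum(),
    'above': (df1['foo'] > threshold).astype(int).groupby(df1['bar']).sum(),
})
print(counts)
# 001: below 2, above 2
# 002: below 2, above 0
# 003: below 0, above 2
```

Note that the 0 in group 003 is neither below nor above the threshold, so that group has no entries below it.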

The desired output is as follows:

# Desired output
   bar  foo
0  001   -4
1  001   -3
2  001    2
3  001    3

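I believe this can be done with groupby and .count(), but I have not been able to get a working solution. I recognize that it may be cleaner to write the solution in two steps: 1) first filter to satisfy the n_below condition; 2) then filter to satisfy the n_above condition.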
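
A rough sketch of that two-step idea, not part of the original post: it assumes `groupby(...).filter` with a per-group count rather than `.count()`, and the names `enough_below` and `result` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'bar': ['001', '001', '001', '001', '002', '002', '003', '003', '003'],
    'foo': [-4, -3, 2, 3, -3, -2, 0, 1, 2],
})
threshold, n_below, n_above = 0, 2, 2

# Step 1: keep only groups with at least n_below values of foo below the threshold.
enough_below = df1.groupby('bar').filter(
    lambda g: (g['foo'] < threshold).sum() >= n_below
)

# Step 2: of the remaining groups, keep those with at least n_above values above it.
result = enough_below.groupby('bar').filter(
    lambda g: (g['foo'] > threshold).sum() >= n_above
)
print(result)
```

With the example data, step 1 drops group 003 and step 2 drops group 002, leaving only the rows of group 001.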

Source

2017-02-15 Adam

You can use the groupby and filter methods.

threshold = 0 
n_below = 2 
n_above = 2 
def filter_function(g): 
    '''Called by filter, g is the grouped dataframe''' 
    l = g['foo'] 
    return (sum([x < threshold for x in l]) >= n_below 
      and sum([x > threshold for x in l]) >= n_above) 

df1.groupby('bar').filter(filter_function)

# gives
   bar  foo
0  001   -4
1  001   -3
2  001    2
3  001    3

See the pandas documentation: Filtration.
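
For larger frames, calling a Python function once per group can be slow. A possible alternative, given here only as a sketch and not as part of the answer above, is to build a row-level boolean mask with `transform('sum')`. The names `below`, `above`, `mask` and `result` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'bar': ['001', '001', '001', '001', '002', '002', '003', '003', '003'],
    'foo': [-4, -3, 2, 3, -3, -2, 0, 1, 2],
})
threshold, n_below, n_above = 0, 2, 2

# Per-group counts, broadcast back to every row of the group.
below = (df1['foo'] < threshold).astype(int).groupby(df1['bar']).transform('sum')
above = (df1['foo'] > threshold).astype(int).groupby(df1['bar']).transform('sum')

# A row is kept when its group satisfies both conditions.
mask = (below >= n_below) & (above >= n_above)
result = df1[mask]
print(result)
```

Because the mask is aligned with the rows of df1, the original index is preserved and no set_index/reset_index round trip is needed.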

Source

2017-02-15 05:14:50

I think this is one solution:

threshold = 0
n_below = 2 
n_above = 2 

# Parentheses replace the backslash continuations, which break if followed by a space
(df1.set_index('bar')
    .loc[df1.groupby('bar')
            .apply(lambda df_sub:
                   (df_sub['foo'] < threshold).sum() >= n_below
                   and (df_sub['foo'] > threshold).sum() >= n_above)]
    .reset_index('bar'))

which returns the rows of group bar = 001, as in the desired output.

Source

2017-02-15 05:39:58 heyu91

idx = df1.groupby('bar').apply(lambda x: (sum(x['foo'] < threshold) >= n_below) & (sum(x['foo'] > threshold) >= n_above)) 

print(df1.set_index('bar')[idx].reset_index())

   bar  foo
0  001   -4
1  001   -3
2  001    2
3  001    3

Source

2017-02-15 05:40:13 su79eu7k
