2017-04-19 78 views
2

How to select rows that meet a certain condition from a pandas DataFrame

Suppose the DataFrame is as follows:

id class count 
0  A  2 
0  B  2 
0  C  2 
0  D  1 
1  A  3 
1  B  3 
1  E  2 
2  D  4 
2  F  2 

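For each id, I want to find the class whose count is the largest. If several classes share that maximum count, they should be merged into a single row. For the example above, the result should be: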

id  class count 
0  A,B,C  2 
1  A,B  3 
2  D   4 

How can this be done with pandas?
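For anyone who wants to reproduce the answers, here is a minimal sketch that builds the sample frame. The values are copied from the table in the question; the integer dtypes are an assumption, since the question does not state them.

```python
import pandas as pd

# Sample data copied from the question's table (dtypes are assumed).
df = pd.DataFrame({
    "id":    [0, 0, 0, 0, 1, 1, 1, 2, 2],
    "class": ["A", "B", "C", "D", "A", "B", "E", "D", "F"],
    "count": [2, 2, 2, 1, 3, 3, 2, 4, 2],
})
print(df)
```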

Answers

3

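Filter with transform, then aggregate:

# group by id so each row can be compared with its group's maximum count
g = df.groupby('id')
df = df[g['count'].transform('max').eq(df['count'])] 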
print (df) 
    id class count 
0 0  A  2 
1 0  B  2 
2 0  C  2 
4 1  A  3 
5 1  B  3 
7 2  D  4 

df = df.groupby('id').agg({'class':','.join, 'count':'first'}).reset_index() 
print (df) 
    id class count 
0 0 A,B,C  2 
1 1 A,B  3 
2 2  D  4 
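The two steps above can also be wrapped into one self-contained sketch. The helper name `max_count_classes` is hypothetical (it is not from the answer), and the sample frame is rebuilt from the question's table so the snippet runs on its own.

```python
import pandas as pd

def max_count_classes(df):
    """Hypothetical helper: per id, join the classes that reach the maximum count."""
    # Broadcast each id's maximum count back onto every row of that id.
    group_max = df.groupby("id")["count"].transform("max")
    # Keep only the rows that reach their group's maximum.
    top = df[df["count"].eq(group_max)]
    # Join the class labels per id; the remaining counts within an id are
    # all equal, so taking the first one is enough.
    return top.groupby("id", as_index=False).agg({"class": ",".join, "count": "first"})

# Sample data copied from the question's table.
df = pd.DataFrame({
    "id":    [0, 0, 0, 0, 1, 1, 1, 2, 2],
    "class": ["A", "B", "C", "D", "A", "B", "E", "D", "F"],
    "count": [2, 2, 2, 1, 3, 3, 2, 4, 2],
})
result = max_count_classes(df)
print(result)
```

Because the helper returns a new frame instead of reassigning `df`, the original data stays available for trying the other approaches.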

Another solution, using a custom function:

def f(x): 
    x = x[x['count'] == x['count'].max()] 
    return (pd.Series([','.join(x['class'].values.tolist()), x['count'].iat[0]], 
         index=['class','count'])) 

df = df.groupby('id').apply(f).reset_index() 
print (df) 
    id class count 
0 0 A,B,C  2 
1 1 A,B  3 
2 2  D  4 
3

Option 1

s = df.set_index(['id', 'class'])['count'] 
s1 = s[s.eq(s.groupby(level=0).max())].reset_index() 
s1.groupby(
    ['id', 'count'] 
)['class'].apply(list).reset_index()[['id', 'class', 'count']] 

    id  class count 
0 0 [A, B, C] 2.0 
1 1  [A, B] 3.0 
2 2  [D] 4.0 
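Option 1 returns lists, whereas the question asks for comma-joined strings. A possible variation is sketched below; it swaps `apply(list)` for `agg(','.join)` and, as a deliberate change from the answer, uses `transform('max')` so the comparison does not depend on index-level alignment. The sample frame is rebuilt from the question's table.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "id":    [0, 0, 0, 0, 1, 1, 1, 2, 2],
    "class": ["A", "B", "C", "D", "A", "B", "E", "D", "F"],
    "count": [2, 2, 2, 1, 3, 3, 2, 4, 2],
})

# Index by (id, class) so the counts form a Series keyed by both.
s = df.set_index(["id", "class"])["count"]
# transform keeps the original index, so eq compares row by row.
s1 = s[s.eq(s.groupby(level="id").transform("max"))].reset_index()
# Join the surviving class labels into one string per (id, count) pair.
out = (
    s1.groupby(["id", "count"])["class"]
      .agg(",".join)
      .reset_index()[["id", "class", "count"]]
)
print(out)
```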

Option 2

import numpy as np

# pivot to one row per id and one column per class (missing pairs become NaN)
d1 = df.set_index(['id', 'class'])['count'].unstack() 

v = d1.values 
m = np.nanmax(v, 1) 
t = v == m[:, None] 
pd.DataFrame({ 
     'id': d1.index, 
     'class': [list(s) for s in t.dot(d1.columns)], 
     'count': m 
    })[['id', 'class', 'count']] 

    id  class count 
0 0 [A, B, C] 2.0 
1 1  [A, B] 3.0 
2 2  [D] 4.0
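One caveat with Option 2, as far as I can tell: `t.dot(d1.columns)` concatenates the matching labels into a single string such as `'ABC'`, and `list(s)` then splits it character by character, so it appears to rely on single-character class names. The sketch below is a variation (not the answer's code) that selects the labels with the boolean mask instead, which should also cope with longer names. The sample frame is rebuilt from the question's table.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "id":    [0, 0, 0, 0, 1, 1, 1, 2, 2],
    "class": ["A", "B", "C", "D", "A", "B", "E", "D", "F"],
    "count": [2, 2, 2, 1, 3, 3, 2, 4, 2],
})

# One row per id, one column per class; absent (id, class) pairs are NaN.
d1 = df.set_index(["id", "class"])["count"].unstack()
v = d1.to_numpy()
# Row-wise maximum that ignores the NaN cells.
m = np.nanmax(v, axis=1)
# Boolean mask marking the cells equal to their row's maximum.
t = v == m[:, None]
# Pick the column labels with each row's mask instead of the dot-product trick.
classes = [",".join(d1.columns[row]) for row in t]

out = pd.DataFrame({"id": d1.index, "class": classes, "count": m})
print(out)
```

The counts come out as floats here because unstacking introduces NaN; casting with `astype(int)` afterwards would restore integers if that matters.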