2017-07-27

Pandas: group by and count, then add the count to the original DataFrame?

I am trying to count rows of the same 'kind' in a DataFrame:

import pandas as pd 

items = [('aaa','aaa text 1'), ('aaa','aaa text 2'), ('aaa','aaa text 3'), 
     ('bb', 'bb text 1'), ('bb', 'bb text 2'), ('bb', 'bb text 3'), 
     ('bb', 'bb text 4'), 
     ('cccc','cccc text 1'), ('cccc','cccc text 2'), 
     ('dd', 'dd text 1'), 
     ('e', 'e text 1'), 
     ('fff', 'fff text 1'), 
     ] 

df = pd.DataFrame(items, columns=['kind', 'msg']) 
df 

    kind          msg
0    aaa   aaa text 1
1    aaa   aaa text 2
2    aaa   aaa text 3
3     bb    bb text 1
4     bb    bb text 2
5     bb    bb text 3
6     bb    bb text 4
7   cccc  cccc text 1
8   cccc  cccc text 2
9     dd    dd text 1
10     e     e text 1
11   fff   fff text 1

This code:

df = (df[['kind']].groupby(['kind'])['kind']
        .count()
        .reset_index(name='count')
        .sort_values(['count'], ascending=False)
        .head(5))

df 

results in:

   kind  count
1    bb      4
0   aaa      3
2  cccc      2
3    dd      1
4     e      1

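However, how can I get a DataFrame that keeps all of the original columns and adds the 'count' column, so that the result has the columns 'kind', 'msg', 'count', in that order?

Also, how can I sort that resulting DataFrame by count in descending order?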

Answer

IIUC

In [247]: df['count'] = df.groupby('kind')['kind'].transform('count')

In [248]: df 
Out[248]: 
    kind          msg  count
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1
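A self-contained version of the transform approach, for checking it outside IPython. It is a sketch with choices of my own rather than the answerer's: it selects the `kind` column before calling `transform` so the result is a single Series, and it uses `'size'` (counts every row in the group) instead of `'count'` (counts non-missing values); the two agree here because nothing is missing. For comparison it also builds the same column the way the question's code started, aggregating with `groupby(...).size()` and merging the per-kind counts back onto every row with `DataFrame.merge`:

```python
import pandas as pd

items = [('aaa', 'aaa text 1'), ('aaa', 'aaa text 2'), ('aaa', 'aaa text 3'),
         ('bb', 'bb text 1'), ('bb', 'bb text 2'), ('bb', 'bb text 3'),
         ('bb', 'bb text 4'),
         ('cccc', 'cccc text 1'), ('cccc', 'cccc text 2'),
         ('dd', 'dd text 1'),
         ('e', 'e text 1'),
         ('fff', 'fff text 1')]
df = pd.DataFrame(items, columns=['kind', 'msg'])

# Selecting one column makes transform() return a Series aligned to df's index,
# so each row receives the size of its own 'kind' group.
df['count'] = df.groupby('kind')['kind'].transform('size')

# Alternative closer to the question's code: aggregate to one row per kind,
# then left-merge those counts back (how='left' keeps the original row order).
counts = df.groupby('kind').size().reset_index(name='count')
merged = df[['kind', 'msg']].merge(counts, on='kind', how='left')

print(df)
```

Both routes produce the columns `kind`, `msg`, `count`; `transform` simply skips the separate merge step.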

Sorting:

In [249]: df.sort_values('count', ascending=False) 
Out[249]: 
    kind          msg  count
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1
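`sort_values` uses the sort algorithm `kind='quicksort'` by default, which the pandas documentation lists as not stable, so rows whose counts tie (here `dd`, `e` and `fff`) are not guaranteed to come out in any particular order. A minimal sketch, my addition rather than part of the answer, that makes the order deterministic by adding the `kind` column as a secondary, ascending sort key:

```python
import pandas as pd

items = [('aaa', 'aaa text 1'), ('aaa', 'aaa text 2'), ('aaa', 'aaa text 3'),
         ('bb', 'bb text 1'), ('bb', 'bb text 2'), ('bb', 'bb text 3'),
         ('bb', 'bb text 4'),
         ('cccc', 'cccc text 1'), ('cccc', 'cccc text 2'),
         ('dd', 'dd text 1'),
         ('e', 'e text 1'),
         ('fff', 'fff text 1')]
df = pd.DataFrame(items, columns=['kind', 'msg'])
df['count'] = df.groupby('kind')['kind'].transform('size')

# Largest groups first; groups of equal size are ordered by the 'kind' column, A to Z.
out = df.sort_values(['count', 'kind'], ascending=[False, True])
print(out)
```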