2017-03-03 71 views

Pandas groupby with count, sum and average

I have the following DataFrame in pandas:

+---------+--------+--------------------+ 
| keyword | weight | other keywords | 
+---------+--------+--------------------+ 
| dog  | 0.12 | [cat, horse, pig] | 
| cat  | 0.5 | [dog, pig, camel] | 
| horse | 0.07 | [dog, camel, cat] | 
| dog  | 0.1 | [cat, horse]  | 
| dog  | 0.2 | [cat, horse , pig] | 
| horse | 0.3 | [camel]   | 
+---------+--------+--------------------+ 

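The task I want to perform is to group by keyword and, at the same time, count the keyword frequency, average the weight, and sum the other keywords (concatenating the lists). The result would look something like this: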

+---------+-----------+------------+------------------------------------------------+ 
| keyword | frequency | avg weight |     sum other keywords   | 
+---------+-----------+------------+------------------------------------------------+ 
| dog  |   3 | 0.14  | [cat, horse, pig, cat, horse, cat, horse, pig] | 
| cat  |   1 | 0.5  | [dog, pig, camel]        | 
| horse |   2 | 0.185  | [dog, camel, cat, camel]      | 
+---------+-----------+------------+------------------------------------------------+ 

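Now, I know how to do this in several separate operations: value_counts, groupby.sum(), groupby.mean(), and then merging the results. However, that is very inefficient and I have to make a lot of manual adjustments.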

Is it possible to do this in a single operation?
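For reference, the separate-operations approach described in the question might look roughly like the sketch below. The DataFrame is reconstructed from the table above, and the variable names (`freq`, `avg`, `others`, `combined`) are only illustrative, not taken from the original post.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "keyword": ["dog", "cat", "horse", "dog", "dog", "horse"],
    "weight": [0.12, 0.5, 0.07, 0.1, 0.2, 0.3],
    "other keywords": [
        ["cat", "horse", "pig"],
        ["dog", "pig", "camel"],
        ["dog", "camel", "cat"],
        ["cat", "horse"],
        ["cat", "horse", "pig"],
        ["camel"],
    ],
})

# Three independent passes over the data, one per statistic.
freq = df["keyword"].value_counts().rename("frequency")
avg = df.groupby("keyword")["weight"].mean().rename("avg weight")
# Summing a column of lists concatenates the lists within each group.
others = df.groupby("keyword")["other keywords"].sum().rename("sum other keywords")

# Align the three Series on the keyword index and combine them.
combined = pd.concat([freq, avg, others], axis=1).sort_index()
print(combined)
```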

Answer

You can use agg:

df = df.groupby('keyword').agg({'keyword':'size', 'weight':'mean', 'other keywords':'sum'}) 
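#set new ordering of columns (reindex_axis was removed in pandas 1.0; use reindex instead)
df = df.reindex(columns=['keyword','weight','other keywords'])
#clear the index name so it does not clash with the 'keyword' column, then reset index
df = df.rename_axis(None).reset_index()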
#set new column names 
df.columns = ['keyword','frequency','avg weight','sum other keywords'] 

print (df) 
    keyword frequency avg weight \ 
0  cat   1  0.500 
1  dog   3  0.140 
2 horse   2  0.185 

           sum other keywords 
0        [dog, pig, camel] 
1 [cat, horse, pig, cat, horse, cat, horse, pig] 
2      [dog, camel, cat, camel] 
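
On pandas 0.25 or later, the same result can probably be reached a little more directly with named aggregation, which names the output columns inside `agg` and so avoids the reordering and renaming steps. This is only a sketch: the DataFrame is rebuilt from the table in the question, `result` is an illustrative name, and it assumes that `'sum'` on a column of Python lists concatenates them, as it does in the output above.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "keyword": ["dog", "cat", "horse", "dog", "dog", "horse"],
    "weight": [0.12, 0.5, 0.07, 0.1, 0.2, 0.3],
    "other keywords": [
        ["cat", "horse", "pig"],
        ["dog", "pig", "camel"],
        ["dog", "camel", "cat"],
        ["cat", "horse"],
        ["cat", "horse", "pig"],
        ["camel"],
    ],
})

# Named aggregation: each output column maps to (input column, function).
# The dict is unpacked with ** because the output names contain spaces.
result = (
    df.groupby("keyword")
      .agg(**{
          "frequency": ("weight", "size"),
          "avg weight": ("weight", "mean"),
          "sum other keywords": ("other keywords", "sum"),
      })
      .reset_index()
)
print(result)
```

Summing lists this way builds a new list for every addition, so for large groups an explicit flattening function may be a better fit than `'sum'`.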

Nice how 'sum' works on the lists here as well :-) +1 – pansen

@pansen - Thank you. – jezrael

I knew I was doing something wrong! This is exactly what I needed! Thank you very much. – pawelty