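You can use pivot_table with fillna to build df1, then groupby with sum to build df2, where the new column salary_percentage is each department's total salary divided by the sum of the original salary column, and finally merge the two: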
import pandas as pd

# pivot df: count names per department and position, fill NaN with 0
df1 = df.pivot_table(index='department', columns='position', values='name', aggfunc='count').fillna(0).reset_index()
# reset the columns name, for a nicer df
df1.columns.name = None
print(df1)
  department  experienced employee  normal employee
0          x                     1                1
1          y                     1                1
# sum salary per department and rename the column salary to total salary
df2 = df.groupby('department')['salary'].sum().reset_index().rename(columns={'salary':'total salary'})
# share of each department in the overall salary sum
df2['salary_percentage'] = df2['total salary']/df['salary'].sum()
print(df2)
  department  total salary  salary_percentage
0          x         45000           0.428571
1          y         60000           0.571429
print(pd.merge(df1, df2, on=['department']))
  department  experienced employee  normal employee  total salary  \
0          x                     1                1         45000
1          y                     1                1         60000

   salary_percentage
0           0.428571
1           0.571429
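
The snippets above rely on the df from the question, which is not reproduced here. As a minimal self-contained sketch, the three steps can be combined as follows. The sample rows and the helper name summarize are assumptions made for illustration; the salaries were chosen only so that the department totals match the output above.

```python
import pandas as pd

# Hypothetical sample data (not from the original question); the salaries
# are picked so that the per-department totals are 45000 and 60000.
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'department': ['x', 'x', 'y', 'y'],
    'position': ['experienced employee', 'normal employee',
                 'experienced employee', 'normal employee'],
    'salary': [25000, 20000, 35000, 25000],
})


def summarize(df):
    """Hypothetical helper: head count per position plus salary share per department."""
    # Step 1: count employees per department and position, missing combinations become 0.
    counts = (df.pivot_table(index='department', columns='position',
                             values='name', aggfunc='count')
                .fillna(0)
                .reset_index())
    counts.columns.name = None

    # Step 2: total salary per department and its share of the overall sum.
    totals = (df.groupby('department')['salary'].sum()
                .reset_index()
                .rename(columns={'salary': 'total salary'}))
    totals['salary_percentage'] = totals['total salary'] / df['salary'].sum()

    # Step 3: join both summaries on the department key.
    return pd.merge(counts, totals, on='department')


result = summarize(df)
print(result)
```

Because every step uses built-in aggregations instead of a Python function applied per group, it should stay reasonably fast on larger frames.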
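How does it work? – jezrael
The answer was edited because groupby with the custom function 'f' was slow. This solution is 3 to 4 times faster than the original one. Please check it. – jezrael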
Thank you for your work –