2016-02-16 26 views
2

I have a DataFrame in the following format:

name  salary  department  position
a     25000   x           normal employee
b     50000   y           normal employee
c     10000   y           experienced employee
d     20000   x           experienced employee

I want to get a result in a format like the following:

dept  total salary  salary_percentage  count_normal_employee  count_experienced_employee
x     45000         45000/105000       1                      1
y     60000         60000/105000       1                      1

Answer

3

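You can use pivot_table with fillna to build df1, then groupby with sum; divide the new column total salary by the sum of the original column salary to get df2, and finally merge the two.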

import pandas as pd

# df is the DataFrame from the question

# pivot df to count employees per position, fill NaN with 0
df1 = df.pivot_table(index='department', columns='position', values='name', aggfunc='count').fillna(0).reset_index()
# reset the columns name - for a nicer df
df1.columns.name = None
print(df1)
  department  experienced employee  normal employee
0          x                     1                1
1          y                     1                1

# sum salary by department and rename column salary
df2 = df.groupby('department')['salary'].sum().reset_index().rename(columns={'salary': 'total salary'})

# divide each department total by the overall salary sum
df2['salary_percentage'] = df2['total salary'] / df['salary'].sum()
print(df2)
  department  total salary  salary_percentage
0          x         45000           0.428571
1          y         60000           0.571429

print(pd.merge(df1, df2, on=['department']))
  department  experienced employee  normal employee  total salary  \
0          x                     1                1         45000
1          y                     1                1         60000

   salary_percentage
0           0.428571
1           0.571429
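
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and renames the count columns to the headers the question asks for. The variable names counts, totals and result, and the 'count_' renaming rule, are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'salary': [25000, 50000, 10000, 20000],
    'department': ['x', 'y', 'y', 'x'],
    'position': ['normal employee', 'normal employee',
                 'experienced employee', 'experienced employee'],
})

# Step 1: count employees per department and position.
counts = (df.pivot_table(index='department', columns='position',
                         values='name', aggfunc='count')
            .fillna(0)
            .astype(int))
# Rename to the headers requested in the question (naming rule is an assumption).
counts.columns = ['count_' + c.replace(' ', '_') for c in counts.columns]
counts = counts.reset_index()

# Step 2: total salary per department and its share of the overall total.
totals = (df.groupby('department')['salary'].sum()
            .reset_index()
            .rename(columns={'salary': 'total salary'}))
totals['salary_percentage'] = totals['total salary'] / df['salary'].sum()

# Step 3: join the two intermediate frames on department.
result = pd.merge(totals, counts, on='department')
print(result)
```

With the sample data, department x should come out with a total salary of 45000 and department y with 60000, matching the output shown in the answer.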
+0

How does it work? – jezrael

+0

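The answer was edited, because 'groupby' with the custom function 'f' was slow. This solution is 3 to 4 times faster than the original. Please check it. – jezrael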

+0

Thanks for your work –
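
The custom function 'f' mentioned in the comments is not shown in this thread. Purely as an illustration, the sketch below contrasts a hypothetical per-group function applied with groupby(...).apply against a vectorized equivalent built from groupby, sum and pd.crosstab. The body of f, the names slow and fast, and the use of crosstab are all assumptions; nothing here reproduces the original code, and no timing is measured.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'salary': [25000, 50000, 10000, 20000],
    'department': ['x', 'y', 'y', 'x'],
    'position': ['normal employee', 'normal employee',
                 'experienced employee', 'experienced employee'],
})
grand_total = df['salary'].sum()

def f(group):
    # Hypothetical per-group function; the original 'f' is not in the thread.
    return pd.Series({
        'total salary': group['salary'].sum(),
        'salary_percentage': group['salary'].sum() / grand_total,
        'count_normal_employee': (group['position'] == 'normal employee').sum(),
        'count_experienced_employee': (group['position'] == 'experienced employee').sum(),
    })

# Per-group Python function: called once for every department.
slow = df.groupby('department')[['salary', 'position']].apply(f)

# Vectorized alternative: built-in aggregations only.
fast = df.groupby('department')['salary'].sum().to_frame('total salary')
fast['salary_percentage'] = fast['total salary'] / grand_total
counts = pd.crosstab(df['department'], df['position'])
fast['count_normal_employee'] = counts['normal employee']
fast['count_experienced_employee'] = counts['experienced employee']

print(slow)
print(fast)
```

Both frames should hold the same numbers for the sample data. The per-group version runs Python code once per department, which is the usual reason such an approach is slower on large inputs than built-in aggregations.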