2017-08-15 20 views
2

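Groupby [0] nlargest with a MultiIndex and multiple aggregated columns

I'm struggling a bit to apply .nlargest() to my grouped data so that it shows only the top 10 rows by gross revenue for each value of index level 0.

The grouped data looks like this: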

Data currently

When I run:

grp_data.n_largest(10,'GrossRevenue_GBP') 

it doesn't seem to work for me. The full code snippet is below:

import numpy as np
from scipy import stats

# 10% trimmed mean
tmean = lambda x: stats.trim_mean(x, 0.1)

data = data.loc[(data['YYYY'] == 2016) & (data['New_category_ID'] != 0)]

# Parentheses let the method chain span several lines
grp_data = (data.groupby(['New_category', 'CDI_CUS_NM'])[['GrossRevenue_GBP',
                                                          'OrderCount',
                                                          '% Rev',
                                                          'MOVC_GBP',
                                                          'Average order size']]
                .aggregate({'GrossRevenue_GBP': np.sum, 'OrderCount': np.sum, '% Rev': np.sum, 'MOVC_GBP': tmean, 'Average order size': tmean})
                .nlargest(10, 'GrossRevenue_GBP'))


grp_data['Country'] = 'EU' 


key1 = grp_data.index.labels[0]  # MultiIndex.labels was renamed to .codes in pandas 0.24
key2 = grp_data['GrossRevenue_GBP'].rank(ascending=False) 
sorter = np.lexsort((key2, key1)) 

grp_data = grp_data.take(sorter) 


grp_data = grp_data[['% Rev','GrossRevenue_GBP', 'MOVC_GBP','Average order size','OrderCount','Country']] 

I'd really appreciate some help.

Thanks,

Answers

1

I think you need to group by the first level of the MultiIndex and then use apply with nlargest:

grp_data = (data.groupby(['New_category', 'CDI_CUS_NM'])
                .aggregate({'GrossRevenue_GBP': np.sum,
                            'OrderCount': np.sum,
                            '% Rev': np.sum,
                            'MOVC_GBP': tmean,
                            'Average order size': tmean}))

# Pass 10 instead of 1 to nlargest for the top 10 rows per category
df = (grp_data.groupby('New_category')
              .apply(lambda x: x.nlargest(1, 'GrossRevenue_GBP'))
              .reset_index(level=0, drop=True))
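To make the approach above easier to try out, here is a minimal, self-contained sketch on toy data. The values and customer names are made up for illustration; only the column names follow the question, and the trimmed-mean columns are left out to keep it short. It takes the top 2 rows per category rather than the top 10.

```python
import pandas as pd

# Toy data: the values are invented, only the column names match the question.
data = pd.DataFrame({
    'New_category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'CDI_CUS_NM': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6'],
    'GrossRevenue_GBP': [100.0, 300.0, 200.0, 50.0, 70.0, 60.0],
    'OrderCount': [1, 3, 2, 1, 2, 1],
})

# Aggregate per (category, customer); the result has a two-level MultiIndex.
grp_data = (data.groupby(['New_category', 'CDI_CUS_NM'])
                .agg({'GrossRevenue_GBP': 'sum', 'OrderCount': 'sum'}))

# Group again by the first index level and keep the 2 largest rows per group.
# apply() prepends the group key as an extra index level, so drop it afterwards.
top = (grp_data.groupby(level=0)
               .apply(lambda x: x.nlargest(2, 'GrossRevenue_GBP'))
               .reset_index(level=0, drop=True))

print(top)
```

Calling nlargest directly on grp_data would instead return the largest rows across all categories combined, which is presumably why the attempt in the question did not give a per-category result.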
+1

Thanks, that works perfectly :) –

+0

Super, good luck! – jezrael