2016-01-20 35 views

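Grouping in pandas while keeping the tuples

I have a DataFrame that looks like this (it actually has 35 columns and many more tuples, but the relevant columns are shown below):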

 leg_side leg_quantity expiration product change_type 
0  None   None  None  ZQ  inserted 
1  None   None  None  HG  inserted 
2  None   None  None  PL  inserted 
3  None   None  None  SI  inserted 
4  None   None  None  ZQ  inserted 
5  None   None  None  PL  inserted 
6  None   None  None  ZW  inserted 
7  None   None  None  SI  inserted 
8  None   None  None  ZQ  updated 
9  None   None  None  SI  inserted 
10  None   None  None  ZC  updated 
..  ...   ...  ...  ...   ... 
970  None   None  None  OZ  inserted 
971  None   None  None  OZ  deleted 
972  None   None  None  OZ  updated 
973  None   None  None  ZC  inserted 
974  None   None  None  OZ  inserted 
975  None   None  None  ZC  inserted 
976  None   None  None  OZ  inserted 

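What I want to do now is group by product, though not necessarily in the SQL sense. I want to bring all the tuples with the same product together, and then sub-group them by change_type, to get a DataFrame like this: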

 leg_side leg_quantity expiration product change_type 
0  None   None  None  ZQ  inserted 
4  None   None  None  ZQ  inserted 
8  None   None  None  ZQ  updated 
1  None   None  None  HG  inserted 
2  None   None  None  PL  inserted 
5  None   None  None  PL  inserted 
3  None   None  None  SI  inserted 
7  None   None  None  SI  inserted 
9  None   None  None  SI  inserted 
6  None   None  None  ZW  inserted 
... 
973  None   None  None  ZC  inserted 
975  None   None  None  ZC  inserted 
10  None   None  None  ZC  updated 
970  None   None  None  OZ  inserted 
974  None   None  None  OZ  inserted 
976  None   None  None  OZ  inserted 
972  None   None  None  OZ  updated 
971  None   None  None  OZ  deleted 

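The DataFrame above is organized so that all tuples with the same product name sit together, and within each of those groups all tuples with the same change type are grouped together (preferably in the order inserted, updated, deleted). If I use pandas groupby(), the tuples are eliminated. I only want grouping in the sense of sorting. How can I do this?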

Answer


You can use a Categorical to set a custom ordering, then groupby and sort on the categorical data:

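# Make change_type an ordered categorical so it sorts as inserted < updated < deleted
df['change_type'] = (df['change_type'].astype('category')
                                      .cat
                                      .set_categories(["inserted", "updated", "deleted"], ordered=True))

# Sort the rows of each product group by change_type, then discard the group index
df = (df.groupby('product').apply(lambda x: x.sort_values('change_type'))
        .reset_index(drop=True))
print(df)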

    leg_side leg_quantity expiration product change_type 
0  None   None  None  HG inserted 
1  None   None  None  OZ inserted 
2  None   None  None  OZ inserted 
3  None   None  None  OZ inserted 
4  None   None  None  OZ  updated 
5  None   None  None  OZ  deleted 
6  None   None  None  PL inserted 
7  None   None  None  PL inserted 
8  None   None  None  SI inserted 
9  None   None  None  SI inserted 
10  None   None  None  SI inserted 
11  None   None  None  ZC inserted 
12  None   None  None  ZC inserted 
13  None   None  None  ZC  updated 
14  None   None  None  ZQ inserted 
15  None   None  None  ZQ inserted 
16  None   None  None  ZQ  updated 
17  None   None  None  ZW inserted
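
The output above orders the products alphabetically and renumbers the index, while the expected result in the question keeps the original index labels and lists the products in the order they first appear. If that matters, one possible alternative is to skip groupby and sort on two ordered categoricals. This is only a sketch: the sample data is rows 0 to 10 of the question reduced to the two relevant columns, the helper column `_product_order` is a made-up name, and first-appearance order for products is an assumption read off the question's expected output.

```python
import pandas as pd

# Small sample built from rows 0-10 of the question (only the relevant columns).
df = pd.DataFrame({
    "product": ["ZQ", "HG", "PL", "SI", "ZQ", "PL", "ZW", "SI", "ZQ", "SI", "ZC"],
    "change_type": ["inserted"] * 8 + ["updated", "inserted", "updated"],
})

# Ordered categorical for change_type: inserted < updated < deleted.
df["change_type"] = pd.Categorical(
    df["change_type"], categories=["inserted", "updated", "deleted"], ordered=True
)

# Ordered categorical for product, in order of first appearance
# (assumption: this is the product order the question is after).
product_order = pd.Categorical(
    df["product"], categories=df["product"].unique(), ordered=True
)

# Sort on both keys via a temporary helper column (hypothetical name).
# No rows are dropped and the original index labels are kept.
result = (
    df.assign(_product_order=product_order)
      .sort_values(["_product_order", "change_type"])
      .drop(columns="_product_order")
)
print(result)
```

Because this is a plain sort, every row survives, and `result.index` still points back to the rows of the original DataFrame.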