2016-02-15 72 views

Pandas: summing rows that share a duplicated index value

I have a DataFrame indexed by date:

transactions_ind 
Out[25]: 
        Ticker  Transaction Number_of_units  Price 
Date                 
2012-10-11 ROG VX Equity    Buy   12000 182.00000 
2012-10-16 ROG VX Equity   Sell   -5000 184.70000 
2012-11-16 ROG VX Equity   Sell   -5000 175.51580 
2012-12-07 ROG VX Equity    Buy    5000 184.90000 
2012-12-11 ROG VX Equity   Sell   -3000 188.50000 
2012-12-11 ROG VX Equity Reversal: Sell    3000 188.50000 
2012-12-11 ROG VX Equity   Sell   -3000 188.50000 
2012-12-11 ROG VX Equity Reversal: Sell    3000 188.50000 
2012-12-11 ROG VX Equity   Sell   -3000 188.50000 
2012-12-20 ROG VX Equity   Sell   -5000 185.80000 

I want to sum over the duplicated index value (2012-12-11), but only in the column "Number_of_units", so that the DataFrame becomes:

transactions_ind 
Out[25]: 
        Ticker  Transaction Number_of_units  Price 
Date                 
2012-10-11 ROG VX Equity    Buy   12000 182.00000 
2012-10-16 ROG VX Equity   Sell   -5000 184.70000 
2012-11-16 ROG VX Equity   Sell   -5000 175.51580 
2012-12-07 ROG VX Equity    Buy    5000 184.90000 
2012-12-11 ROG VX Equity   Sell   -3000 188.50000 
2012-12-20 ROG VX Equity   Sell   -5000 185.80000 

Using

transactions_ind.groupby(transactions_ind.index).sum() 

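drops the columns "Ticker" and "Transaction", because they contain non-numeric values. I also don't know how to handle the different strings in the "Transaction" column when I sum the "Number_of_units" column. I'm hoping pandas has a one-liner for this. Thanks for your help!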

Answer


You can use agg, with first for the columns you want to keep as they are and sum for Number_of_units:

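# Group rows that share an index value (the date); sum the units and
# keep the first value of every other column.
df = df.groupby(df.index).agg({'Ticker': 'first',
                               'Transaction': 'first',
                               'Number_of_units': 'sum',
                               'Price': 'first'})
# reorder columns
df = df[['Ticker', 'Transaction', 'Number_of_units', 'Price']]
print(df)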
        Ticker Transaction Number_of_units  Price 
Date                
2012-10-11 ROG VX Equity   Buy   12000 182.0000 
2012-10-16 ROG VX Equity  Sell   -5000 184.7000 
2012-11-16 ROG VX Equity  Sell   -5000 175.5158 
2012-12-07 ROG VX Equity   Buy    5000 184.9000 
2012-12-11 ROG VX Equity  Sell   -3000 188.5000 
2012-12-20 ROG VX Equity  Sell   -5000 185.8000 
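
If you would rather not lose the "Reversal: Sell" labels that first throws away, here is a minimal, self-contained sketch of one possible variation. The DataFrame is rebuilt by hand from the rows shown in the question, and joining the distinct labels with " / " is only an assumption for illustration, not something the question asks for:

```python
import pandas as pd

# Rows copied from the question; the construction itself is a
# hypothetical reconstruction, not the asker's actual code.
rows = [
    ("2012-10-11", "Buy", 12000, 182.0),
    ("2012-10-16", "Sell", -5000, 184.7),
    ("2012-11-16", "Sell", -5000, 175.5158),
    ("2012-12-07", "Buy", 5000, 184.9),
    ("2012-12-11", "Sell", -3000, 188.5),
    ("2012-12-11", "Reversal: Sell", 3000, 188.5),
    ("2012-12-11", "Sell", -3000, 188.5),
    ("2012-12-11", "Reversal: Sell", 3000, 188.5),
    ("2012-12-11", "Sell", -3000, 188.5),
    ("2012-12-20", "Sell", -5000, 185.8),
]
df = pd.DataFrame(rows, columns=["Date", "Transaction", "Number_of_units", "Price"])
df.insert(1, "Ticker", "ROG VX Equity")  # same ticker on every row
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")

# One aggregation per column: sum the units, keep the first ticker and
# price, and join the distinct transaction labels (assumed separator).
result = df.groupby(level=0).agg(
    {
        "Ticker": "first",
        "Transaction": lambda s: " / ".join(s.unique()),
        "Number_of_units": "sum",
        "Price": "first",
    }
)
print(result)
```

With this data the 2012-12-11 rows collapse into a single row with Number_of_units equal to -3000 and Transaction equal to "Sell / Reversal: Sell"; all other dates are unchanged.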

Love it! Thanks. – Pat