2017-05-25 79 views
2

Sum along a pandas column depending on MultiIndex values?

I have the following pandas DataFrame df:

                                  Value
time                Position
1493791210867023000 0.0         21156.0
                    1.0       1230225.0
                    2.0       1628088.0
                    3.0       2582359.0
                    4.0       3388164.0
1493791210880251000 0.0         21156.0
                    1.0       1230225.0
                    2.0       1628088.0
                    3.0       2582359.0
                    4.0       3388164.0
1493791210888418000 0.0         21156.0
                    1.0       1230225.0
...                 ...             ...

How can I efficiently sum along the index level "Position"? The exact summation I want to achieve is:

                                  Value  Result
time                Position
1493791210867023000 0.0         21156.0  Sum from 0.0 to 0.0
                    1.0       1230225.0  Sum from 0.0 to 1.0
                    2.0       1628088.0  Sum from 0.0 to 2.0
                    3.0       2582359.0  Sum from 0.0 to 3.0
                    4.0       3388164.0  Sum from 0.0 to 4.0
1493791210880251000 0.0         21156.0  Sum from 0.0 to 0.0
                    1.0       1230225.0  Sum from 0.0 to 1.0
                    2.0       1628088.0  Sum from 0.0 to 2.0
                    3.0       2582359.0  Sum from 0.0 to 3.0
...                 ...             ...  ...

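My current solution takes far too long (IndexSlice is painfully slow), and I am not sure how to get the summed results efficiently into the (newly created) "Result" column in the right order.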

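import pandas as pd
import numpy as np

idx = pd.IndexSlice
res = {}
for i in range(5):
    # Select positions up to and including i for every timestamp (label slices
    # are inclusive), then sum per timestamp. The trailing ", :" makes it
    # explicit that idx[...] indexes the rows only, not rows and columns.
    res[i] = df.loc[idx[:, :i], :].groupby(level="time").sum()
df["Result"] = 0  # fill Result now with res[i] depending on position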

Answer

4

Try cumsum within a groupby:

df.assign(Result=df.groupby(level='time').Value.cumsum()) 
# suggested by @ScottBoston for pandas 0.20.1+ 
# df.assign(Result=df.groupby('time').Value.cumsum()) 

                                  Value     Result
time                Position
1493791210867023000 0.0         21156.0    21156.0
                    1.0       1230225.0  1251381.0
                    2.0       1628088.0  2879469.0
                    3.0       2582359.0  5461828.0
                    4.0       3388164.0  8849992.0
1493791210880251000 0.0         21156.0    21156.0
                    1.0       1230225.0  1251381.0
                    2.0       1628088.0  2879469.0
                    3.0       2582359.0  5461828.0
                    4.0       3388164.0  8849992.0
1493791210888418000 0.0         21156.0    21156.0
                    1.0       1230225.0  1251381.0
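
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is a hypothetical reconstruction of the one in the question: three timestamps with all five positions each. The question only shows part of the third group, so the full third group is an assumption. It also assumes the rows are already ordered by Position within each timestamp, since cumsum accumulates in row order.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame: three timestamps,
# five positions each, with the same values repeated for every timestamp.
times = [1493791210867023000, 1493791210880251000, 1493791210888418000]
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [21156.0, 1230225.0, 1628088.0, 2582359.0, 3388164.0]

index = pd.MultiIndex.from_product([times, positions], names=["time", "Position"])
df = pd.DataFrame({"Value": values * len(times)}, index=index)

# Running total of Value within each "time" group, accumulated in row order.
# The total restarts at the first row of every new timestamp.
result = df.assign(Result=df.groupby(level="time")["Value"].cumsum())

print(result.head(10))
```

If the rows might not be ordered by Position, calling df.sort_index() first should give the "sum from 0.0 to k" meaning the question asks for.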
+3

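With the new feature in 0.20.1, you no longer need the 'level' argument in groupby when using cumsum. Pandas will let you select by name on both columns and index levels. –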

+0

df.assign(Result=df.groupby(level='time').cumsum()) works. Thanks. – Bython