Sum along a Pandas column depending on MultiIndex values?
I have the following pandas DataFrame df:
                                  Value
time                Position
1493791210867023000 0.0         21156.0
                    1.0       1230225.0
                    2.0       1628088.0
                    3.0       2582359.0
                    4.0       3388164.0
1493791210880251000 0.0         21156.0
                    1.0       1230225.0
                    2.0       1628088.0
                    3.0       2582359.0
                    4.0       3388164.0
1493791210888418000 0.0         21156.0
                    1.0       1230225.0
...                 ...             ...
How can I efficiently sum along the index level "Position"? The exact summation I want to achieve is:
                                  Value  Result
time                Position
1493791210867023000 0.0         21156.0  Sum from 0.0 to 0.0
                    1.0       1230225.0  Sum from 0.0 to 1.0
                    2.0       1628088.0  Sum from 0.0 to 2.0
                    3.0       2582359.0  Sum from 0.0 to 3.0
                    4.0       3388164.0  Sum from 0.0 to 4.0
1493791210880251000 0.0         21156.0  Sum from 0.0 to 0.0
                    1.0       1230225.0  Sum from 0.0 to 1.0
                    2.0       1628088.0  Sum from 0.0 to 2.0
                    3.0       2582359.0  Sum from 0.0 to 3.0
...                 ...             ...  ...
My current solution takes too long (IndexSlice is painfully slow), and I am not sure how to get the sums efficiently sorted into the (newly created) "Result" column.
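import pandas as pd
import numpy as np

idx = pd.IndexSlice
res = {}
for i in range(5):
    # Per timestamp, sum Value over all rows with Position <= i
    # (label slicing on a MultiIndex needs a sorted index).
    res[i] = df.loc[idx[:, :i], :].groupby(level="time").sum()
df["Result"] = 0  # fill Result now with res[i] depending on position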
Use cumsum, grouped by the time level.
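As a rough sketch of what that looks like on data shaped like the question's: the frame below is a made-up sample (two of the timestamps, five positions each), and selecting the "Value" column before calling cumsum is my own choice so that the result is a plain Series. It is not necessarily the exact code from the answer.

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout:
# two timestamps, five positions each, same values per timestamp.
times = [1493791210867023000, 1493791210880251000]
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [21156.0, 1230225.0, 1628088.0, 2582359.0, 3388164.0]

index = pd.MultiIndex.from_product([times, positions], names=["time", "Position"])
df = pd.DataFrame({"Value": values * len(times)}, index=index)

# Running total of Value within each time group; the total restarts
# at every new timestamp, so row k holds the sum from Position 0.0 to k.
df["Result"] = df.groupby(level="time")["Value"].cumsum()

print(df)
```

This assumes the rows inside each time group are already ordered by Position, as they are in the question; cumsum follows row order and does not sort.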
New in 0.20.1: you no longer need the 'level' argument in groupby. Pandas will let you select by name across both columns and index levels. –
df.assign(Result=df.groupby(level='time').cumsum()) works. Thanks. – Bython
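To illustrate the two comments together, here is a small sketch on a toy frame (the values and the names by_level, by_name and out are invented for the example). It assumes a pandas version recent enough to accept an index level name directly as the groupby key, and it checks that both spellings give the same running totals before attaching them with assign.

```python
import pandas as pd

# Toy frame (hypothetical values) with the same index level names as the question.
index = pd.MultiIndex.from_product(
    [[10, 20], [0.0, 1.0, 2.0]], names=["time", "Position"]
)
df = pd.DataFrame({"Value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)

# Explicit form: group on the index level via the level argument.
by_level = df.groupby(level="time")["Value"].cumsum()

# Name-based form: pass the level name as the key; pandas resolves
# "time" against the index because no column has that name.
by_name = df.groupby("time")["Value"].cumsum()

# assign returns a new frame with the extra column and leaves df untouched.
out = df.assign(Result=by_name)
print(out)
```

If a column and an index level ever share a name, the name-based form becomes ambiguous, so the explicit level argument is the safer spelling in that case.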