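I have an alternative solution that is a bit longer than the one already posted, but I think it makes it easier to see what happens inside the transformation function for the date column, and the output format is also a bit cleaner: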
import numpy as np
import pandas as pd
from datetime import date
# Build data
prd = [1, 2, 3, 4, 1, 2]
grp = ['A', 'A', 'A', 'A', 'B', 'B']
yr = [2010, 2010, 2010, 2010, 2000, 2000]
mth = [7, 7, 7, 7, 8, 8]
day = [1, 13, 13, 21, 20, 15]
dt = [date(y, m, d) for y, m, d in zip(yr, mth, day)]
# Create data frame
df = pd.DataFrame({'Period': prd, 'Group': grp, 'Dates': dt},
                  columns=['Period', 'Group', 'Dates'])
# Transformation function for the date column
def f(ser):
    v = ser.values
    # Get time difference in days relative to the first date in the group
    delta = [float((ii - v[0]).days) for ii in v]
    # Get number of items to divide by (1, 2, 3, ...)
    dv = np.arange(len(delta)) + 1
    # Get cumulative average
    cumavg = [nm / dm for nm, dm in zip(delta, dv)]
    # Create output pandas Series object and return it
    out = pd.Series(cumavg, index=ser.index)
    return out
# Apply the transformation function to the Dates column
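# group_keys=False keeps the original row index on the result, which the
# index-based merge below relies on (needed on recent pandas versions)
dfappend = pd.DataFrame({'Cum_Avg': df.groupby("Group", group_keys=False).Dates.apply(f)})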
# Delete the Dates column
del df['Dates']
# Merge to create the revised data frame
df = pd.merge(df, dfappend, left_index=True, right_index=True)
print(df)
The output is:
   Period Group  Cum_Avg
0       1     A      0.0
1       2     A      6.0
2       3     A      4.0
3       4     A      5.0
4       1     B      0.0
5       2     B     -2.5
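For comparison, the same numbers can probably be obtained without a custom function. This is only a sketch, and it assumes the dates are stored as datetime64 values (via `pd.to_datetime`) rather than as `datetime.date` objects as in the code above. The names `delta_days`, `position` and `result` are mine, not part of the original answer:

```python
import pandas as pd

# Same sample data as above, but with the dates stored as datetime64 values
df = pd.DataFrame({
    'Period': [1, 2, 3, 4, 1, 2],
    'Group': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Dates': pd.to_datetime(['2010-07-01', '2010-07-13', '2010-07-13',
                             '2010-07-21', '2000-08-20', '2000-08-15']),
})

grouped = df.groupby('Group')['Dates']

# Days elapsed since the first date of each group
delta_days = (df['Dates'] - grouped.transform('first')).dt.days

# 1-based position of each row within its group
position = grouped.cumcount() + 1

# Same ratio as in f(): day difference divided by the row's position
df['Cum_Avg'] = delta_days / position

result = df.drop(columns='Dates')
print(result)
```

This avoids the separate merge step, because `transform` and `cumcount` both return results aligned with the original row index.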
Shouldn't the second B value be -5/2? You are ultimately looking for the average difference in days (as a float)? – Jeff
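For what it's worth, the second B value can be checked by hand with the two group B dates from the sample data. A small sketch using only the standard library (the variable names are illustrative):

```python
from datetime import date

first_b = date(2000, 8, 20)   # first date in group B
second_b = date(2000, 8, 15)  # second date in group B

# The second date is earlier than the first, so the difference is negative
delta_days = (second_b - first_b).days

# Second row in the group, so the difference is divided by 2
value = delta_days / 2

print(delta_days, value)  # -5 -2.5
```

So -5/2 as a float is -2.5, which agrees with the value shown in the output.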