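I suggest creating a DataFrame with a MultiIndex as its columns. There is no way around using a loop here to iterate over your windows. The resulting table will be easy to index and easy to read back with pd.read_csv. Initialize an empty DataFrame with np.empty of the appropriate shape and use .loc to assign its values. (np.empty leaves the memory uninitialized, which is fine here because every column is overwritten in the loop.)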
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Column levels: window -> original column -> statistic
cols = pd.MultiIndex.from_product([windows, df.columns, stats],
                                  names=['window', 'feature', 'metric'])
# Pre-allocate the result; every value is overwritten in the loop below
df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                   index=df.index)
for window in windows:
    # agg returns columns ordered (col0 mean, col0 std, col1 mean, ...)
    df2.loc[:, window] = df.rolling(window=window).agg(stats).values
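Now you have a result, df2, with the same index as your original object. It has three column levels: the first is the window, the second is the columns from your original frame, and the third is the statistic.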
print(df2.shape)
(100, 24)
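As a quick sanity check on that layout, here is a minimal sketch that builds the same three-level frame and compares one column against a direct rolling computation. Note that it swaps in pd.concat with a dict of per-window results instead of the np.empty pre-allocation above; that substitution is my own assumption for brevity, not a requirement.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Alternative construction (an assumption, not the approach above):
# the dict keys become the outermost column level.
df2 = pd.concat({w: df.rolling(window=w).agg(stats) for w in windows}, axis=1)
df2.columns = df2.columns.set_names(['window', 'feature', 'metric'])

# Every column is addressed by a (window, feature, metric) tuple.
direct = df['col1'].rolling(window=15).std()
matches = np.allclose(df2[(15, 'col1', 'std')], direct, equal_nan=True)
print(df2.shape, matches)
```

The first window - 1 rows of each block are NaN because the rolling window is not yet full at those positions.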
This makes it easy to inspect the values for a specific rolling window:
print(df2[5]) # Rolling window = 5
feature col0 col1 col2
metric mean std mean std mean std
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN
4 -0.87879 1.45348 -0.26559 0.71236 0.53233 0.89430
.. ... ... ... ... ... ...
95 -0.44231 1.02552 -1.22138 0.45140 -0.36440 0.95324
96 -0.58638 1.10246 -0.90165 0.79723 -0.44543 1.00166
97 -0.70564 0.85711 -0.42644 1.07174 -0.44766 1.00284
98 -0.95702 1.01302 -0.03705 1.05066 0.16437 1.32341
99 -0.57026 1.10978 0.08730 1.02438 0.39930 1.31240
print(df2[5]['col0']) # Rolling window = 5, stats of col0 only
metric mean std
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 -0.87879 1.45348
.. ... ...
95 -0.44231 1.02552
96 -0.58638 1.10246
97 -0.70564 0.85711
98 -0.95702 1.01302
99 -0.57026 1.10978
print(df2.loc[:, (5, slice(None), 'mean')]) # Rolling window = 5,
# means of each column
window 5
feature col0 col1 col2
metric mean mean mean
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 -0.87879 -0.26559 0.53233
.. ... ... ...
95 -0.44231 -1.22138 -0.36440
96 -0.58638 -0.90165 -0.44543
97 -0.70564 -0.42644 -0.44766
98 -0.95702 -0.03705 0.16437
99 -0.57026 0.08730 0.39930
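If the tuple-with-slice(None) syntax feels awkward, pd.IndexSlice and DataFrame.xs express the same selections. The sketch below rebuilds the frame with pd.concat (my shortcut, assumed equivalent to the loop above) and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Rebuild the three-level frame (shortcut construction, see lead-in).
df2 = pd.concat({w: df.rolling(window=w).agg(stats) for w in windows}, axis=1)
df2.columns = df2.columns.set_names(['window', 'feature', 'metric'])

idx = pd.IndexSlice
# Window 5, every feature, means only (all three levels are kept).
means_5 = df2.loc[:, idx[5, :, 'mean']]
# Means for every window and feature; xs drops the 'metric' level.
all_means = df2.xs('mean', axis=1, level='metric')
# Std of col0 across all windows; only the 'window' level remains.
col0_std = df2.xs(('col0', 'std'), axis=1, level=['feature', 'metric'])
```

Slicing with idx[...] relies on the columns being lexsorted, which holds here because the windows, column names, and statistics are each already in sorted order.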
And lastly, to make a single-indexed DataFrame, here is some kludgy use of itertools.
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
import itertools

means = [col + '_mean' for col in df.columns]
stds = [col + '_std' for col in df.columns]
# Interleave to match agg's column order: col0_mean, col0_std, col1_mean, ...
names = list(itertools.chain.from_iterable(zip(means, stds)))
# Windows vary slowest, matching the order of the concatenated blocks below
iters = ['_'.join((name, str(win)))
         for win, name in itertools.product(windows, names)]
df2 = [df.rolling(window=window).agg(stats).values for window in windows]
df2 = pd.DataFrame(np.concatenate(df2, axis=1), columns=iters,
                   index=df.index)
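Since the goal is a CSV file, here is a rough sketch of a flat-column variant that avoids itertools by joining the MultiIndex tuples directly, followed by a round trip through an in-memory buffer. The label format feature_metric_window and the names wide, flat, and restored are my own choices for illustration.

```python
import io

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Three-level frame: (window, feature, metric) on the columns.
wide = pd.concat({w: df.rolling(window=w).agg(stats) for w in windows}, axis=1)

# Collapse each tuple into one label such as 'col0_mean_5'.
flat = wide.copy()
flat.columns = [f'{feature}_{metric}_{window}'
                for window, feature, metric in wide.columns]

# Round trip through CSV; a StringIO buffer stands in for a real file.
buffer = io.StringIO()
flat.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
```

With flat labels the file reads back with a plain index_col=0 and no extra header handling.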
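So you want the result to look like the original DataFrame, but with the mean and standard deviation for each column and each period? That is, if you have 3 original columns, your new frame would have 3 x 2 x 4 = 24 columns? Do you think a multi-indexed DataFrame or a dictionary of DataFrames would make sense? –
@BradSolomon, your numbers are correct. The simpler the better. The DataFrame will feed a CSV file where I will store the data for later retrieval. – Diego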