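Resampling a pandas DataFrame with coefficients

I have a DataFrame with the following columns: {'day', 'measurement'}

There may be several measurements on a given day, or none at all.

For example: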

day  | measurement 
1  |  20.1 
1  |  20.9 
3  |  19.2 
4  |  20.0 
4  |  20.2

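And a dictionary of coefficients: coef={-1:0.2, 0:0.6, 1:0.2}

My goal is to resample the data and take a weighted average using the coefficients (missing data should be left out, with the remaining weights renormalized).

This is the code I wrote to compute it: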

# Run for every day d
window=[-1,0,1]
df.loc[df['day']==d,'resampled_measurement']=sum(coef[i]*df['measurement'][df['day']==d-i].mean() for i in window if df['measurement'][df['day']==d-i].shape[0]>0)
df.loc[df['day']==d,'resampled_measurement']/=sum(coef[i] for i in window if df['measurement'][df['day']==d-i].shape[0]>0)

For the example above, the output should be:

Day measurement 
1 20.500 
2 19.850 
3 19.425 
4 19.875
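
To make the expected numbers above concrete, here is a minimal pure-Python sketch of the calculation for a single day. The helper name `weighted_value` and the `daily_mean` dictionary are illustrative assumptions, not part of the original code; the idea is only that each available neighbouring day contributes its daily mean times its coefficient, and the sum is divided by the weights that were actually used.

```python
# Daily means taken from the example table; day 2 has no measurements.
daily_mean = {1: (20.1 + 20.9) / 2, 3: 19.2, 4: (20.0 + 20.2) / 2}
coef = {-1: 0.2, 0: 0.6, 1: 0.2}


def weighted_value(day, daily_mean, coef):
    """Weighted average around `day`, ignoring days without data (hypothetical helper)."""
    # Keep only the offsets whose neighbouring day actually has a mean.
    present = [(w, daily_mean[day - k]) for k, w in coef.items() if day - k in daily_mean]
    if not present:
        return None  # no data anywhere in the window
    # Renormalize by the weights that were actually used.
    total_weight = sum(w for w, _ in present)
    return sum(w * v for w, v in present) / total_weight


resampled = {day: weighted_value(day, daily_mean, coef) for day in range(1, 5)}
print(resampled)
```

For day 3, for instance, only days 3 and 4 have data, so the value is (0.6 * 19.2 + 0.2 * 20.1) / 0.8.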

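The problem is that the code takes forever to run, and I'm pretty sure there is a better way to resample with coefficients.

Any advice would be greatly appreciated!

Source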

2015-04-20 Uri Goren

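Could you please help me understand how the coefficients translate into the expected output above? My understanding is that, for example, on day 4 you would expect '(0.2 * 19.2 + 0.6 * 20.1) / 0.8', which is '19.875', not '19.97'. It would help if you could walk through the calculation for day 3 or day 4.

My mistake, thanks @SAnand

@UriGoren Are the results for days 2 and 3 accurate as expected? I think you should update those! – Zero

Here is a possible solution for what you are looking for: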

In [1]: import numpy as np
   ...: import pandas as pd

# This is your data
In [2]: data = pd.DataFrame({
   ...:     'day': [1, 1, 3, 4, 4],
   ...:     'measurement': [20.1, 20.9, 19.2, 20.0, 20.2]
   ...: })

# Pre-compute every day's average, filling the gaps with NaN
In [3]: measurement = data.groupby('day')['measurement'].mean()

In [4]: measurement = measurement.reindex(np.arange(data.day.min(), data.day.max() + 1))

In [5]: coef = pd.Series({-1: 0.2, 0: 0.6, 1: 0.2})

# Create a matrix with the time-shifted measurements, one column per offset
In [6]: matrix = pd.DataFrame({key: measurement.shift(key) for key in coef.index})

In [7]: matrix
Out[7]:
       -1     0     1
day
1     NaN  20.5   NaN
2    19.2   NaN  20.5
3    20.1  19.2   NaN
4     NaN  20.1  19.2

# Take a weighted average of the matrix, dividing by the weights of the non-missing entries
In [8]: (matrix * coef).sum(axis=1)/(matrix.notnull() * coef).sum(axis=1)
Out[8]:
day
1    20.500
2    19.850
3    19.425
4    19.875
dtype: float64
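
If you need this more than once, the same steps can be wrapped in a function. The following is only a sketch under a few assumptions: the function name `resample_with_coef` and its parameters are made up for illustration, the day column is assumed to hold integers, and days with no data anywhere in their window come out as NaN.

```python
import numpy as np
import pandas as pd


def resample_with_coef(df, coef, day_col="day", value_col="measurement"):
    """Weighted moving average over daily means, skipping missing days (illustrative helper)."""
    # Average repeated measurements so each day has a single value.
    daily = df.groupby(day_col)[value_col].mean()
    # Add the days without measurements as NaN rows.
    all_days = np.arange(int(daily.index.min()), int(daily.index.max()) + 1)
    daily = daily.reindex(all_days)

    weights = pd.Series(coef, dtype=float)
    # One column per offset; shift(k) places the value of day d - k on row d.
    shifted = pd.DataFrame({k: daily.shift(k) for k in weights.index})
    numerator = (shifted * weights).sum(axis=1)
    # Only the weights of non-missing entries count towards the normalization.
    denominator = (shifted.notna() * weights).sum(axis=1)
    return numerator / denominator


data = pd.DataFrame({
    "day": [1, 1, 3, 4, 4],
    "measurement": [20.1, 20.9, 19.2, 20.0, 20.2],
})
result = resample_with_coef(data, {-1: 0.2, 0: 0.6, 1: 0.2})
print(result)
```

Because everything is done with whole-column operations rather than a Python loop over days with repeated boolean filtering, this should scale much better than the per-day approach in the question.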

Source

2015-04-20 14:39:53
