2017-03-23

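How can values be normalized per day in a pandas DataFrame?

I have a DataFrame containing a set of values measured at different times. I want to normalize the values of each day to unity, so that each day's values sum to one. How can this be done?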

Specifically, I have data of the following form:

       value 
datetime       
2017-03-08 14:36:06.616166 1002.49 
2017-03-08 15:06:07.661818 992.68 
2017-03-08 15:36:08.597443 984.34 
2017-03-08 16:06:09.265451 989.32 
2017-03-08 16:36:10.581452 1004.00 
2017-03-08 17:06:11.269434 1003.97 
2017-03-08 17:36:12.117443 994.80 
2017-03-08 18:06:12.809445 994.17 
2017-03-08 18:36:14.029444 997.93 
2017-03-08 19:06:14.654631 989.65 
2017-03-08 19:36:15.413438 991.14 
2017-03-08 20:06:16.145432 984.65 
2017-03-08 20:36:17.265443 993.30 
2017-03-08 21:06:18.117434 981.18 
2017-03-08 21:36:19.165447 987.64 
2017-03-08 22:06:19.909443 985.26 
2017-03-08 22:36:20.569442 980.40 
2017-03-08 23:06:21.197446 988.59 
2017-03-08 23:36:21.989448 984.59 
2017-03-09 00:06:22.665448 983.91 
2017-03-09 00:36:23.281681 993.65 
2017-03-09 01:06:23.857440 986.69 
2017-03-09 01:36:24.441713 984.04 
2017-03-09 02:06:25.117453 989.92 
2017-03-09 02:36:25.953449 978.82 
2017-03-09 03:06:26.521704 987.42 
2017-03-09 03:36:27.157448 996.66 
2017-03-09 04:06:27.725445 996.66 
2017-03-09 04:36:29.201442 996.66 
2017-03-09 05:06:29.765443 989.82 
...        ... 
2017-03-22 20:16:24.007637 833.74 
2017-03-22 20:46:24.583127 834.69 
2017-03-22 21:16:25.217536 829.66 

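I want to normalize all the values for 2017-03-08, for 2017-03-09, and so on, each day separately, and add these normalized values as a new column.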

A simple normalization function for a list of values looks like this:

def normalize(x, summation=None):
    if summation is None:
        summation = sum(x)  # normalize to unity
    return [element / summation for element in x]

So, for 2017-03-08, the normalized values would look like this:

       value value_day_normalized 
datetime             
2017-03-08 14:36:06.616166 1002.49 0.0532386976171 
2017-03-08 15:06:07.661818 992.68 0.0527177232197 
2017-03-08 15:36:08.597443 984.34 0.0522748153223 
2017-03-08 16:06:09.265451 989.32 0.0525392855057 
2017-03-08 16:36:10.581452 1004.00 0.0533188883755 
2017-03-08 17:06:11.269434 1003.97 0.0533172951817 
2017-03-08 17:36:12.117443 994.80 0.0528303089203 
2017-03-08 18:06:12.809445 994.17 0.0527968518489 
2017-03-08 18:36:14.029444 997.93 0.052996532148 
2017-03-08 19:06:14.654631 989.65 0.0525568106383 
2017-03-08 19:36:15.413438 991.14 0.0526359392674 
2017-03-08 20:06:16.145432 984.65 0.0522912783257 
2017-03-08 20:36:17.265443 993.30 0.0527506492265 
2017-03-08 21:06:18.117434 981.18 0.0521069989007 
2017-03-08 21:36:19.165447 987.64 0.0524500666486 
2017-03-08 22:06:19.909443 985.26 0.0523236732678 
2017-03-08 22:36:20.569442 980.40 0.0520655758599 
2017-03-08 23:06:21.197446 988.59 0.052500517788 
2017-03-08 23:36:21.989448 984.59 0.0522880919379 

How can something like this be done? I have a feeling it might involve the DataFrame method groupby, but I'm not sure how to approach it.

Answer

You can use div to divide the column by a Series of the same length as the original df, built with groupby and GroupBy.transform:

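import pandas as pd

# Divide each value by the total of its calendar day; transform('sum') keeps the original index
df['value_day_normalized'] = df['value'].div(
    df.groupby(pd.Grouper(freq='D'))['value'].transform('sum'))
print(df)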
          value value_day_normalized 
datetime             
2017-03-08 14:36:06.616166 1002.49    0.053239 
2017-03-08 15:06:07.661818 992.68    0.052718 
2017-03-08 15:36:08.597443 984.34    0.052275 
2017-03-08 16:06:09.265451 989.32    0.052539 
2017-03-08 16:36:10.581452 1004.00    0.053319 
2017-03-08 17:06:11.269434 1003.97    0.053317 
2017-03-08 17:36:12.117443 994.80    0.052830 
2017-03-08 18:06:12.809445 994.17    0.052797 
2017-03-08 18:36:14.029444 997.93    0.052997 
... 
... 
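
To make the approach above easy to try, here is a minimal self-contained sketch. It assumes a DatetimeIndex named datetime and uses only five rows copied from the question (three from 2017-03-08, two from 2017-03-09), so the resulting numbers differ from the full-data output shown above. The names daily_total and check are illustrative and not part of the original answer.

```python
import pandas as pd

# A few rows taken from the question's data, spanning two calendar days.
df = pd.DataFrame(
    {"value": [1002.49, 992.68, 984.34, 983.91, 993.65]},
    index=pd.DatetimeIndex(
        [
            "2017-03-08 14:36:06.616166",
            "2017-03-08 15:06:07.661818",
            "2017-03-08 15:36:08.597443",
            "2017-03-09 00:06:22.665448",
            "2017-03-09 00:36:23.281681",
        ],
        name="datetime",
    ),
)

# transform('sum') returns the daily total repeated on every row of that day,
# so the result has the same index as df and can be used directly as a divisor.
daily_total = df.groupby(pd.Grouper(freq="D"))["value"].transform("sum")
df["value_day_normalized"] = df["value"].div(daily_total)

# Sanity check: within each day the normalized values should add up to 1
# (up to floating-point rounding).
check = df.groupby(df.index.normalize())["value_day_normalized"].sum()
print(df)
print(check)
```

The key point is that transform, unlike a plain aggregation, returns one value per original row, which is why the division lines up without any merging or reindexing.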

Another solution, using resample and Resampler.transform:

# Same idea, with resample forming the daily groups
df['value_day_normalized'] = df['value'].div(df.resample('D')['value'].transform('sum'))
print(df)
           value value_day_normalized 
datetime             
2017-03-08 14:36:06.616166 1002.49    0.053239 
2017-03-08 15:06:07.661818 992.68    0.052718 
2017-03-08 15:36:08.597443 984.34    0.052275 
2017-03-08 16:06:09.265451 989.32    0.052539 
2017-03-08 16:36:10.581452 1004.00    0.053319 
2017-03-08 17:06:11.269434 1003.97    0.053317 
2017-03-08 17:36:12.117443 994.80    0.052830 
... 
... 
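
If you would rather keep a normalize function like the one in the question, one possible variant is to pass it to transform directly. The sketch below is not from the original answer: it assumes the function is adapted to operate on a pandas Series instead of a list, groups by the index truncated to midnight with DatetimeIndex.normalize, and uses four rows from the question as sample data.

```python
import pandas as pd

# Four rows from the question's data: two on 2017-03-08, two on 2017-03-09.
df = pd.DataFrame(
    {"value": [1002.49, 992.68, 983.91, 993.65]},
    index=pd.DatetimeIndex(
        [
            "2017-03-08 14:36:06.616166",
            "2017-03-08 15:06:07.661818",
            "2017-03-09 00:06:22.665448",
            "2017-03-09 00:36:23.281681",
        ],
        name="datetime",
    ),
)


def normalize(x, summation=None):
    """Series-based adaptation of the question's function (an assumption)."""
    if summation is None:
        summation = x.sum()  # normalize to unity
    return x / summation


# index.normalize() sets every timestamp to 00:00:00, giving one key per day.
day = df.index.normalize()

# transform calls normalize once per day and stitches the pieces back
# together in the original row order.
df["value_day_normalized"] = df.groupby(day)["value"].transform(normalize)
print(df)
```

Passing a Python function is usually slower than the built-in 'sum' followed by a division, but it keeps the door open for other per-day normalizations.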