2017-08-25

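Pandas groupby time with a specified start time

- Edit: I noticed that the times I typed in were not what I intended, so I converted the times after 12 o'clock to 24-hour format. unutbu's answer should still be clear, however.

- 2nd Edit: I changed the data to make a better example.

Below is a time series indexed by datetime. I want to aggregate starting from start_datetime and keep aggregating in steps of the timedelta below (9.5 hours = 34200 seconds).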

import datetime

import numpy as np
import pandas as pd


def main():
    # start_datetime = datetime.datetime(2013, 1, 1, 8)
    # end_datetime = datetime.datetime(2013, 1, 1, 5, 30)
    s = pd.Series(
        np.arange(2, 10),
        pd.to_datetime([
            '20130101 7:34:04', '20130101 8:34:08', '20130101 10:34:08',
            '20130101 12:34:15', '20130101 13:34:28', '20130101 12:34:54',
            '20130101 14:34:55', '20130101 17:29:12']))

    print(s)
    # Bar width: 9.5 hours = 34200 seconds
    bar_size = datetime.timedelta(seconds=60*60*9.5)
    # Left-closed bins, each labelled with its right edge
    time_group = pd.Grouper(
        freq=pd.Timedelta(bar_size), closed='left', label='right')
    foobar = s.groupby(time_group).agg(np.sum)
    print(foobar)


if __name__ == "__main__":
    main()

Running the code above outputs the following:

2013-01-01 09:30:00  5 
2013-01-01 19:00:00 39 
Freq: 570T, dtype: int64 

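Pandas internally decides to start grouping from midnight instead of 8:00 AM. I can't find a way to force pandas to start grouping at 8:00 AM. Does anyone have a solution that uses pandas functionality?

Answers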
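The midnight anchoring can be reproduced by hand. The sketch below is only an illustration of the bin arithmetic, not pandas' actual implementation; `right_label` is a hypothetical helper introduced here. It assumes left-closed bins of 9.5 hours and returns the right edge of the bin a timestamp falls into, for a given origin.

```python
import datetime as dt

BAR = dt.timedelta(hours=9, minutes=30)  # bar width from the question


def right_label(ts, origin, width=BAR):
    """Right edge of the left-closed bin [edge, edge + width) that contains ts."""
    # timedelta // timedelta yields an int and floors, so timestamps
    # before the origin land in the bin that ends at the origin.
    n = (ts - origin) // width
    return origin + (n + 1) * width


ts = dt.datetime(2013, 1, 1, 8, 34, 8)
midnight = dt.datetime(2013, 1, 1)
eight_am = dt.datetime(2013, 1, 1, 8)

print(right_label(ts, midnight))  # 2013-01-01 09:30:00 (origin at midnight)
print(right_label(ts, eight_am))  # 2013-01-01 17:30:00 (origin at 8:00 AM)
```

With the origin at midnight, the 8:34 sample ends up under the 09:30 label, which matches the first row of the output above; with the origin at 8:00 AM it would fall in the bar ending at 17:30.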

4

Use base=480 to shift the origin by 480 minutes (8 hours). The unit is minutes because the Grouper frequency is 570T (T here stands for minutes):

import datetime
import pandas as pd


def main():
    start_datetime = datetime.datetime(2013, 1, 1, 8)
    s = pd.Series(
        range(8),
        pd.to_datetime([
            '20130101 8:34:04', '20130101 10:34:08', '20130101 10:34:08',
            '20130101 12:34:15', '20130101 1:34:28', '20130101 3:34:54',
            '20130101 4:34:55', '20130101 5:29:12']))

    bar_size = datetime.timedelta(seconds=60*60*9.5)
    # base=480 moves the bin origin from midnight to 08:00 (480 minutes).
    # Note: `base` was deprecated in pandas 1.1 in favour of `offset`/`origin`.
    time_group = pd.Grouper(freq=bar_size, closed='left', label='right',
                            base=480)
    foobar = s.groupby(time_group).agg(sum)
    print(foobar)


if __name__ == "__main__":
    main()

which yields

2013-01-01 08:00:00 22 
2013-01-01 17:30:00  6 
Freq: 570T, dtype: int64 

Internally, when pd.Grouper is given a frequency, a TimeGrouper is returned:

In [81]: time_group 
Out[81]: <pandas.core.resample.TimeGrouper at 0x7f1499a32198> 

So the arguments passed to pd.Grouper are actually passed on to pd.TimeGrouper:

In [82]: pd.TimeGrouper? 
Init signature: pd.TimeGrouper(self, freq='Min', closed=None, label=None, 
           how='mean', nperiods=None, axis=0, 
           fill_method=None, limit=None, loffset=None, 
           kind=None, convention=None, base=0, **kwargs) 

The TimeGrouper documentation does not explain the base parameter, but it has the same meaning as in df.resample:

In [83]: df.resample? 
Parameters 
---------- 
base : int, default 0 
    For frequencies that evenly subdivide 1 day, the "origin" of the 
    aggregated intervals. For example, for '5min' frequency, base could 
    range from 0 through 4. Defaults to 0 
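For readers on newer pandas: `base` was deprecated in pandas 1.1 in favour of the `offset` and `origin` arguments. Below is a minimal sketch of what an equivalent call might look like, assuming pandas 1.1 or later and reusing the sample data from this answer. The names `by_offset` and `by_origin` are illustrative and not part of the original answer, and the frequency is written as '570min' rather than a timedelta purely for readability.

```python
import pandas as pd

# Same sample data as in the answer; sorted here only for readability.
s = pd.Series(
    range(8),
    index=pd.to_datetime([
        '2013-01-01 08:34:04', '2013-01-01 10:34:08', '2013-01-01 10:34:08',
        '2013-01-01 12:34:15', '2013-01-01 01:34:28', '2013-01-01 03:34:54',
        '2013-01-01 04:34:55', '2013-01-01 05:29:12',
    ]),
).sort_index()

# Option 1: keep the default origin (midnight of the first day)
# and shift the bin edges by 8 hours.
by_offset = s.groupby(
    pd.Grouper(freq='570min', closed='left', label='right', offset='8h')
).sum()

# Option 2: anchor the bins directly on an explicit timestamp.
by_origin = s.groupby(
    pd.Grouper(freq='570min', closed='left', label='right',
               origin=pd.Timestamp('2013-01-01 08:00'))
).sum()

print(by_offset)
print(by_origin)
```

Both variants should give the same two bars as the base=480 version, labelled 08:00 and 17:30. `origin` is probably the closer match to the question, since it takes the intended start datetime directly.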

Great answer! Thanks! – itzjustricky

0

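The following should get you started. It slides the timestamps forward (the code below uses 9 hours 30 minutes) and keeps only the date part: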

(s.index + pd.Timedelta('9 hours 30 minutes')).strftime('%Y-%m-%d') 
# array([u'2013-01-01', u'2013-01-01', u'2013-01-01', u'2013-01-01', 
# u'2013-01-01', u'2013-01-01', u'2013-01-01', u'2013-01-01'], 
# dtype='<U10') 

You can then call:

s.groupby((s.index + pd.Timedelta('9 hours 30 minutes')).strftime('%Y-%m-%d')).agg(sum) 
# 2013-01-01 28 
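One thing worth checking with this approach: grouping on the shifted date string gives one group per shifted calendar day, not one group per 9.5-hour bar. The sketch below uses made-up timestamps (not taken from the question) to show where the day boundary ends up after a 9.5-hour shift.

```python
import pandas as pd

# Illustrative timestamps chosen around 14:30, where 14:30 + 9.5 h = midnight.
s = pd.Series(
    [1, 2, 3],
    index=pd.to_datetime([
        '2013-01-01 08:00', '2013-01-01 14:29', '2013-01-01 14:30',
    ]),
)

# Shift by 9.5 hours, then keep only the date as the group key.
shifted_day = (s.index + pd.Timedelta('9 hours 30 minutes')).strftime('%Y-%m-%d')
result = s.groupby(shifted_day).sum()
print(result)
```

Here the first two samples stay on 2013-01-01, while the 14:30 sample rolls over to 2013-01-02, so the effective day boundary sits at 14:30. If fixed-width 9.5-hour bars are what you need, a frequency-based Grouper is likely the better fit.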

You can also rely on the datetime functionality that pandas itself provides, instead of importing datetime separately:

import pandas as pd


def main():
    # pd.Timestamp is used here because pd.datetime is deprecated since pandas 1.0
    start_datetime = pd.Timestamp(2013, 1, 1, 8)

    s = pd.Series(
        range(8),
        pd.to_datetime([
            '20130101 8:34:04', '20130101 10:34:08', '20130101 10:34:08',
            '20130101 12:34:15', '20130101 1:34:28', '20130101 3:34:54',
            '20130101 4:34:55', '20130101 5:29:12']))

    # Shift every timestamp forward by 9.5 hours and group by the resulting date
    time_group = (s.index + pd.Timedelta('9 hours 30 minutes')).strftime('%Y-%m-%d')
    foobar = s.groupby(time_group).agg(sum)
    print(foobar)


if __name__ == "__main__":
    main()