2014-02-24 68 views

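Need pandas DataFrame.resample() to honor the start datetime of a sub-period series

I am using the pandas DataFrame.resample() function to downsample 1-minute frequency time series data to a 15-minute frequency. The raw data consists of multiple time series, all at the same one-minute frequency, where each series is a list of tuples and each tuple is defined as (<offset from start time>, <value>). Before populating the DataFrame, I convert each tuple to (<datetime>, <value>). Here is an example time series: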

import random
from datetime import datetime, timedelta
import pandas as pd
import pytz

start = datetime(2014, 2, 24, 1, 6, 0, tzinfo=pytz.utc)
min_ts = dict((start + timedelta(seconds=60) * t, random.randint(0, 3)) for t in range(1, 30))

min_ts = 
{datetime.datetime(2014, 2, 24, 1, 7, tzinfo=<UTC>): 2, 
datetime.datetime(2014, 2, 24, 1, 8, tzinfo=<UTC>): 1, 
datetime.datetime(2014, 2, 24, 1, 9, tzinfo=<UTC>): 0, 
datetime.datetime(2014, 2, 24, 1, 10, tzinfo=<UTC>): 3, 
datetime.datetime(2014, 2, 24, 1, 11, tzinfo=<UTC>): 1, 
datetime.datetime(2014, 2, 24, 1, 12, tzinfo=<UTC>): 0, 
datetime.datetime(2014, 2, 24, 1, 13, tzinfo=<UTC>): 1, 
datetime.datetime(2014, 2, 24, 1, 14, tzinfo=<UTC>): 0, 
datetime.datetime(2014, 2, 24, 1, 15, tzinfo=<UTC>): 3, 
datetime.datetime(2014, 2, 24, 1, 16, tzinfo=<UTC>): 3, 
datetime.datetime(2014, 2, 24, 1, 17, tzinfo=<UTC>): 3, 
datetime.datetime(2014, 2, 24, 1, 18, tzinfo=<UTC>): 1, 
datetime.datetime(2014, 2, 24, 1, 19, tzinfo=<UTC>): 3, 
datetime.datetime(2014, 2, 24, 1, 20, tzinfo=<UTC>): 0, 
datetime.datetime(2014, 2, 24, 1, 21, tzinfo=<UTC>): 2, 
datetime.datetime(2014, 2, 24, 1, 22, tzinfo=<UTC>): 1, 
datetime.datetime(2014, 2, 24, 1, 23, tzinfo=<UTC>): 0, 
datetime.datetime(2014, 2, 24, 1, 24, tzinfo=<UTC>): 3, 
datetime.datetime(2014, 2, 24, 1, 25, tzinfo=<UTC>): 1, 
datetime.datetime(2014, 2, 24, 1, 26, tzinfo=<UTC>): 1, 
datetime.datetime(2014, 2, 24, 1, 27, tzinfo=<UTC>): 2, 
datetime.datetime(2014, 2, 24, 1, 28, tzinfo=<UTC>): 0, 
datetime.datetime(2014, 2, 24, 1, 29, tzinfo=<UTC>): 3, 
datetime.datetime(2014, 2, 24, 1, 30, tzinfo=<UTC>): 2, 
datetime.datetime(2014, 2, 24, 1, 31, tzinfo=<UTC>): 3, 
datetime.datetime(2014, 2, 24, 1, 32, tzinfo=<UTC>): 0, 
datetime.datetime(2014, 2, 24, 1, 33, tzinfo=<UTC>): 3, 
datetime.datetime(2014, 2, 24, 1, 34, tzinfo=<UTC>): 2, 
datetime.datetime(2014, 2, 24, 1, 35, tzinfo=<UTC>): 0} 

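The problem I am running into is that when I load this into a DataFrame and resample at a 15-minute frequency, summing the values that fall in each interval, the DatetimeIndex labels are forced onto the intra-hour 15-minute marks (i.e. 0, 15, 30, 45). I want to keep the original time series' DatetimeIndex instead (i.e. starting from datetime.datetime(2014, 2, 24, 1, 7, tzinfo=<UTC>)). I tried the resample loffset parameter, which changes the DatetimeIndex labels the way I want, but the summed values do not change accordingly.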

df = pd.DataFrame({'values': min_ts}) 
df.resample('15min', how='sum', label='right') 

df = 
DateTimeIndex     values 
-------------------------------------- 
2014-02-24 01:15:00+00:00 11 
2014-02-24 01:30:00+00:00 31 
2014-02-24 01:45:00+00:00 11 
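
For reference, the :15/:30/:45 labels arise because resample anchors its 15-minute bins to midnight by default rather than to the first timestamp. The sketch below rebuilds the min_ts values listed above (so its sums need not match the table, which appears to come from a different random draw) and uses the `.resample(...).sum()` chaining form of newer pandas instead of `how='sum'`. The manual index shift is only an assumed stand-in for what `loffset` does: it moves the labels but leaves the sums untouched.

```python
import pandas as pd

# The 29 one-minute values from min_ts above (01:07 through 01:35 UTC).
values = [2, 1, 0, 3, 1, 0, 1, 0, 3, 3, 3, 1, 3, 0, 2,
          1, 0, 3, 1, 1, 2, 0, 3, 2, 3, 0, 3, 2, 0]
index = pd.date_range("2014-02-24 01:07", periods=len(values), freq="min", tz="UTC")
df = pd.DataFrame({"values": values}, index=index)

# Default bins are [01:00, 01:15), [01:15, 01:30), [01:30, 01:45);
# label="right" names each bin by its right edge.
default = df.resample("15min", label="right").sum()

# Move only the labels by 7 minutes; bin membership is unchanged,
# so the sums stay exactly the same.
shifted = default.copy()
shifted.index = shifted.index + pd.Timedelta(minutes=7)

print(default)
print(shifted)
```

Because only the labels move, a row labelled 01:22 here still holds the sum of the 01:00 to 01:14 bin, which is why the values do not follow the new index.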

What I want the result to look like is:

df = 
DateTimeIndex     values 
-------------------------------------- 
2014-02-24 01:07:00+00:00 23 
2014-02-24 01:22:00+00:00 21 

(Updated to reflect the desired result more clearly.)

Answer

Try using base, loffset, and/or switching label to left (this uses a different random seed than yours, so the values differ).

In [17]: df.resample('15min', how='sum', label='right') 
Out[17]: 
          values 
2014-02-24 01:15:00+00:00  10 
2014-02-24 01:30:00+00:00  17 
2014-02-24 01:45:00+00:00  7 

[3 rows x 1 columns] 

In [18]: df.resample('15min', how='sum', label='right',base=7) 
Out[18]: 
          values 
2014-02-24 01:22:00+00:00  16 
2014-02-24 01:37:00+00:00  18 

[2 rows x 1 columns] 

In [19]: df.resample('15min', how='sum', label='left',base=7) 
Out[19]: 
          values 
2014-02-24 01:07:00+00:00  16 
2014-02-24 01:22:00+00:00  18 

[2 rows x 1 columns] 

In [21]: df.resample('15min', how='sum', label='right',loffset='7T') 
Out[21]: 
          values 
2014-02-24 01:22:00+00:00  10 
2014-02-24 01:37:00+00:00  17 
2014-02-24 01:52:00+00:00  7 

[3 rows x 1 columns] 

In [22]: df.resample('15min', how='sum', label='left',loffset='7T') 
Out[22]: 
          values 
2014-02-24 01:07:00+00:00  10 
2014-02-24 01:22:00+00:00  17 
2014-02-24 01:37:00+00:00  7 

[3 rows x 1 columns] 
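
The session above uses the `how=`, `base=` and `loffset=` keywords of the pandas release that was current at the time. In newer releases (1.1 and later, to my knowledge) the bin anchoring that `base=7` provides is spelled `offset='7min'` or `origin='start'`, and aggregation is chained as `.sum()`. A minimal sketch, assuming such a release and using the min_ts values listed in the question rather than the random data of the session above:

```python
import pandas as pd

# The 29 one-minute values from the question's min_ts (01:07 through 01:35 UTC).
values = [2, 1, 0, 3, 1, 0, 1, 0, 3, 3, 3, 1, 3, 0, 2,
          1, 0, 3, 1, 1, 2, 0, 3, 2, 3, 0, 3, 2, 0]
index = pd.date_range("2014-02-24 01:07", periods=len(values), freq="min", tz="UTC")
df = pd.DataFrame({"values": values}, index=index)

# Anchor the bins at the first timestamp of the series (01:07).
by_start = df.resample("15min", origin="start").sum()

# Equivalent here: keep the midnight anchor but push every bin edge by 7 minutes.
by_offset = df.resample("15min", offset="7min").sum()

# Both give sums 23 and 21, labelled 01:07 and 01:22.
print(by_start)
print(by_offset)
```

Unlike a label shift, both options move the bin edges themselves, so the sums follow the new index; this is the behaviour of `base` in In [18] and In [19], whereas `loffset` in In [21] and In [22] only relabels the default bins.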

'base' does what I need. Thanks! – esroberts

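FYI: if you would like to extend http://pandas-docs.github.io/pandas-docs-travis/timeseries.html#up-and-downsampling with more examples and explanation of when and how to use ``base``/``loffset``, it would be much appreciated. Please submit a pull request! – Jeff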