2014-06-27 85 views

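numpy: aggregate 4D array by groups

I have a numpy array with shape [t, z, x, y] representing an hourly time series of three-dimensional data. The axes of the array are time, vertical coordinate, horizontal coordinate 1, and horizontal coordinate 2. There is also a t-element list of hourly datetime.datetime timestamps.

I want to calculate the mean of the midday hours for each day. The result would be an [nday, Z, X, Y] array.

I'm trying to find a pythonic way to do this. I've written something with a bunch of for loops that works, but it seems slow, inflexible, and verbose.

It appears to me that pandas is not a solution here because my time series data are three-dimensional. I'd be happy to be proven wrong.

I've come up with the code below, which uses itertools to find the midday timestamps and group them by date. Now I'm stuck trying to apply imap to find the means.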

import itertools
from datetime import datetime

import numpy as np
import pandas as pd

# create 72 hours of pseudo-data with 3 vertical levels and a 4 by 4
# horizontal grid.
data = np.zeros((72, 3, 4, 4))
# 'h' is the hourly alias in current pandas (older releases spelled it 'H')
t = pd.date_range(datetime(2008, 7, 1), freq='h', periods=72)
for i in range(data.shape[0]):
    data[i, ...] = i

# find the timestamps that are "midday" in North America. We'll
# define midday as between 15:00 and 23:00 UTC, which is 10:00 EST to
# 15:00 PST.
def is_midday(this_t):
    return 15 <= this_t.hour <= 23

# group the midday timestamps by date
# (on Python 3 the built-in filter replaces itertools.ifilter)
for dt, grp in itertools.groupby(filter(is_midday, t),
                                 key=lambda x: x.date()):
    print('date ' + str(dt))
    for g in grp:
        print(g)

# find means of mid-day data by date
data_list = np.split(data, data.shape[0])
grps = itertools.groupby(filter(is_midday, t),
                         key=lambda x: x.date())
# how to apply itertools.imap (or something else) to data_list and
# grps? Or somehow split data along axis 0 according to grps?

Answer


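You can shove pretty much any object into a pandas structure. This is not generally recommended, but in this case it might work for you.

Create a Series indexed by time, where each element is a 3-D numpy array: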

In [117]: s = pd.Series([data[i] for i in range(data.shape[0])], index=t)

In [118]: s 
Out[118]: 
2008-07-01 00:00:00 [[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], ... 
2008-07-01 01:00:00 [[[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], ... 
2008-07-01 02:00:00 [[[2.0, 2.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0], ... 
2008-07-01 03:00:00 [[[3.0, 3.0, 3.0, 3.0], [3.0, 3.0, 3.0, 3.0], ... 
2008-07-01 04:00:00 [[[4.0, 4.0, 4.0, 4.0], [4.0, 4.0, 4.0, 4.0], ... 
2008-07-01 05:00:00 [[[5.0, 5.0, 5.0, 5.0], [5.0, 5.0, 5.0, 5.0], ... 
2008-07-01 06:00:00 [[[6.0, 6.0, 6.0, 6.0], [6.0, 6.0, 6.0, 6.0], ... 
2008-07-01 07:00:00 [[[7.0, 7.0, 7.0, 7.0], [7.0, 7.0, 7.0, 7.0], ... 
2008-07-01 08:00:00 [[[8.0, 8.0, 8.0, 8.0], [8.0, 8.0, 8.0, 8.0], ... 
2008-07-01 09:00:00 [[[9.0, 9.0, 9.0, 9.0], [9.0, 9.0, 9.0, 9.0], ... 
2008-07-01 10:00:00 [[[10.0, 10.0, 10.0, 10.0], [10.0, 10.0, 10.0,... 
2008-07-01 11:00:00 [[[11.0, 11.0, 11.0, 11.0], [11.0, 11.0, 11.0,... 
2008-07-01 12:00:00 [[[12.0, 12.0, 12.0, 12.0], [12.0, 12.0, 12.0,... 
2008-07-01 13:00:00 [[[13.0, 13.0, 13.0, 13.0], [13.0, 13.0, 13.0,... 
2008-07-01 14:00:00 [[[14.0, 14.0, 14.0, 14.0], [14.0, 14.0, 14.0,... 
... 
2008-07-03 09:00:00 [[[57.0, 57.0, 57.0, 57.0], [57.0, 57.0, 57.0,... 
2008-07-03 10:00:00 [[[58.0, 58.0, 58.0, 58.0], [58.0, 58.0, 58.0,... 
2008-07-03 11:00:00 [[[59.0, 59.0, 59.0, 59.0], [59.0, 59.0, 59.0,... 
2008-07-03 12:00:00 [[[60.0, 60.0, 60.0, 60.0], [60.0, 60.0, 60.0,... 
2008-07-03 13:00:00 [[[61.0, 61.0, 61.0, 61.0], [61.0, 61.0, 61.0,... 
2008-07-03 14:00:00 [[[62.0, 62.0, 62.0, 62.0], [62.0, 62.0, 62.0,... 
2008-07-03 15:00:00 [[[63.0, 63.0, 63.0, 63.0], [63.0, 63.0, 63.0,... 
2008-07-03 16:00:00 [[[64.0, 64.0, 64.0, 64.0], [64.0, 64.0, 64.0,... 
2008-07-03 17:00:00 [[[65.0, 65.0, 65.0, 65.0], [65.0, 65.0, 65.0,... 
2008-07-03 18:00:00 [[[66.0, 66.0, 66.0, 66.0], [66.0, 66.0, 66.0,... 
2008-07-03 19:00:00 [[[67.0, 67.0, 67.0, 67.0], [67.0, 67.0, 67.0,... 
2008-07-03 20:00:00 [[[68.0, 68.0, 68.0, 68.0], [68.0, 68.0, 68.0,... 
2008-07-03 21:00:00 [[[69.0, 69.0, 69.0, 69.0], [69.0, 69.0, 69.0,... 
2008-07-03 22:00:00 [[[70.0, 70.0, 70.0, 70.0], [70.0, 70.0, 70.0,... 
2008-07-03 23:00:00 [[[71.0, 71.0, 71.0, 71.0], [71.0, 71.0, 71.0,... 
Freq: H, Length: 72 

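Define your aggregating function. You need to access .values, which returns the inner objects; concatenating them coerces the result back into an actual numpy array, which is then aggregated (a mean in this case):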

In [119]: def f(g,grp): 
    .....:  return np.concatenate(grp.values).mean() 
    .....: 
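
As a side note on what the concatenation does to the shape, here is a small numpy-only sketch. It is not part of the original session; the list `hourly` is a made-up stand-in for one day's worth of the 3-D arrays held in the Series. `np.concatenate` joins the arrays along their existing first axis (the vertical levels), while `np.stack` adds a new leading time axis.

```python
import numpy as np

# Hypothetical stand-in for one day of hourly fields: 24 arrays of shape (3, 4, 4),
# where the array for hour i is filled with the value i.
hourly = [np.full((3, 4, 4), float(i)) for i in range(24)]

# concatenate joins along the existing axis 0, so time and level get merged
joined = np.concatenate(hourly)          # shape (72, 4, 4)

# stack creates a new leading axis, so time stays separate from level
stacked = np.stack(hourly)               # shape (24, 3, 4, 4)

overall_mean = joined.mean()             # one scalar over every element
field_mean = stacked.mean(axis=0)        # shape (3, 4, 4), averaged over time only
```

Taking `.mean()` over the concatenated array collapses everything to a single number, whereas averaging the stacked array over axis 0 keeps the [Z, X, Y] layout.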

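Since I'm not sure what your final output should look like, this just manually creates a time-based grouper (it is basically a resample) and does nothing further with the results (they are simply a list of the aggregated values):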

In [121]: [ f(g,grp) for g, grp in s.groupby(pd.Grouper(freq='D')) ] 
Out[121]: [11.5, 35.5, 59.5] 
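
The list above holds one scalar per day, averaged over all 24 hours and every grid point. If the goal is the [nday, Z, X, Y] array of midday means described in the question, one possible sketch is to group integer row positions by date and average over the time axis only. This is an assumption about the desired output, not something from the original answer; the names `pos`, `groups`, and `daily_means` are made up for illustration, and the pseudo-data is rebuilt so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the pseudo-data: the field at hour i is filled with the value i.
data = np.arange(72, dtype=float).reshape(72, 1, 1, 1) * np.ones((1, 3, 4, 4))
t = pd.date_range("2008-07-01", periods=72, freq="h")

# Boolean mask for the "midday" window used in the question (15:00 to 23:00 UTC).
midday = (t.hour >= 15) & (t.hour <= 23)

# Series of row positions indexed by time, restricted to the midday hours.
pos = pd.Series(np.arange(len(t)), index=t)[midday]

# Group the positions by calendar day (timestamps normalized to midnight).
groups = pos.groupby(pos.index.normalize())

# For each day, pull the matching rows out of data and average over time only.
days = []
means = []
for day, rows in groups:
    days.append(day)
    means.append(data[rows.to_numpy()].mean(axis=0))   # shape (3, 4, 4)

daily_means = np.stack(means)                          # shape (nday, 3, 4, 4)
```

Keeping only integer positions in pandas and leaving the 4-D array in numpy avoids storing array objects inside a Series, at the cost of one small Python loop over the days.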

You can get reasonably fancy here and, say, return a pandas object from each group (and potentially concat them).
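
One way that idea might look, as a rough sketch rather than the answer's own code: build a small DataFrame per day (one row per vertical level, with the horizontal grid flattened into columns) and join the pieces with `pd.concat`. The layout, the names `frames` and `result`, and the index labels "day" and "level" are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the pseudo-data: the field at hour i is filled with the value i.
data = np.arange(72, dtype=float).reshape(72, 1, 1, 1) * np.ones((1, 3, 4, 4))
t = pd.date_range("2008-07-01", periods=72, freq="h")

# Row positions indexed by time, grouped by calendar day (all 24 hours here).
s_pos = pd.Series(np.arange(len(t)), index=t)

frames = {}
for day, rows in s_pos.groupby(s_pos.index.normalize()):
    mean_field = data[rows.to_numpy()].mean(axis=0)            # shape (z, x, y)
    # One row per vertical level; the x-by-y grid is flattened into 16 columns.
    frames[day] = pd.DataFrame(mean_field.reshape(mean_field.shape[0], -1))

# Concatenating a dict uses its keys as the outer index level.
result = pd.concat(frames)
result.index.names = ["day", "level"]
```

The result here is a 9-row, 16-column frame indexed by (day, level); calling `result.to_numpy().reshape(3, 3, 4, 4)` would recover the [nday, Z, X, Y] array for this particular example.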