2014-06-26

Grouping pandas DataFrames into user-specified time periods

Possibly related: pandas dataframe group year index by decade

For example, suppose I have data like the following:

                     status  bytes_sent  upstream_cache_status \
timestamp
2014-05-26 23:56:30     200         356                   MISS
2014-05-26 23:56:30     200       10517                      -
2014-05-26 23:57:05     200        6923                   MISS
2014-05-26 23:57:14     200         323                      -
2014-05-26 23:57:30     200         356                   MISS
2014-05-26 23:57:38     200        8107                    HIT
2014-05-26 23:57:43     200         369                   MISS
2014-05-26 23:57:56     304         401                    HIT
2014-05-26 23:57:56     304         401                    HIT
2014-05-26 23:57:56     304         387                   MISS
2014-05-26 23:57:57     304         401                    HIT
2014-05-26 23:57:58     304         401                    HIT
2014-05-26 23:58:08     200         507                EXPIRED
2014-05-26 23:58:29     304         338                    HIT
2014-05-26 23:58:31     400         409                      -
2014-05-26 23:58:45     200         425                   MISS

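Say I want to group the rows so that each group contains the log entries that fall within a 30-second window (the window length is user-specified). How should I do that? I have seen this: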

df.groupby(lambda x: x.hour) 

but I doubt it applies to my case, since it groups by the hour of the day rather than by an arbitrary interval.

Answer

df.groupby(pd.Grouper(freq='30S', level=0)) should do it; for example:

>>> aggr = lambda df: df.apply(tuple) 
>>> df.groupby(pd.Grouper(freq='30S', level=0)).aggregate(aggr) 
                                                       status                                 bytes_sent \
timestamp
2014-06-26 23:56:30                                (200, 200)                               (356, 10517)
2014-06-26 23:57:00                                (200, 200)                                (6923, 323)
2014-06-26 23:57:30  (200, 200, 200, 304, 304, 304, 304, 304)  (356, 8107, 369, 401, 401, 387, 401, 401)
2014-06-26 23:58:00                                (200, 304)                                 (507, 338)
2014-06-26 23:58:30                                (400, 200)                                 (409, 425)

                                           upstream_cache_status
timestamp
2014-06-26 23:56:30                                    (MISS, -)
2014-06-26 23:57:00                                    (MISS, -)
2014-06-26 23:57:30  (MISS, HIT, MISS, HIT, HIT, MISS, HIT, HIT)
2014-06-26 23:58:00                               (EXPIRED, HIT)
2014-06-26 23:58:30                                    (-, MISS)
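
Since the window length is meant to be user-specified, here is a minimal sketch of how the same grouping could be wrapped in a helper. The helper name group_by_seconds, the window_seconds parameter, and the summary column names requests and total_bytes are made up for illustration; only the first few rows of the question's data are used. The window is built with pd.offsets.Second instead of the '30S' string, on the assumption that this sidesteps differences in frequency alias spelling between pandas versions. Named aggregation as used here assumes pandas 0.25 or later.

```python
import pandas as pd

# A few rows copied from the question's log excerpt.
rows = [
    ("2014-05-26 23:56:30", 200, 356, "MISS"),
    ("2014-05-26 23:56:30", 200, 10517, "-"),
    ("2014-05-26 23:57:05", 200, 6923, "MISS"),
    ("2014-05-26 23:57:14", 200, 323, "-"),
    ("2014-05-26 23:57:30", 200, 356, "MISS"),
    ("2014-05-26 23:57:38", 200, 8107, "HIT"),
]
df = pd.DataFrame(
    rows, columns=["timestamp", "status", "bytes_sent", "upstream_cache_status"]
)
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp")


def group_by_seconds(frame, window_seconds):
    """Group rows into fixed windows of `window_seconds` on the DatetimeIndex.

    Hypothetical helper, not part of pandas.
    """
    freq = pd.offsets.Second(window_seconds)
    return frame.groupby(pd.Grouper(freq=freq, level=0))


# Count requests and sum bytes per 30-second window.
summary = group_by_seconds(df, 30).agg(
    requests=("status", "size"),
    total_bytes=("bytes_sent", "sum"),
)
print(summary)
```

Changing the second argument (for example to 60) changes the window length without touching the rest of the code.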

@.@ I didn't know something like that existed, thanks – Jeffrey04