2015-04-16 110 views
1

Reduce a list of datetimes by timedelta neighborhood

In Python, how can I reduce a list of datetimes by timedelta neighborhood?

If I have

dates = [
    dt.datetime(1970, 1, 1, 0, 2),
    dt.datetime(1970, 1, 1, 0, 3),
    dt.datetime(1970, 1, 1, 0, 7),
    dt.datetime(1970, 1, 1, 0, 8)
]

and a timedelta

delta = dt.timedelta(minutes=2) 

how can I get this?

expected = [
    dt.datetime(1970, 1, 1, 0, 2, 30),
    dt.datetime(1970, 1, 1, 0, 7, 30)
]

Edit

Taking numbers as an example, if I have this list of numbers

numbers = [1,2,6,7] 
delta = 1 

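I am trying to group nearby values and get a characteristic value (the mean) for each group. The delta is the maximum distance between neighboring values.

For the numbers, the characteristic values are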

[1.5, 6.5] 

because the values are grouped into [1,2] and [6,7], and the mean of each group is computed.
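
For illustration, the grouping described above can be sketched for plain numbers as follows. This is only a minimal sketch: the name `group_means` is hypothetical, and it assumes the input is already sorted in ascending order.

```python
def group_means(values, delta):
    """Group consecutive values whose gap is at most delta and
    return the mean of each group (assumes sorted input)."""
    groups = []
    for value in values:
        # Start a new group when there is none yet, or when the gap
        # to the previous value exceeds delta.
        if not groups or value - groups[-1][-1] > delta:
            groups.append([value])
        else:
            groups[-1].append(value)
    return [sum(group) / float(len(group)) for group in groups]


print(group_means([1, 2, 6, 7], 1))  # [1.5, 6.5]
```

The same loop works for datetimes once the mean is computed on time offsets instead of plain numbers.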

+0

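Just to be clear, your goal is to run through the initial list and eliminate any entries that fall within the timedelta of the current value? – wnnmaw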

+0

What do you mean by timedelta neighborhood? In the expected output, you just add 30 seconds to the first and third values. – tgdn

+0

@tgdn a neighborhood is a group of nearby values – JuanPablo

Answers

0
import datetime as dt 

dates = [ 
    dt.datetime(1970, 1, 1, 0, 2), 
    dt.datetime(1970, 1, 1, 0, 3), 
    dt.datetime(1970, 1, 1, 0, 12), 
    dt.datetime(1970, 1, 1, 0, 7), 
    dt.datetime(1970, 1, 1, 0, 8), 
    dt.datetime(1970, 1, 1, 0, 9), 
    dt.datetime(1970, 1, 1, 0, 13) 
] 

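def group_dates(dates, delta):
    # Walk the list once, starting a new group whenever the gap to the
    # previous date exceeds delta.
    it = iter(dates)
    prev = next(it)
    grouped, total = [[prev]], delta.total_seconds()
    for dte in it:
        # abs() keeps an out-of-order date (negative gap) from being
        # merged into the current group
        if abs((dte - prev).total_seconds()) <= total:
            grouped[-1].append(dte)
        else:
            grouped.append([dte])
        prev = dte
    return grouped


def td(l):
    # mean of the datetimes in l, computed on seconds since the epoch
    seconds = sum((d - dt.datetime(1970, 1, 1)).total_seconds() for d in l) / len(l)
    return dt.datetime.utcfromtimestamp(seconds)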


from pprint import pprint as pp 
pp([td(sub) for sub in group_dates(dates,dt.timedelta(minutes=2))]) 

To avoid unnecessary function calls, check the length:

pp([td(sub) if len(sub) > 1 else sub[0] for sub in group_dates(dates, dt.timedelta(minutes=2))])
[datetime.datetime(1970, 1, 1, 0, 2, 30),
 datetime.datetime(1970, 1, 1, 0, 12),
 datetime.datetime(1970, 1, 1, 0, 8),
 datetime.datetime(1970, 1, 1, 0, 13)]

Or yield the values as you go:

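def group_dates_iter(dates, delta):
    it = iter(dates)
    prev = next(it)
    grouped, total = (prev,), delta.total_seconds()
    for dte in it:
        # abs() keeps an out-of-order date (negative gap) from being
        # merged into the current group
        if abs((dte - prev).total_seconds()) <= total:
            grouped = grouped + (dte,)
        else:
            # the group is complete: emit its mean and start a new one
            yield td(grouped)
            grouped = (dte,)
        prev = dte
    yield td(grouped)

pp(list(group_dates_iter(dates, delta=dt.timedelta(minutes=2))))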
[datetime.datetime(1970, 1, 1, 0, 2, 30), 
datetime.datetime(1970, 1, 1, 0, 12), 
datetime.datetime(1970, 1, 1, 0, 8), 
datetime.datetime(1970, 1, 1, 0, 13)] 

Some timings:

In [28]: dates = [               
    dt.datetime(1970, 1, 1, 0, 2), 
    dt.datetime(1970, 1, 1, 0, 3), 
    dt.datetime(1970, 1, 1, 0, 4), 
    dt.datetime(1970, 1, 1, 0, 7), 
    dt.datetime(1970, 1, 1, 0, 8), 
    dt.datetime(1970, 1, 1, 0, 9), 
    dt.datetime(1970, 1, 1, 0, 15), 
    dt.datetime(1970, 1, 1, 0, 22), 
    dt.datetime(1970, 1, 1, 0, 24), 
    dt.datetime(1970, 1, 1, 0, 27) 
] 

In [41]: for i in range(10000):  
      dates.append(dates[-1]+dt.timedelta(minutes=choice([1,2,3,4]))) 
    ....:  
In [42]: timeit [td(sub) if len(sub) > 1 else sub[0] for sub in group_dates(dates,dt.timedelta(minutes=2))] 
100 loops, best of 3: 15.8 ms per loop 

In [43]: timeit reduce_datetime_list_by_delta(dates, delta)       
100 loops, best of 3: 16.9 ms per loop 

In [44]: timeit timestamps = map(avgtm, groupby(dates, key=grouper(delta))) 
10 loops, best of 3: 18.8 ms per loop 

In [45]: timeit (list(group_dates_iter(dates, delta = dt.timedelta(minutes=2)))) 
10 loops, best of 3: 18.4 ms per loop 
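
A comparable measurement can be reproduced outside IPython with the standard `timeit` module. The following is only a rough sketch under stated assumptions: `mean_datetime` and `run` are hypothetical helper names, `group_dates` is re-implemented here so the block is self-contained, and the fixed random seed is an addition to make the generated input repeatable. Absolute numbers will differ from the ones above.

```python
import datetime as dt
import random
import timeit


def group_dates(dates, delta):
    """Split dates into lists of neighbours whose successive gap is at most delta."""
    grouped = [[dates[0]]]
    for prev, cur in zip(dates, dates[1:]):
        if cur - prev <= delta:
            grouped[-1].append(cur)
        else:
            grouped.append([cur])
    return grouped


def mean_datetime(group):
    """Arithmetic mean of a non-empty list of datetimes."""
    first = group[0]
    offsets = sum((d - first for d in group), dt.timedelta())
    return first + offsets / len(group)


random.seed(0)  # assumption: fixed seed so the test data is repeatable
delta = dt.timedelta(minutes=2)
dates = [dt.datetime(1970, 1, 1, 0, 2)]
for _ in range(10000):
    dates.append(dates[-1] + dt.timedelta(minutes=random.choice([1, 2, 3, 4])))


def run():
    return [mean_datetime(group) for group in group_dates(dates, delta)]


loops = 10
elapsed = timeit.timeit(run, number=loops)
print("%.1f ms per loop" % (elapsed / loops * 1000))
```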
+0

http://ideone.com/aqqiKY – JuanPablo

+0

@JuanPablo, is that right? –

+0

Yes... but what happened to the 'dt.datetime(1970,1,1,0,12)' value? That value should be in a separate group – JuanPablo

0
import datetime as dt 

def datetime_to_epoch(dtime):
    return (dtime - dt.datetime(1970, 1, 1)).total_seconds()

def datetime_sublists(datetime_list, time_delta=dt.timedelta(days=1)):
    sublists = []

    temp = [datetime_list[0]]
    for i in range(len(datetime_list) - 1):
        prev_date = datetime_list[i]
        current_date = datetime_list[i + 1]

        if current_date - prev_date <= time_delta:
            temp.append(current_date)
        else:
            sublists.append(temp)
            temp = [current_date]
    sublists.append(temp)

    return sublists

def reduce_datetime_list_by_delta(date_list, delta):
    sublist = datetime_sublists(date_list, delta)

    reduced = []
    for dates in sublist:
        epochs = [datetime_to_epoch(date) for date in dates]
        epoch_average = sum(epochs) / len(epochs)
        reduced.append(dt.datetime.utcfromtimestamp(epoch_average))

    return reduced


dates = [ 
    dt.datetime(1970, 1, 1, 0, 2), 
    dt.datetime(1970, 1, 1, 0, 3), 
    dt.datetime(1970, 1, 1, 0, 7), 
    dt.datetime(1970, 1, 1, 0, 8), 
    dt.datetime(1970, 1, 1, 0, 12) 
] 

delta = dt.timedelta(minutes=2) 

print(reduce_datetime_list_by_delta(dates, delta))
+0

http://ideone.com/Pd6Gdn – JuanPablo

2

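The description of the problem already gives it away: you want to use the groupby() function from itertools.

All that is needed is a slightly clever key function, one that remembers its last state and keeps returning the same key value as long as successive timestamps are closer together than delta.

After grouping, transform the groups found into average times, taking care of single timestamps (example included).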

import datetime as dt 
from itertools import groupby 

dates = [ 
     dt.datetime(1970, 1, 1, 0, 2), 
     dt.datetime(1970, 1, 1, 0, 3), 
     dt.datetime(1970, 1, 1, 0, 7), 
     dt.datetime(1970, 1, 1, 0, 8), 
     dt.datetime(1970, 1, 1, 0, 13) 
    ] 
delta = dt.timedelta(minutes=2) 

class grouper:
    def __init__(self, delta):
        self.delta = delta
        self.last = None

    def __call__(self, tm):
        # self.last holds the first time stamp of the current group; we keep
        # on returning it as the key as long as tm is within self.last + delta
        if self.last is None or (tm - self.last) > self.delta:
            self.last = tm
        return self.last

# transform the result of groupby into average times
def avgtm(item):
    (key, tms) = item
    tms = list(tms) # transform generator into list so we can index it
    # midpoint of the first and last time stamp; a single time stamp is returned as-is
    return tms[0] + (tms[-1]-tms[0])/2 if len(tms)>1 else tms[0]

# list() so the result prints as a list on Python 3, where map() is lazy
timestamps = list(map(avgtm, groupby(dates, key=grouper(delta))))
print("Time stamps: ", timestamps)

Yields the output:

Time stamps: [datetime.datetime(1970, 1, 1, 0, 2, 30), 
       datetime.datetime(1970, 1, 1, 0, 7, 30), 
       datetime.datetime(1970, 1, 1, 0, 13)] 
+1

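Using 'itertools.groupby' with an appropriate comparator is what I thought of first, too. Instead of 'self.last is None' you could simply say 'not self.last', and 'avgtm' could be simplified a bit if you used a generator expression 'timestamps = (avgtm(list(tms)) for (_, tms) in groupby(dates, key=grouper(delta)))' instead of 'map'. –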

+0

Excellent suggestions, thx! – haavee
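
A minimal sketch of how the suggestions from the comment above might look when applied. This is one reading of them, not the answer author's final code, and the class name `Grouper` is chosen here purely for illustration. It relies on datetime objects always being truthy, so `not self.last` is only true before the first time stamp has been seen.

```python
import datetime as dt
from itertools import groupby


class Grouper(object):
    def __init__(self, delta):
        self.delta = delta
        self.last = None

    def __call__(self, tm):
        # self.last is the first time stamp of the current group; start a
        # new group when tm lies more than delta after it.
        if not self.last or tm - self.last > self.delta:
            self.last = tm
        return self.last


def avgtm(tms):
    # Midpoint of the first and last time stamp of the group; for a
    # single-element group the offset is zero, so tms[0] is returned.
    return tms[0] + (tms[-1] - tms[0]) / 2


dates = [
    dt.datetime(1970, 1, 1, 0, 2),
    dt.datetime(1970, 1, 1, 0, 3),
    dt.datetime(1970, 1, 1, 0, 7),
    dt.datetime(1970, 1, 1, 0, 8),
    dt.datetime(1970, 1, 1, 0, 13),
]
delta = dt.timedelta(minutes=2)

# Generator expression instead of map(); each group is materialised with
# list() before groupby advances to the next one.
timestamps = (avgtm(list(tms)) for _, tms in groupby(dates, key=Grouper(delta)))
print(list(timestamps))
```

With the generator expression, `avgtm` no longer has to unpack the (key, group) pair or special-case single time stamps.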