2011-08-03 336 views
2

Python: How do I compute date ranges from a list of dates?

I have a list of dates, for example:

['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08'] 

How can I find the ranges of consecutive dates contained in this list? In the example above, the ranges should be:

[{"start_date": '2011-02-27', "end_date": '2011-03-01'}, 
{"start_date": '2011-04-12', "end_date": '2011-04-13'}, 
{"start_date": '2011-06-08', "end_date": '2011-06-08'} 
] 

Thanks.

+0

I'm not even sure how you derived the solution in your example. Where did the '2011-02-28' date go? – user37078

+0

'2011-02-28' falls inside the range {"start_date": '2011-02-27', "end_date": '2011-03-01'} – Continuation

+0

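OK, so your second code block, the list of dicts you have, is not the *answer*, but just a second argument? If so, can you post the result as you would expect it to be returned? – user37078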

Answers

7

This works, but I'm not happy with it; I'll work on a cleaner solution and edit the answer. Done. Here is a clean, working solution:

import datetime 
import pprint 

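def parse(date):
    # Turn a 'YYYY-MM-DD' string into a datetime.date object.
    return datetime.date(*[int(i) for i in date.split('-')])

def get_ranges(dates):
    # 'dates' must be a sorted list of datetime.date objects.
    while dates:
        end = 1
        try:
            # Advance 'end' while each date is exactly one day after the previous one.
            while dates[end] - dates[end - 1] == datetime.timedelta(days=1):
                end += 1
        except IndexError:
            # Ran off the end of the list: the current range extends to the last date.
            pass

        yield {
            'start-date': dates[0],
            'end-date': dates[end - 1]
        }
        # Drop the range just emitted and continue with the remaining dates.
        dates = dates[end:]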

dates = [ 
    '2011-02-27', '2011-02-28', '2011-03-01', 
    '2011-04-12', '2011-04-13', 
    '2011-06-08' 
] 

# Parse each date and convert it to a date object. Also ensure the dates 
# are sorted, you can remove 'sorted' if you don't need it 
dates = sorted([parse(d) for d in dates]) 

pprint.pprint(list(get_ranges(dates))) 

With the corresponding output:

[{'end-date': datetime.date(2011, 3, 1), 
    'start-date': datetime.date(2011, 2, 27)}, 
{'end-date': datetime.date(2011, 4, 13), 
    'start-date': datetime.date(2011, 4, 12)}, 
{'end-date': datetime.date(2011, 6, 8), 
    'start-date': datetime.date(2011, 6, 8)}] 
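
The output above uses `'start-date'`/`'end-date'` keys and `datetime.date` values, whereas the question asked for `start_date`/`end_date` keys with string values. A minimal sketch of a conversion step, assuming Python 3; the helper name `to_question_format` is made up for illustration and is not part of the answer:

```python
import datetime

def to_question_format(ranges):
    """Convert 'start-date'/'end-date' date objects to the question's string format."""
    return [
        {
            "start_date": r["start-date"].isoformat(),  # date.isoformat() gives 'YYYY-MM-DD'
            "end_date": r["end-date"].isoformat(),
        }
        for r in ranges
    ]

# Two of the ranges from the output above, used as sample input.
ranges = [
    {"start-date": datetime.date(2011, 2, 27), "end-date": datetime.date(2011, 3, 1)},
    {"start-date": datetime.date(2011, 6, 8), "end-date": datetime.date(2011, 6, 8)},
]
formatted = to_question_format(ranges)
print(formatted)
```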
0

Trying to ninja GaretJax's edit ;)

import datetime

def date_to_number(date):
    # 'YYYY-MM-DD' string -> proleptic Gregorian ordinal (an integer day number).
    return datetime.date(*[int(i) for i in date.split('-')]).toordinal()

def number_to_date(number):
    # Ordinal day number -> 'YYYY-MM-DD' string.
    return datetime.date.fromordinal(number).strftime('%Y-%m-%d')

def day_ranges(dates):
    day_numbers = set(date_to_number(d) for d in dates)
    start = None
    # We loop including one element guaranteed not to be in the set, to force the
    # closing of any range that's currently open.
    for n in range(min(day_numbers), max(day_numbers) + 2):
        if start is None:
            if n in day_numbers:
                start = n
        else:
            if n not in day_numbers:
                yield {
                    'start_date': number_to_date(start),
                    'end_date': number_to_date(n - 1)
                }
                start = None

list(
    day_ranges([ 
    '2011-02-27', '2011-02-28', '2011-03-01', 
    '2011-04-12', '2011-04-13', '2011-06-08' 
    ]) 
) 
+1

Do you realize that your solution does a lot of useless iterations? 103 in this example, versus 4 for mine ... ;-) – GaretJax

+0

Oh, and BTW, it chokes on this dataset: `['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08', '2011-06-10']` ... ;-) – GaretJax

+0

Yes, I admittedly picked the wrong algorithm, especially for sparse sets of dates. :) It works fine for me with the new dataset, though. –
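
For reference, the figure of 103 in the comment above can be reproduced from the sample data: the loop in `day_ranges` visits every day number from the earliest date up to one past the latest, regardless of how many dates are actually in the list. A small sketch of that arithmetic, assuming Python 3 (the variable names are illustrative only):

```python
import datetime

dates = ['2011-02-27', '2011-02-28', '2011-03-01',
         '2011-04-12', '2011-04-13', '2011-06-08']

# Convert each string to its ordinal day number, as date_to_number does.
ordinals = [
    datetime.datetime.strptime(d, '%Y-%m-%d').date().toordinal()
    for d in dates
]

# range(min, max + 2) contains (max + 2 - min) values.
span_iterations = max(ordinals) + 2 - min(ordinals)

print(span_iterations)  # iterations of the day-by-day loop
print(len(dates))       # number of dates actually present
```

The gap between the two numbers grows with the calendar span, which is why the day-by-day scan suits dense date sets better than sparse ones.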

0
from datetime import datetime, timedelta 

dates = ['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08'] 
d = [datetime.strptime(date, '%Y-%m-%d') for date in dates] 
test = lambda x: x[1] - x[0] != timedelta(1) 
slices = [0] + [i+1 for i, x in enumerate(zip(d, d[1:])) if test(x)] + [len(dates)] 
ranges = [{"start_date": dates[s], "end_date": dates[e-1]} for s, e in zip(slices, slices[1:])] 

The result is as follows:

>>> pprint.pprint(ranges) 
[{'end_date': '2011-03-01', 'start_date': '2011-02-27'}, 
{'end_date': '2011-04-13', 'start_date': '2011-04-12'}, 
{'end_date': '2011-06-08', 'start_date': '2011-06-08'}] 

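The slices list comprehension collects every index where the previous date is not the day before the current date. With 0 added to the front and len(dates) added to the end, each date range runs from dates[slices[i]] to dates[slices[i+1]-1], inclusive.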
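
To make the slicing step concrete, here is a short sketch, assuming Python 3, that rebuilds the `slices` list for the sample data and shows the group each pair of boundaries selects. The names `breaks` and `groups` are introduced here for illustration and do not appear in the answer:

```python
from datetime import datetime, timedelta

dates = ['2011-02-27', '2011-02-28', '2011-03-01',
         '2011-04-12', '2011-04-13', '2011-06-08']
d = [datetime.strptime(s, '%Y-%m-%d') for s in dates]

# Index i + 1 starts a new range whenever d[i + 1] is not exactly one day after d[i].
breaks = [
    i + 1
    for i, (prev, cur) in enumerate(zip(d, d[1:]))
    if cur - prev != timedelta(days=1)
]
slices = [0] + breaks + [len(dates)]
print(slices)  # [0, 3, 5, 6]

# Each consecutive pair (s, e) delimits one run: dates[s] .. dates[e - 1].
groups = [dates[s:e] for s, e in zip(slices, slices[1:])]
for group in groups:
    print(group[0], '->', group[-1])
```

Note that this relies on the input already being sorted, as it is in the question.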

0

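Mine is a slight variation on the theme (I originally built lists of starts and ends and zipped them to return the records, but I prefer @Karl Knechtel's generator approach):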

from datetime import date, timedelta 

ONE_DAY = timedelta(days=1) 

def find_date_windows(dates):
    # guard against getting empty list
    if not dates:
        return

    # convert strings to sorted list of datetime.dates
    dates = sorted(date(*map(int, d.split('-'))) for d in dates)

    # build list of window starts and matching ends
    lastStart = lastEnd = dates[0]
    for d in dates[1:]:
        if d - lastEnd > ONE_DAY:
            yield {'start_date': lastStart, 'end_date': lastEnd}
            lastStart = d
        lastEnd = d
    yield {'start_date': lastStart, 'end_date': lastEnd}

Here are the test cases:

tests = [ 
    ['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08'], 
    ['2011-06-08'], 
    [], 
    ['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08', '2011-06-10'], 
] 
for dates in tests:
    print(dates)
    for window in find_date_windows(dates):
        print(window)
    print()

Which prints:

['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08'] 
{'start_date': datetime.date(2011, 2, 27), 'end_date': datetime.date(2011, 3, 1)} 
{'start_date': datetime.date(2011, 4, 12), 'end_date': datetime.date(2011, 4, 13)} 
{'start_date': datetime.date(2011, 6, 8), 'end_date': datetime.date(2011, 6, 8)} 

['2011-06-08'] 
{'start_date': datetime.date(2011, 6, 8), 'end_date': datetime.date(2011, 6, 8)} 

[] 

['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08', '2011-06-10'] 
{'start_date': datetime.date(2011, 2, 27), 'end_date': datetime.date(2011, 3, 1)} 
{'start_date': datetime.date(2011, 4, 12), 'end_date': datetime.date(2011, 4, 13)} 
{'start_date': datetime.date(2011, 6, 8), 'end_date': datetime.date(2011, 6, 8)} 
{'start_date': datetime.date(2011, 6, 10), 'end_date': datetime.date(2011, 6, 10)} 
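
The printed output above can also be checked automatically. The following is a sketch, assuming Python 3, that restates `find_date_windows` so the block runs on its own and turns the four test cases into assertions. The helper `windows_as_tuples` is hypothetical and exists only to keep the comparisons short:

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def find_date_windows(dates):
    # Same logic as the answer: yield one dict per run of consecutive dates.
    if not dates:
        return
    dates = sorted(date(*map(int, d.split('-'))) for d in dates)
    last_start = last_end = dates[0]
    for d in dates[1:]:
        if d - last_end > ONE_DAY:
            yield {'start_date': last_start, 'end_date': last_end}
            last_start = d
        last_end = d
    yield {'start_date': last_start, 'end_date': last_end}

def windows_as_tuples(dates):
    # Hypothetical helper: (start, end) ISO strings for easy comparison.
    return [
        (w['start_date'].isoformat(), w['end_date'].isoformat())
        for w in find_date_windows(dates)
    ]

sample = ['2011-02-27', '2011-02-28', '2011-03-01',
          '2011-04-12', '2011-04-13', '2011-06-08']
expected = [('2011-02-27', '2011-03-01'),
            ('2011-04-12', '2011-04-13'),
            ('2011-06-08', '2011-06-08')]

assert windows_as_tuples(sample) == expected
assert windows_as_tuples(['2011-06-08']) == [('2011-06-08', '2011-06-08')]
assert windows_as_tuples([]) == []
assert windows_as_tuples(sample + ['2011-06-10']) == expected + [('2011-06-10', '2011-06-10')]
print('all test cases pass')
```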
0

Here is an alternative solution: it returns a list of (start, finish) tuples, because that is what I needed ;).

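This mutates the list, so I need to make a copy, which obviously increases memory usage. I suspect list.pop() is not super efficient, but that probably depends on how list is implemented in Python.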

import datetime

def collapse_dates(date_list):
    if not date_list:
        return date_list
    result = []
    # We are going to alter the list, so create a (sorted) copy.
    date_list = sorted(date_list)
    while len(date_list):
        # Grab the first item: this is both the start and end of the range.
        start = current = date_list.pop(0)
        # While the first item in the list is the next day, pop that and
        # set it to the end of the range.
        while len(date_list) and date_list[0] == current + datetime.timedelta(1):
            current = date_list.pop(0)
        # That's a completed range.
        result.append((start, current))

    return result

You can easily change the append line to append a dict instead, or to yield rather than append to a list.

Oh, and I'm assuming they are already dates.
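
As one possible reading of that suggestion, here is a sketch of a generator variant, assuming Python 3 and input that already consists of `datetime.date` objects. It walks the sorted copy once instead of calling `pop(0)` and yields dicts in the shape the question asked for. The name `iter_collapsed_dates` is made up for this example:

```python
import datetime

ONE_DAY = datetime.timedelta(days=1)

def iter_collapsed_dates(date_list):
    """Yield a {'start_date', 'end_date'} dict for each run of consecutive dates."""
    ordered = sorted(date_list)  # sorted() returns a new list; the input is untouched
    if not ordered:
        return
    start = current = ordered[0]
    for d in ordered[1:]:
        if d == current + ONE_DAY:
            # Still inside the current run: extend its end.
            current = d
        else:
            # Gap found: emit the finished run and start a new one.
            yield {'start_date': start, 'end_date': current}
            start = current = d
    # Emit the final run.
    yield {'start_date': start, 'end_date': current}

sample_dates = [
    datetime.date(2011, 2, 27), datetime.date(2011, 2, 28), datetime.date(2011, 3, 1),
    datetime.date(2011, 4, 12), datetime.date(2011, 4, 13),
    datetime.date(2011, 6, 8),
]
collapsed = list(iter_collapsed_dates(sample_dates))
for r in collapsed:
    print(r['start_date'], '->', r['end_date'])
```

Iterating by position avoids removing items from the front of the list, which is the part of the original that the answer suspected of being slow.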