2012-09-25

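Search within a time range

I have a list of timestamps. Based on user input, I want to find all elements that fall within a specific time range. Each timestamp has corresponding information in another list. The range is given either in hours (<= 24), in days counted from midnight, or neither.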

Example (this is only an example list; the solution should also work on very large lists):

list = ['2002-03-31 19:30:41', '2002-03-31 19:30:41', '2002-03-31 19:43:49', '2002-03-31 19:43:50', '2002-03-31 19:50:05', '2002-03-31 19:50:06', '2002-03-31 19:50:06', '2002-03-31 19:50:44', '2002-03-31 19:50:45', '2002-03-31 19:50:45', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:53:05', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:17', '2002-03-31 19:54:14', '2002-03-31 19:54:14']
activity = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w']

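I will use the last element, list[-1], as the reference point. Suppose the user wants to see activity from the last three hours, meaning from 2002-03-31 16:54:14 to 2002-03-31 19:54:14. The positions of the matching timestamps are then used to fetch the activities from the other list. My first idea was to convert each timestamp into something more usable so the elements are easier to compare, but there must be a simpler solution.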

This module looks useful, but I can't figure out how to use it.

Regards


How often will users search this information? How big is the list of timestamps? –


@MartijnPieters Not very often, and execution time doesn't matter. About 14000 timestamps. – ogward

Answers


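You're in luck. Your timestamps use a zero-padded year-month-day hour:minute:second format, so their string order is the same as their chronological order. That means you can skip the whole "convert to time values" step and just compare strings: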

times = ['2002-03-31 19:30:41', '2002-03-31 19:30:41', '2002-03-31 19:43:49', '2002-03-31 19:43:50', '2002-03-31 19:50:05', '2002-03-31 19:50:06', '2002-03-31 19:50:06', '2002-03-31 19:50:44', '2002-03-31 19:50:45', '2002-03-31 19:50:45', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:53:05', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:17', '2002-03-31 19:54:14', '2002-03-31 19:54:14']
activity= ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w'] 

start = '2002-03-31 16:54:14' 
end = '2002-03-31 19:54:14' 

# Use a separate loop name so the activity list is not shadowed
for time, act in zip(times, activity):
    if start <= time <= end:
        print(time, act)
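The string comparison can be wrapped in a small helper that derives the start string from the reference point list[-1], as the question describes. This is a sketch only. The name `last_window` and the `FMT` constant are hypothetical. It also assumes every timestamp has exactly the same zero-padded layout.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%d %H:%M:%S'  # the zero-padded layout used in the question


def last_window(times, activity, window):
    """Return (timestamp, activity) pairs from the `window` before times[-1].

    Assumes every string in `times` uses FMT exactly, so plain string
    comparison gives the same order as comparing the actual moments.
    """
    end = times[-1]  # reference point: the newest timestamp
    start = (datetime.strptime(end, FMT) - window).strftime(FMT)
    return [(t, a) for t, a in zip(times, activity) if start <= t <= end]


# A few entries copied from the question's list, with their activity labels.
times = ['2002-03-31 19:30:41', '2002-03-31 19:43:49', '2002-03-31 19:54:14']
activity = ['a', 'c', 'w']
print(last_window(times, activity, timedelta(minutes=15)))
# [('2002-03-31 19:43:49', 'c'), ('2002-03-31 19:54:14', 'w')]
```

For the question's three-hour case, you would call `last_window(list, activity, timedelta(hours=3))`.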

Thanks! How would I turn user input like "the last 13 days (counted from midnight)" into a start value? – ogward


Look at the `datetime` module. You can precompute the start date with `start = (datetime.date.today() - datetime.timedelta(days=13)).strftime('%Y-%m-%d 00:00:00')` and then change the `if` statement to `if time >= start: ...`. –
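The question's data is from 2002, so `date.today()` would select nothing. A variant can anchor the count at midnight of the reference timestamp instead. This sketch rests on that assumption, and the helper name is hypothetical. Whether "the last 13 days" should include the reference day itself is a choice you may need to adjust. Here the reference day plus the 13 days before it are included.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%d %H:%M:%S'


def days_from_midnight_start(reference, days):
    """Start string for 'the last `days` days, counted from midnight'.

    Midnight of the reference timestamp's day, minus `days` days.
    """
    ref = datetime.strptime(reference, FMT)
    midnight = ref.replace(hour=0, minute=0, second=0, microsecond=0)
    return (midnight - timedelta(days=days)).strftime(FMT)


start = days_from_midnight_start('2002-03-31 19:54:14', 13)
print(start)  # 2002-03-18 00:00:00
```

The returned string can then be used in the `if time >= start:` test from the comment.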


A workflow like this:

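  • Use the datetime module's strptime method to convert your strings into datetime objects, giving you a list of datetime objects.
  • Compute timedeltas by subtracting each entry of this list from the last one.
  • Use each timedelta's total_seconds() method to get the number of seconds between that point and the reference, and compare it with 3*3600 (3 h) to decide which items fall within the period. (The seconds attribute on its own ignores the days component, so it is only safe for differences under one day.)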
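The three steps above could look roughly like this. It is a minimal sketch: the function name is hypothetical and the timestamps are made up. The first timestamp is deliberately two days before the reference, to show that `total_seconds()` excludes it. Comparing `.seconds` alone would have wrongly included it.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%d %H:%M:%S'


def indices_within(times, window):
    """Indices of timestamps at most `window` before the last timestamp."""
    parsed = [datetime.strptime(t, FMT) for t in times]  # step 1: parse
    reference = parsed[-1]
    limit = window.total_seconds()
    result = []
    for i, t in enumerate(parsed):
        delta = reference - t  # step 2: timedelta from the reference point
        # step 3: total_seconds() counts the days part too; .seconds would not
        if 0 <= delta.total_seconds() <= limit:
            result.append(i)
    return result


# Made-up timestamps: the first is two days before the reference.
times = ['2002-03-29 19:00:00', '2002-03-31 17:00:00', '2002-03-31 19:54:14']
print(indices_within(times, timedelta(hours=3)))  # [1, 2]
```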

Something like this should work:

from datetime import datetime

ls = ['2002-03-31 19:30:41', '2002-03-31 19:30:41', '2002-03-31 19:43:49', '2002-03-31 19:43:50', '2002-03-31 19:50:05', '2002-03-31 19:50:06', '2002-03-31 19:50:06', '2002-03-31 19:50:44', '2002-03-31 19:50:45', '2002-03-31 19:50:45', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:53:05', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:17', '2002-03-31 19:54:14', '2002-03-31 19:54:14']

# target is one of the items in the list
target = datetime.strptime('2002-03-31 19:53:17', '%Y-%m-%d %H:%M:%S')
for l in ls:
    # Each subtraction yields a timedelta relative to target
    print(datetime.strptime(l, '%Y-%m-%d %H:%M:%S') - target)

prints:

-1 day, 23:37:24 
-1 day, 23:37:24 
-1 day, 23:50:32 
-1 day, 23:50:33 
-1 day, 23:56:48 
-1 day, 23:56:49 
-1 day, 23:56:49 
-1 day, 23:57:27 
-1 day, 23:57:28 
-1 day, 23:57:28 
-1 day, 23:58:33 
-1 day, 23:58:33 
-1 day, 23:58:33 
-1 day, 23:59:08 
-1 day, 23:59:08 
-1 day, 23:59:08 
-1 day, 23:59:48 
-1 day, 23:59:49 
-1 day, 23:59:49 
-1 day, 23:59:49 
0:00:00 
0:00:57 
0:00:57 

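datetime.strptime(l, '%Y-%m-%d %H:%M:%S') - target returns a timedelta object (docs). You can access the timedelta's days, seconds and microseconds attributes and compare them against the time range you want. The seconds attribute alone is never negative and excludes the days part, so it is safer to compare the whole timedelta. For example, to get the indices of all events that occurred less than an hour before or after some reference point: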

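from datetime import timedelta

less_than_an_hour = []
for i, l in enumerate(ls):
    # abs() makes events before and after target count the same way
    if abs(datetime.strptime(l, '%Y-%m-%d %H:%M:%S') - target) < timedelta(hours=1):
        less_than_an_hour.append(i)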
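The check `.seconds < 3600` on its own is not reliable. To see why, look at the attributes of one of the negative deltas printed above, the first list entry minus target. A timedelta is normalized so that only `days` can be negative, which makes `seconds` large for any earlier event.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%d %H:%M:%S'
target = datetime.strptime('2002-03-31 19:53:17', FMT)   # reference point
earlier = datetime.strptime('2002-03-31 19:30:41', FMT)  # first list entry

delta = earlier - target
print(delta)                            # -1 day, 23:37:24
print(delta.days, delta.seconds)        # -1 85044
print(delta.total_seconds())            # -1356.0
print(abs(delta) < timedelta(hours=1))  # True
```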

I'd:

  • Convert the list of timestamps to datetime objects:

    times = [datetime.datetime.strptime(t, '%Y-%m-%d %H:%M:%S') for t in times] 
    
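  • Use the bisect module to find the start time the user asked for. Bisecting is much faster than a linear search, provided you convert the user input to a datetime object as well: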

    start = datetime.datetime(2002, 3, 31, 19, 53, 17) 
    startindex = bisect.bisect_left(times, start) 
    
  • Use itertools functions to pair up the two lists and yield only the entries within your range:

    end = datetime.datetime(2002, 4, 1, 7, 53, 17)
    
    merged = zip(times, activity)  # lazy in Python 3, like izip in Python 2
    fromstart = itertools.islice(merged, startindex, None)  # skip entries before start
    untilend = itertools.takewhile(lambda e: e[0] <= end, fromstart)
    

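Iterating over untilend now yields (time, activity) tuples for the entries between start and end, without using extra memory to copy the lists. This lets you process large amounts of data efficiently.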

Demo:

>>> import itertools
>>> import datetime
>>> import bisect
>>> times = ['2002-03-31 19:30:41', '2002-03-31 19:30:41', '2002-03-31 19:43:49', '2002-03-31 19:43:50', '2002-03-31 19:50:05', '2002-03-31 19:50:06', '2002-03-31 19:50:06', '2002-03-31 19:50:44', '2002-03-31 19:50:45', '2002-03-31 19:50:45', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:53:05', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:17', '2002-03-31 19:54:14', '2002-03-31 19:54:14']
>>> activity = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w']
>>> times = [datetime.datetime.strptime(t, '%Y-%m-%d %H:%M:%S') for t in times]
>>> start = datetime.datetime(2002, 3, 31, 19, 53, 17)
>>> end = datetime.datetime(2002, 4, 1, 7, 53, 17)
>>> startindex = bisect.bisect_left(times, start)
>>> merged = zip(times, activity)
>>> fromstart = itertools.islice(merged, startindex, None)
>>> untilend = itertools.takewhile(lambda e: e[0] <= end, fromstart)
>>> for time, act in untilend:
...     print(time, act)
...
2002-03-31 19:53:17 u
2002-03-31 19:54:14 v
2002-03-31 19:54:14 w
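A possible variation, which is not part of this answer: bisect can locate the end of the range too, using `bisect_right`. Both boundaries are then found in O(log n), and no element before start has to be stepped over the way islice does. The helper name is hypothetical. The slices copy only the matching range, not the whole lists.

```python
import bisect
from datetime import datetime


def entries_between(times, activity, start, end):
    """Return (time, activity) pairs with start <= time <= end.

    `times` must be a sorted list of datetime objects aligned with `activity`.
    """
    lo = bisect.bisect_left(times, start)  # first index with times[i] >= start
    hi = bisect.bisect_right(times, end)   # first index with times[i] > end
    return list(zip(times[lo:hi], activity[lo:hi]))


times = [datetime(2002, 3, 31, 19, 53, 6),
         datetime(2002, 3, 31, 19, 53, 17),
         datetime(2002, 3, 31, 19, 54, 14)]
activity = ['t', 'u', 'v']
for time, act in entries_between(times, activity,
                                 datetime(2002, 3, 31, 19, 53, 17),
                                 datetime(2002, 4, 1, 7, 53, 17)):
    print(time, act)
# 2002-03-31 19:53:17 u
# 2002-03-31 19:54:14 v
```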