2015-10-24

Mongodb - How to find records with overlapping intervals?

I have a MongoDB collection of nearly 70 million records with fields (among others) like

start: 13653506610, 
finish: 13653506650 

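(The values are Unix epoch seconds, if that matters.) For every 30-second interval from the start of the collection to its end, I want to find and aggregate the records that overlap that interval, including how much of each record's time falls within it. The question is how best to do this.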

I created an index of the form

db.coll.ensureIndex({start: 1, finish: 1}) 

but even with this index, queries of the form

db.coll.find({start: {$lt: 13653506630}, finish: {$gte: 13653506600}}) 

take more than two minutes. There must be a better way!


Could you include the output of 'db.coll.find(...).explain()'?

Answers


This is an interesting one - thanks for the question.

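Note: this answer only finds the documents whose period intersects the interval being evaluated (the bottom part of the question). That would be one step in an aggregation pipeline that accomplishes the top part of the question, which is a considerably bigger problem. You would need to post a more complete question to get a full answer to it.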
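For the part of the question this answer leaves open (how much of each record's time falls in each 30-second window), the arithmetic can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the aggregation pipeline itself: the helper name overlapsByWindow and its origin and size parameters are hypothetical, and periods and windows are treated as half-open, [start, finish) and [w, w + size).

```javascript
// Hypothetical helper (not part of MongoDB): split one record's active
// period across fixed-size windows counted from `origin`.
// ASSUMPTION: periods are half-open [start, finish), windows are
// [w, w + size), and all values are integer epoch seconds.
function overlapsByWindow(doc, origin, size) {
  const result = [];
  // Index of the window that contains doc.start.
  const first = Math.floor((doc.start - origin) / size);
  for (let w = origin + first * size; w < doc.finish; w += size) {
    // Overlap is the distance between the later start and the earlier end.
    const seconds = Math.min(doc.finish, w + size) - Math.max(doc.start, w);
    if (seconds > 0) {
      result.push({ windowStart: w, seconds: seconds });
    }
  }
  return result;
}

// The sample record from the question, with windows starting at 13653506600.
const sample = { start: 13653506610, finish: 13653506650 };
const parts = overlapsByWindow(sample, 13653506600, 30);
console.log(parts);
```

For the sample record this yields 20 seconds in the window starting at 13653506600 and 20 seconds in the window starting at 13653506630, which add up to the record's 40-second duration.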

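I noticed that your query logic does not quite match your description, so I have tried to guess what you are trying to do and built test cases around that.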

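You should be able to open a mongo shell, run use timeSeries, and then paste this in to verify the concept. The last few lines show how to debug it against your 70,000,000-document collection: index coverage and execution time.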

Note: mongo-hacker makes this kind of output easier to inspect.

// USE: 
// mongo timeSeries < thisFile 

// Clean out previous runs during testing 
db.timeSeries1.drop() 

// Given a start/finish 30 sec interval, find all documents that were 
// active at that time. 

// timeSeries1 holds period in epoch seconds the session was active 
// Index start and finish independently - our queries use them independently 
db.timeSeries1.ensureIndex({start:1}) 
db.timeSeries1.ensureIndex({finish:1}) 

// ASSUME: intervals do not overlap [0,29] and [30,59] 
var intervalStart = 13653506600; 
var intervalFinish = 13653506629; 

// Use cases - should find all 5 
// 1. active session matches interval exactly 
db.timeSeries1.insert({_id:1, start:intervalStart, finish:intervalFinish}) 
// 2. active session starts and ends within interval 
db.timeSeries1.insert({_id:2, start:intervalStart+5, finish:intervalFinish-5}) 
// 3. active session starts before interval and ends during interval 
db.timeSeries1.insert({_id:3, start:intervalStart-5, finish:intervalFinish-5}) 
// 4. active session starts during interval and ends after interval 
db.timeSeries1.insert({_id:4, start:intervalStart+5, finish:intervalFinish+5}) 
// 5. active session starts before interval and ends after interval 
db.timeSeries1.insert({_id:5, start:intervalStart-5, finish:intervalFinish+5}) 

// Query should return docs if: 
// the interval is within the active session 
// the active session begins or ends within the interval 
// the active session is within the interval - special 'and' case of above 
// 
var query = {
    $or: [
        {start: {$gte: intervalStart, $lte: intervalFinish}},
        {finish: {$gte: intervalStart, $lte: intervalFinish}},
        {$and: [
            {start: {$lt: intervalStart}},
            {finish: {$gt: intervalFinish}}
        ]}
    ]
}

// Verify all 5 use cases found 
db.timeSeries1.find(query) 

// Verify index coverage - each stage is an IXSCAN 
db.timeSeries1.explain().find(query) 

// Verify in executionStats that totalKeysExamined is not much more than
// nReturned (a large gap means the index is scanning many non-matching keys).
// Examine execution times
db.timeSeries1.explain("executionStats").find(query)
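As a cross-check on the three-branch $or above, the same five use cases can be tested against the usual two-comparison overlap condition: a session overlaps the interval when it starts no later than the interval finishes and finishes no earlier than the interval starts. The sketch below is plain JavaScript and needs no database. The names matchesOriginal, matchesSimplified and overlapQuery are hypothetical, and the equivalence relies on the assumption that start <= finish holds for every document.

```javascript
// Mirrors the $or query above in plain JavaScript.
function matchesOriginal(doc, s, f) {
  return (doc.start >= s && doc.start <= f) ||
         (doc.finish >= s && doc.finish <= f) ||
         (doc.start < s && doc.finish > f);
}

// Two-comparison overlap test.
// ASSUMPTION: doc.start <= doc.finish for every document.
function matchesSimplified(doc, s, f) {
  return doc.start <= f && doc.finish >= s;
}

// Hypothetical builder for the equivalent MongoDB filter document.
function overlapQuery(s, f) {
  return { start: { $lte: f }, finish: { $gte: s } };
}

const intervalStart = 13653506600;
const intervalFinish = 13653506629;

const docs = [
  { _id: 1, start: intervalStart, finish: intervalFinish },
  { _id: 2, start: intervalStart + 5, finish: intervalFinish - 5 },
  { _id: 3, start: intervalStart - 5, finish: intervalFinish - 5 },
  { _id: 4, start: intervalStart + 5, finish: intervalFinish + 5 },
  { _id: 5, start: intervalStart - 5, finish: intervalFinish + 5 },
  // Two extra cases that should NOT match: entirely before / entirely after.
  { _id: 6, start: intervalStart - 20, finish: intervalStart - 1 },
  { _id: 7, start: intervalFinish + 1, finish: intervalFinish + 20 }
];

// Both predicates should give the same answer for every document.
const agree = docs.every(function (d) {
  return matchesOriginal(d, intervalStart, intervalFinish) ===
         matchesSimplified(d, intervalStart, intervalFinish);
});
const matched = docs.filter(function (d) {
  return matchesSimplified(d, intervalStart, intervalFinish);
}).map(function (d) { return d._id; });

console.log(agree, matched);
```

The filter returned by overlapQuery(intervalStart, intervalFinish) could be passed to db.timeSeries1.find(...) in place of query. Whether it runs any faster on the real collection is something only explain("executionStats") can confirm.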