Aggregate count of occurrences of values in Mongo

I have a set of (~35K) documents that look like this:

{ 
    "_id" : ObjectId("583dabfc7572394f93ac6ef2"), 
    "updatedAt" : ISODate("2016-11-29T16:25:32.130Z"), 
    "createdAt" : ISODate("2016-11-29T16:25:32.130Z"), 
    "sourceType" : "report", 
    "sourceRef" : ObjectId("583da865686e3dfbd977f059"), 
    "type" : "video", 
    "caption" : "lorem ipsum", 
    "timestamps" : { 
     "postedAt" : ISODate("2016-08-26T15:09:35.000Z"), 
     "monthOfYear" : 7, // 0-based 
     "dayOfWeek" : 5, // 0-based 
     "hourOfDay" : 16 // 0-based 
    }, 
    "stats" : { 
     "comments" : 0, 
     "likes" : 8 
    }, 
    "user" : { 
     "id" : "123456", 
     "username" : "johndoe", 
     "fullname" : "John", 
     "picture" : "" 
    }, 
    "images" : { 
     "thumbnail" : "", 
     "low" : "", 
     "standard" : "" 
    }, 
    "mentions" : [ 
     "janedoe" 
    ], 
    "tags" : [ 
     "holiday", 
     "party" 
    ], 
    "__v" : 0 
} 

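I want to produce an aggregate report containing the document frequencies by hour of the day, day of the week and month of the year, which will be used for charts, along with the number of times each mention/tag occurs.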

{ 
    // Each frequency is independent of the others, 
    // e.g. the total count for each frequency should 
    // be ~35k. 
    dayFrequency: [ 
    { day: 0, count: 1400 }, // Monday 
    { day: 1, count: 1700 }, // Tuesday 
    { day: 2, count: 1800 }, // Wednesday 
    { /* etc */ }, 
    { day: 6, count: 1200 } // Sunday 
    ], 

    monthFrequency: [ 
    { month: 0, count: 200 }, // January 
    { month: 1, count: 250 }, // February 
    { month: 2, count: 300 }, // March 
    { /* etc */ }, 
    { month: 11, count: 150 } // December 
    ], 

    hourFrequency: [ 
    { hour: 0, count: 150 }, // 0am 
    { hour: 1, count: 200 }, // 1am 
    { hour: 2, count: 275 }, // 2am 
    { /* etc */ }, 
    { hour: 23, count: 150 }, // 11pm 
    ], 

    mentions: { 
    janedoe: 12, 
    johnsmith: 11, 
    peter: 54, 
    /* and so on */ 
    }, 

    tags: { 
    holiday: 872, 
    party: 1029, 
    /* and so on */ 
    } 
} 

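Is this possible, and if so, how would I write it? As I understand it, since I'm aggregating over all matching documents, it would effectively be a single group?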

My code so far just groups all matching records into a single group, but I'm not sure how to proceed from there.

Model.aggregate([ 
    { $match: { sourceType: 'report', sourceRef: '583da865686e3dfbd977f059' } }, 
    { $group: { 
    _id: '$sourceRef' 
    }} 
], (err, res) => { 
    console.log(err); 
    console.log(res); 
}) 

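It would also be acceptable for each frequency to be an array of counts (e.g. [ 1400, 1700, 1800, /* etc */ 1200 ]), which led me to look at $count and a few other operators, but again I'm unclear on how to use them.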

Answer

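At the time of writing, this is not possible in a single pipeline with MongoDB 3.2. From MongoDB 3.4 onwards, however, you can use the $facet operator, which processes multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document, where its results are stored as an array of documents.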

Model.aggregate([ 
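    // Note: aggregate() does not cast pipeline values to the schema types, so the
    // sourceRef string is converted to an ObjectId here (assumes `mongoose` is in scope).
    { "$match": { "sourceType": "report", "sourceRef": new mongoose.Types.ObjectId("583da865686e3dfbd977f059") } }, 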
    { 
     "$facet": { 
      "dayFrequency": [ 
       { 
        "$group": { 
         "_id": "$timestamps.dayOfWeek", 
         "count": { "$sum": 1 } 
        } 
       } 
      ], 
      "monthFrequency": [ 
       { 
        "$group": { 
         "_id": "$timestamps.monthOfYear", 
         "count": { "$sum": 1 } 
        } 
       } 
      ], 
      "hourFrequency": [ 
       { 
        "$group": { 
         "_id": "$timestamps.hourOfDay", 
         "count": { "$sum": 1 } 
        } 
       } 
      ], 
      "mentions": [ 
       { "$unwind": "$mentions" }, 
       { 
        "$group": { 
         "_id": "$mentions", 
         "count": { "$sum": 1 } 
        } 
       } 
      ], 
      "tags": [ 
       { "$unwind": "$tags" }, 
       { 
        "$group": { 
         "_id": "$tags", 
         "count": { "$sum": 1 } 
        } 
       } 
      ] 
     } 
    } 
], (err, res) => { 
    console.log(err); 
    console.log(res); 
}) 
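
The $facet stage returns a single document whose fields each hold an array of { _id, count } buckets, which is slightly different from the shape asked for in the question. Below is a minimal sketch of reshaping that result on the client side, assuming `res[0]` from the callback is passed in. The helper names `toFrequency`, `toCountMap` and `shapeReport` are made up for illustration and are not part of Mongoose or MongoDB.

```javascript
// Hypothetical helpers, not part of Mongoose or MongoDB.

// Turn [{ _id: 5, count: 10 }, ...] into [{ day: 5, count: 10 }, ...],
// sorted by the numeric key ($group does not guarantee any output order).
function toFrequency(buckets, keyName) {
  return buckets
    .map((b) => ({ [keyName]: b._id, count: b.count }))
    .sort((a, b) => a[keyName] - b[keyName]);
}

// Turn [{ _id: "holiday", count: 872 }, ...] into { holiday: 872, ... }.
function toCountMap(buckets) {
  const out = {};
  for (const b of buckets) {
    out[b._id] = b.count;
  }
  return out;
}

// `facet` is the single document produced by the $facet stage, i.e. res[0].
function shapeReport(facet) {
  return {
    dayFrequency: toFrequency(facet.dayFrequency, "day"),
    monthFrequency: toFrequency(facet.monthFrequency, "month"),
    hourFrequency: toFrequency(facet.hourFrequency, "hour"),
    mentions: toCountMap(facet.mentions),
    tags: toCountMap(facet.tags),
  };
}
```

Inside the callback this could be used as `console.log(shapeReport(res[0]))`.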

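With MongoDB 3.4 or later, running the $facet pipeline shown in this answer returns all five summaries in a single result document.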
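
The question also mentions that plain arrays of counts would be acceptable. Because $group only emits a bucket for values that actually occur in the matched documents, a sketch like the following fills the missing slots with zeros. The helper name `toCountArray` is hypothetical, and the slot counts (7, 12, 24) are assumptions based on the 0-based fields in the sample document.

```javascript
// Hypothetical helper: convert $group buckets into a dense array of counts.
// buckets: [{ _id: <0-based index>, count: <number> }, ...]
// size: number of slots (7 for days, 12 for months, 24 for hours).
function toCountArray(buckets, size) {
  const counts = new Array(size).fill(0);
  for (const b of buckets) {
    // Skip null or out-of-range keys, e.g. from documents
    // that are missing the timestamps field.
    if (Number.isInteger(b._id) && b._id >= 0 && b._id < size) {
      counts[b._id] = b.count;
    }
  }
  return counts;
}

// Example with made-up numbers; days 1 and 3-6 have no bucket.
const dayCounts = toCountArray(
  [{ _id: 2, count: 1800 }, { _id: 0, count: 1400 }],
  7
);
console.log(dayCounts.join(", ")); // 1400, 0, 1800, 0, 0, 0, 0
```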