2017-10-05 61 views
2

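Need a distinct count on multiple fields joined from another collection using a MongoDB aggregation query

I am trying to use a MongoDB aggregation query to join ($lookup) two collections and then get a distinct count of all the unique values in the joined arrays.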

My two collections look like this. The events collection:

{ 
    "_id" : "1", 
    "name" : "event1", 
    "objectsIds" : [ "1", "2", "3" ], 
} 

The objects collection:

{ 
    "_id" : "1", 
    "name" : "object1", 
    "metaDataMap" : { 
         "SOURCE" : ["ABC", "DEF"], 
         "DESTINATION" : ["XYZ", "PDQ"], 
         "TYPE" : [] 
        } 
}, 
{ 
    "_id" : "2", 
    "name" : "object2", 
    "metaDataMap" : { 
         "SOURCE" : ["RST", "LNE"], 
         "TYPE" : ["text"] 
        } 
}, 
{ 
    "_id" : "3", 
    "name" : "object3", 
    "metaDataMap" : { 
         "SOURCE" : ["NOP"], 
         "DESTINATION" : ["PHI", "NYC"], 
         "TYPE" : ["video"] 
        } 
} 

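What I am trying to work out: when I $match the event with _id = 1, I want to join in the metaDataMap of every referenced object and then count the distinct values under each key, like this. Counts for event _id = 1: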

SOURCE : 5 
DESTINATION: 4 
TYPE: 2 
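
As a sanity check on those numbers, here is a minimal in-memory sketch in plain JavaScript (no MongoDB involved) that unions the values per key across the three sample objects and counts them. The function name distinctCounts is hypothetical and only used for illustration.

```javascript
// Sample data copied from the question.
const event = { _id: "1", name: "event1", objectsIds: ["1", "2", "3"] };
const objects = [
  { _id: "1", metaDataMap: { SOURCE: ["ABC", "DEF"], DESTINATION: ["XYZ", "PDQ"], TYPE: [] } },
  { _id: "2", metaDataMap: { SOURCE: ["RST", "LNE"], TYPE: ["text"] } },
  { _id: "3", metaDataMap: { SOURCE: ["NOP"], DESTINATION: ["PHI", "NYC"], TYPE: ["video"] } },
];

// Hypothetical helper: distinct value count per metaDataMap key
// over all objects referenced by the event.
function distinctCounts(event, objects) {
  const sets = new Map();
  for (const obj of objects) {
    if (!event.objectsIds.includes(obj._id)) continue;
    for (const [key, values] of Object.entries(obj.metaDataMap || {})) {
      if (!sets.has(key)) sets.set(key, new Set());
      for (const v of values) sets.get(key).add(v);
    }
  }
  const counts = {};
  for (const [key, set] of sets) counts[key] = set.size;
  return counts;
}

const counts = distinctCounts(event, objects);
console.log(counts); // { SOURCE: 5, DESTINATION: 4, TYPE: 2 }
```

The keys are discovered from the data rather than hard-coded, which is the same property the aggregation needs to have.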

What I have so far is this:

db.events.aggregate([ 
{$match: {"_id" : id}} 
,{$lookup: {"from" : "objects", 
      "localField" : "objectsIds", 
      "foreignField" : "_id", 
      "as" : "objectResults"}} 
,{$project: {x: {$objectToArray: "$objectResults.metaDataMap"}}} 
,{$unwind: "$x"} 
,{$match: {"x.k": {$ne: "_id"}}} 
,{$group: {_id: "$x.k", y: {$addToSet: "$x.v"}}} 
,{$addFields: {size: {"$size":"$y"}} } 
]); 

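This fails because $objectResults.metaDataMap is not an object; it is an array. Any suggestions on how to fix this, or on a different way to do what I want? Also, I don't necessarily know what the fields (keys) in metaDataMap are, and I don't want to count or include fields that may or may not exist in the map.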
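
To illustrate why the path yields an array, here is a rough plain-JavaScript analogue, not MongoDB itself: reading a field through an array produces one value per element, whereas flattening the array first (the role $unwind plays) leaves one object at a time for the key/value conversion. The variable names are made up for this sketch.

```javascript
// Stand-in for the array that $lookup attaches to the event document.
const objectResults = [
  { _id: "1", metaDataMap: { SOURCE: ["ABC", "DEF"], DESTINATION: ["XYZ", "PDQ"], TYPE: [] } },
  { _id: "2", metaDataMap: { SOURCE: ["RST", "LNE"], TYPE: ["text"] } },
];

// Rough analogue of the field path "$objectResults.metaDataMap":
// one metaDataMap per array element, so the result is an array.
const projected = objectResults.map((o) => o.metaDataMap);
console.log(Array.isArray(projected)); // true

// Rough analogue of $unwind on objectResults followed by $objectToArray:
// each object is handled on its own and turned into {k, v} pairs.
const pairs = objectResults.flatMap((o) =>
  Object.entries(o.metaDataMap).map(([k, v]) => ({ k, v }))
);
console.log(pairs.map((p) => p.k));
```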

+1

Why not unwind objectResults before the $project? – barbakini

+0

That works! Thank you! – Deckard

+0

Glad to help. – barbakini

Answers

1

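This should do it. I tested it on your input set and deliberately added some duplicate values, such as NYC appearing in more than one DESTINATION, to make sure they get deduped (i.e. a distinct count, as requested). For fun, comment out all the stages and then uncomment them from the top down to see the effect of each stage.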

var id = "1"; 

c=db.foo.aggregate([ 
// Find a thing: 
{$match: {"_id" : id}} 

// Do the lookup into the objects collection: 
,{$lookup: {"from" : "foo2", 
      "localField" : "objectsIds", 
      "foreignField" : "_id", 
      "as" : "objectResults"}} 

// OK, so we've got a bunch of extra material now. Let's 
// get down to just the metaDataMap: 
,{$project: {x: "$objectResults.metaDataMap"}} 
,{$unwind: "$x"} 
,{$project: {"_id":0}} 

// Use $objectToArray to get all the field names dynamically: 
// Replace the old x with new x (don't need the old one): 
,{$project: {x: {$objectToArray: "$x"}}} 
,{$unwind: "$x"} 

// Collect unique field names. Interesting note: the values 
// here are ARRAYS, not scalars, so $push is creating an 
// array of arrays: 
,{$group: {_id: "$x.k", tmp: {$push: "$x.v"}}} 

// Almost there! We have to turn the array of array (of string) 
// into a single array which we'll subsequently dedupe. We will 
// overwrite the old tmp with a new one, too: 
,{$addFields: {tmp: {$reduce:{ 
    input: "$tmp", 
    initialValue:[], 
    in:{$concatArrays: [ "$$value", "$$this"]} 
     }} 
    }} 

// Now just unwind and regroup using the addToSet operator 
// to dedupe the list: 
,{$unwind: "$tmp"} 
,{$group: {_id: "$_id", uniqueVals: {$addToSet: "$tmp"}}} 

// Add size for good measure: 
,{$addFields: {size: {"$size":"$uniqueVals"}} } 
      ]); 
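
As a possible simplification of the last few stages, the flatten, unwind, and regroup steps could probably be collapsed by using $setUnion inside the $reduce, so the dedupe happens while the arrays are merged. This is only a sketch: the constant name tailStages is hypothetical, and it assumes every value under metaDataMap is an array, as in the sample data.

```javascript
// Hypothetical replacement for the stages after the second $unwind.
// Assumes each "$x.v" is an array of strings.
const tailStages = [
  // Collect the value arrays per field name (an array of arrays).
  { $group: { _id: "$x.k", tmp: { $push: "$x.v" } } },

  // Merge the arrays as sets, which removes duplicates along the way.
  { $project: {
      uniqueVals: { $reduce: {
        input: "$tmp",
        initialValue: [],
        in: { $setUnion: ["$$value", "$$this"] }
      } }
  } },

  // Count the distinct values.
  { $addFields: { size: { $size: "$uniqueVals" } } }
];
```

Note that $setUnion does not guarantee the order of the merged elements, which does not matter when only the size is needed.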
+0

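When I try this on a large data set I get an "exceeds maximum document size" error. What is the best/fastest way to get around the maximum document size? – Deckard

+0

Is this with the {$match: {"_id": id}} in place? How many objects are you looking up? There is an X * Y effect here that can expand a document up to the 16 MB per-document limit. –

+0

Yes, I have the $match on the id in there. The number of objects I look up varies, but it fails once the lookup step goes past 25,000 objects. Also, I see in the 3.4 docs that the pipeline is limited by the 16 MB size only for the returned results and not during pipeline processing? So if my result doesn't exceed the limit, why am I getting an error? – Deckard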
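
One likely explanation is that the error is raised inside the $lookup stage itself: the objectResults array is assembled inside a single event document, and that document still has to fit in 16 MB. A commonly suggested workaround is to put $unwind directly after $lookup, which the server can coalesce into the lookup so the full array is never materialized. The sketch below also groups on (key, value) pairs and then counts them, so no large $addToSet or $push array is built either. The collection and field names are taken from the question; everything else is an assumption, not a tested fix for this data set.

```javascript
const id = "1"; // event _id from the question

const pipeline = [
  { $match: { _id: id } },
  { $lookup: { from: "objects", localField: "objectsIds",
               foreignField: "_id", as: "objectResults" } },
  // Immediately after $lookup, so the two stages can be coalesced.
  { $unwind: "$objectResults" },
  // Turn each metaDataMap into [{k, v}, ...] without knowing the keys.
  { $project: { _id: 0, x: { $objectToArray: "$objectResults.metaDataMap" } } },
  { $unwind: "$x" },
  // One document per individual value.
  { $unwind: "$x.v" },
  // First group: one document per distinct (key, value) pair.
  { $group: { _id: { k: "$x.k", v: "$x.v" } } },
  // Second group: count the distinct pairs per key.
  { $group: { _id: "$_id.k", size: { $sum: 1 } } }
];
```

In the mongo shell this would be run as db.events.aggregate(pipeline, { allowDiskUse: true }); the allowDiskUse option lets the $group stages spill to disk if they exceed the per-stage memory limit. One difference from the earlier pipelines: a key whose arrays are all empty produces no output document here, instead of a count of 0.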

0

I was able to produce the desired result using the following query.

db.events.aggregate(
    [ 
     {$match: {"_id" : id}} , 
     {$lookup: { 
      "from" : "objects", 
      "localField" : "objectsIds", 
      "foreignField" : "_id", 
      "as" : "objectResults" 
     }}, 
     {$unwind: "$objectResults"}, 
     {$project:{"A":"$objectResults.metaDataMap"}}, 
     {$unwind: {path: "$A.SOURCE", preserveNullAndEmptyArrays: true}}, 
     {$unwind:{ path: "$A.DESTINATION", preserveNullAndEmptyArrays: true}}, 
     {$unwind:{ path: "$A.TYPE", preserveNullAndEmptyArrays: true}}, 
     {$group:{"_id":"$_id","SOURCE":{$addToSet:"$A.SOURCE"},"DESTINATION":{$addToSet:"$A.DESTINATION"},"TYPE":{$addToSet:"$A.TYPE"}}}, 
     {$addFields: {"SOURCE":{$size:"$SOURCE"},"DESTINATION":{$size:"$DESTINATION"},"TYPE":{$size:"$TYPE"}}}, 
     {$project:{"_id":0}}] 
).pretty() 

Updated query for dynamic fields:

db.events.aggregate([ 
{ 
$match: {"_id" : id}} , 
{$lookup: {"from" : "objects","localField" : "objectsIds","foreignField" : "_id","as" : "objectResults"}}, 
{$unwind: "$objectResults"}, 
{$project:{"A":"$objectResults.metaDataMap"}}, 
{$project: {x: {$objectToArray: "$A"}}}, 
{$unwind: "$x"}, 
{$match: {"x.k": {$ne: "_id"}}}, 
{$unwind:"$x.v"}, 
{$group: {_id: "$x.k", y: {$addToSet: "$x.v"}}}, 
{$project:{"size":{$size:"$y"}}}] 
).pretty() 
+0

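I guess I should clarify that I don't necessarily know what the fields (keys) in metaDataMap are, and I don't want to count or include fields that may or may not exist in the map. – Deckard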
