Field combination with the highest total count in a collection

Let's say I have 2 fields, A and B. Field A can take the values [a, b, c, d, e] and B the values [x, y].

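I am looking for a single MongoDB aggregation pipeline query that will:

count the number of times each value of A appears in my database
show the distribution of B values for the most frequently occurring value of A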

Example for the most occurring value:

Let's say 'c' happens to be the most occurring value.

The output would be:

{ '_id': { 'A': 'c', 'B': 'x' }, 'count': 43 } 
{ '_id': { 'A': 'c', 'B': 'y' }, 'count': 13 }

The only way I have managed to do this so far is by hardcoding A: c into my "$match" statement.

Source

2017-04-10 Yrden

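You can infer the aggregation pipeline from the desired output. The _id field has two keys, A and B, which implies a $group key made up of both fields, and the count is obtained with the $sum accumulator.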

Populating a test collection

Suppose we populate a test collection with the following documents:

db.collection.insert([ 
    { "A": "c", "B": "x" }, 
    { "A": "c", "B": "x" }, 
    { "A": "c", "B": "x" }, 
    { "A": "e", "B": "x" }, 
    { "A": "c", "B": "x" }, 
    { "A": "c", "B": "x" }, 
    { "A": "a", "B": "x" }, 
    { "A": "c", "B": "x" }, 
    { "A": "c", "B": "x" }, 
    { "A": "a", "B": "x" }, 
    { "A": "c", "B": "y" }, 
    { "A": "c", "B": "x" }, 
    { "A": "c", "B": "x" }, 
    { "A": "b", "B": "x" }, 
    { "A": "c", "B": "x" }, 
    { "A": "c", "B": "x" }, 
    { "A": "a", "B": "y" }, 
    { "A": "a", "B": "y" }, 
    { "A": "b", "B": "y" }, 
    { "A": "b", "B": "y" }, 
    { "A": "b", "B": "y" }, 
    { "A": "b", "B": "y" }, 
    { "A": "b", "B": "y" }, 
    { "A": "c", "B": "y" }, 
    { "A": "e", "B": "y" }, 
    { "A": "e", "B": "y" }, 
    { "A": "d", "B": "y" }, 
    { "A": "d", "B": "y" }, 
    { "A": "d", "B": "y" } 
])

The following initial pipeline then groups the documents on both keys and gets the counts:

db.collection.aggregate([ 
    { 
     "$group": { 
      "_id": { "A": "$A", "B": "$B" }, 
      "count": { "$sum": 1 } 
     } 
    } 
])

Sample output

/* 1 */ 
{ 
    "_id" : { 
     "A" : "e", 
     "B" : "y" 
    }, 
    "count" : 2 
} 

/* 2 */ 
{ 
    "_id" : { 
     "A" : "c", 
     "B" : "x" 
    }, 
    "count" : 11 
} 

/* 3 */ 
{ 
    "_id" : { 
     "A" : "b", 
     "B" : "y" 
    }, 
    "count" : 5 
} 

/* 4 */ 
{ 
    "_id" : { 
     "A" : "b", 
     "B" : "x" 
    }, 
    "count" : 1 
} 

/* 5 */ 
{ 
    "_id" : { 
     "A" : "e", 
     "B" : "x" 
    }, 
    "count" : 1 
} 

/* 6 */ 
{ 
    "_id" : { 
     "A" : "d", 
     "B" : "y" 
    }, 
    "count" : 3 
} 

/* 7 */ 
{ 
    "_id" : { 
     "A" : "a", 
     "B" : "y" 
    }, 
    "count" : 2 
} 

/* 8 */ 
{ 
    "_id" : { 
     "A" : "a", 
     "B" : "x" 
    }, 
    "count" : 2 
} 

/* 9 */ 
{ 
    "_id" : { 
     "A" : "c", 
     "B" : "y" 
    }, 
    "count" : 2 
}

By inspection, document #2 has the highest count (11), with 'c' as its value of A:

/* 2 */ 
{ 
    "_id" : { 
     "A" : "c", 
     "B" : "x" 
    }, 
    "count" : 11 
}
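For intuition, the first $group stage can be mimicked in plain JavaScript outside MongoDB. This is only a sketch: the names pairs, docs and groupByAB are made up for illustration, and the array is a hypothetical in-memory stand-in for the test collection above (same 29 documents, written as pair plus repetition count).

```javascript
// Hypothetical in-memory stand-in for the test collection: [A, B, repetitions].
const pairs = [
  ["c", "x", 11], ["c", "y", 2], ["b", "y", 5], ["b", "x", 1],
  ["e", "y", 2], ["e", "x", 1], ["d", "y", 3], ["a", "y", 2], ["a", "x", 2],
];
const docs = pairs.flatMap(([A, B, n]) =>
  Array.from({ length: n }, () => ({ A, B }))
);

// Mimics { $group: { _id: { A: "$A", B: "$B" }, count: { $sum: 1 } } }.
function groupByAB(documents) {
  const counts = new Map();
  for (const { A, B } of documents) {
    const key = JSON.stringify([A, B]); // composite key, like the _id document
    counts.set(key, (counts.get(key) || 0) + 1); // $sum: 1
  }
  return [...counts].map(([key, count]) => {
    const [A, B] = JSON.parse(key);
    return { _id: { A, B }, count };
  });
}

const grouped = groupByAB(docs);
// The (c, x) group has count 11 for this data.
console.log(grouped.find((g) => g._id.A === "c" && g._id.B === "x"));
```

Unlike this sketch, MongoDB does not guarantee any particular order for the documents a $group stage emits.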

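Having got this far, you can aggregate further to find the key with the highest total. You need another $group stage that groups the results of the previous stage by the A key and builds a list holding the details of each grouped document, i.e. the count and the corresponding B value. You also need the total count for each A group: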

db.collection.aggregate([ 
    { 
     "$group": { 
      "_id": { "A": "$A", "B": "$B" }, 
      "count": { "$sum": 1 } 
     } 
    }, 
    { 
     "$group": { 
      "_id": "$_id.A", 
      "counts": { 
       "$push": { 
        "B": "$_id.B", 
        "count": "$count" 
       } 
      }, 
      "count": { "$sum": "$count" } 
     } 
    } 
])

Sample output

/* 1 */ 
{ 
    "_id" : "e", 
    "counts" : [ 
     { 
      "B" : "y", 
      "count" : 2 
     }, 
     { 
      "B" : "x", 
      "count" : 1 
     } 
    ], 
    "count" : 3 
} 

/* 2 */ 
{ 
    "_id" : "c", 
    "counts" : [ 
     { 
      "B" : "x", 
      "count" : 11 
     }, 
     { 
      "B" : "y", 
      "count" : 2 
     } 
    ], 
    "count" : 13 
} 

/* 3 */ 
{ 
    "_id" : "b", 
    "counts" : [ 
     { 
      "B" : "y", 
      "count" : 5 
     }, 
     { 
      "B" : "x", 
      "count" : 1 
     } 
    ], 
    "count" : 6 
} 

/* 4 */ 
{ 
    "_id" : "d", 
    "counts" : [ 
     { 
      "B" : "y", 
      "count" : 3 
     } 
    ], 
    "count" : 3 
} 

/* 5 */ 
{ 
    "_id" : "a", 
    "counts" : [ 
     { 
      "B" : "y", 
      "count" : 2 
     }, 
     { 
      "B" : "x", 
      "count" : 2 
     } 
    ], 
    "count" : 4 
}
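The second $group stage can be sketched the same way in plain JavaScript. The input below is the first-stage sample output copied from above; firstStage and regroupByA are illustrative names, not MongoDB APIs.

```javascript
// Output of the first $group stage, copied from the sample output above.
const firstStage = [
  { _id: { A: "e", B: "y" }, count: 2 },
  { _id: { A: "c", B: "x" }, count: 11 },
  { _id: { A: "b", B: "y" }, count: 5 },
  { _id: { A: "b", B: "x" }, count: 1 },
  { _id: { A: "e", B: "x" }, count: 1 },
  { _id: { A: "d", B: "y" }, count: 3 },
  { _id: { A: "a", B: "y" }, count: 2 },
  { _id: { A: "a", B: "x" }, count: 2 },
  { _id: { A: "c", B: "y" }, count: 2 },
];

// Mimics the second $group: key on _id.A, $push the B details, $sum the counts.
function regroupByA(groups) {
  const byA = new Map();
  for (const { _id, count } of groups) {
    if (!byA.has(_id.A)) {
      byA.set(_id.A, { _id: _id.A, counts: [], count: 0 });
    }
    const entry = byA.get(_id.A);
    entry.counts.push({ B: _id.B, count }); // "$push": { B, count }
    entry.count += count;                   // "$sum": "$count"
  }
  return [...byA.values()];
}

const regrouped = regroupByA(firstStage);
// The entry for "c" carries both B counts and a total of 13.
console.log(JSON.stringify(regrouped.find((g) => g._id === "c")));
```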

At this stage, you only need to sort the documents on the count field in descending order and return the top document:

db.collection.aggregate([ 
    { 
     "$group": { 
      "_id": { "A": "$A", "B": "$B" }, 
      "count": { "$sum": 1 } 
     } 
    }, 
    { 
     "$group": { 
      "_id": "$_id.A", 
      "counts": { 
       "$push": { 
        "B": "$_id.B", 
        "count": "$count" 
       } 
      }, 
      "count": { "$sum": "$count" } 
     } 
    }, 
    { "$sort": { "count": -1 } }, 
    { "$limit": 1 } 
])


which yields:

{ 
    "_id" : "c", 
    "counts" : [ 
     { 
      "B" : "x", 
      "count" : 11 
     }, 
     { 
      "B" : "y", 
      "count" : 2 
     } 
    ], 
    "count": 13 
}

Although the output has a different structure from the desired one, it is still sufficient to solve the problem:

1. Count the number of times each value of A appears in my database -> the pipeline needed:

db.collection.aggregate([ 
    { 
     "$group": { 
      "_id": { "A": "$A", "B": "$B" }, 
      "count": { "$sum": 1 } 
     } 
    }, 
    { 
     "$group": { 
      "_id": "$_id.A",     
      "count": { "$sum": "$count" } 
     } 
    } 
])

2. Show the distribution of B values for the most occurring value of A:

db.collection.aggregate([ 
    { 
     "$group": { 
      "_id": { "A": "$A", "B": "$B" }, 
      "count": { "$sum": 1 } 
     } 
    }, 
    { 
     "$group": { 
      "_id": "$_id.A", 
      "counts": { 
       "$push": { 
        "B": "$_id.B", 
        "count": "$count" 
       } 
      }, 
      "count": { "$sum": "$count" } 
     } 
    }, 
    { "$sort": { "count": -1 } }, 
    { "$limit": 1 } 
])
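If the exact output shape from the question is needed, one possible extension (a sketch, not part of the pipelines above) is to append $unwind and $project stages that flatten the counts array back into one document per (A, B) pair. The variable name pipeline is only illustrative, and the collection name is assumed to be the same db.collection used above.

```javascript
// Sketch: same four stages as above, plus two reshaping stages.
const pipeline = [
  { $group: { _id: { A: "$A", B: "$B" }, count: { $sum: 1 } } },
  {
    $group: {
      _id: "$_id.A",
      counts: { $push: { B: "$_id.B", count: "$count" } },
      count: { $sum: "$count" },
    },
  },
  { $sort: { count: -1 } },
  { $limit: 1 },
  // One document per element of the counts array.
  { $unwind: "$counts" },
  // Rebuild the { _id: { A, B }, count } shape asked for in the question.
  {
    $project: {
      _id: { A: "$_id", B: "$counts.B" },
      count: "$counts.count",
    },
  },
];

// In the mongo shell, run it with:
// db.collection.aggregate(pipeline)
```

With the test collection above this should return one document for (c, x) and one for (c, y). Note that $limit: 1 keeps a single value of A, so if two values of A tie for the highest total, only one of them is returned.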

Source

2017-04-10 14:15:06 chridam

This works perfectly for the most occurring value's distribution, thank you! – Yrden
