MongoDB Aggregation: How to return only the matching elements of an array

In my MongoDB book collection I have documents structured as follows:

/* 0 */ 
{ 
    "_id" : ObjectId("50485b89b30f1ea69110ff4c"), 

    "publisher" : {
        "$ref" : "boohya",
        "$id" : "foo"
    },
    "displayName" : "Paris Nightlife", 
    "catalogDescription" : "Some desc goes here", 
    "languageCode" : "en", 
    "rating" : 0, 
    "status" : "LIVE", 
    "thumbnailId" : ObjectId("50485b89b30f1ea69110ff4b"), 
    "indexTokens" : ["Nightlife", "Paris"] 
}

I run the following regex query to find all documents that have an indexToken starting with "Par":

{ "indexTokens" : { "$regex" : "^Par" , "$options" : "i"}}

If I select only the indexTokens field to be returned, like so:

{ "indexTokens" : 1}

the resulting DBObject is

{ "_id" : { "$oid" : "50485b89b30f1ea69110ff4c"} , "indexTokens" : [ "Nightlife" , "Paris"]}

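What I would like to get is only the tokens/tags that matched the regex (I don't care about retrieving the document at this point, nor do I need all the tags of the matched document).

Is this a case for the new aggregation framework provided in MongoDB v2.2?

If so, how do I modify my query so that the actual result looks like this: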

{ "indexTokens" : ["Paris", "Paradise River", "Parma", etc ...] }

Bonus question (do you have the codez): how do I do it using the Java driver?

Right now my Java looks like this:

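// assumes: import java.util.regex.Pattern; import com.mongodb.*;
// case-insensitive prefix match on any element of indexTokens
DBObject query = new BasicDBObject("indexTokens",
        Pattern.compile("^" + filter, Pattern.CASE_INSENSITIVE));
// return only the indexTokens field (plus _id)
BasicDBObject fields = new BasicDBObject("indexTokens", 1);
DBCursor curs = getCollection()
        .find(query, fields)
        .sort(new BasicDBObject("indexTokens", 1))
        .limit(maxSuggestionCount);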

THX :)

Edit:

Following your answers, I modified my Java code as follows:

// assumes: import java.util.ArrayList; import java.util.regex.Pattern; import com.mongodb.*;
BasicDBObject cmdBody = new BasicDBObject("aggregate", "Book");
ArrayList<BasicDBObject> pipeline = new ArrayList<BasicDBObject>();

// the same case-insensitive prefix pattern is used by both $match stages
Pattern prefix = Pattern.compile("^" + titleFilter, Pattern.CASE_INSENSITIVE);

// 1) keep only documents with at least one matching token
BasicDBObject match = new BasicDBObject("$match", new BasicDBObject("indexTokens", prefix));
// 2) emit one document per array element
BasicDBObject unwind = new BasicDBObject("$unwind", "$indexTokens");
// 3) drop the unwound elements that do not match
BasicDBObject match2 = new BasicDBObject("$match", new BasicDBObject("indexTokens", prefix));
// 4) collect the remaining tokens into a single array
BasicDBObject groupFilters = new BasicDBObject("_id", null);
groupFilters.append("indexTokens", new BasicDBObject("$push", "$indexTokens"));
BasicDBObject group = new BasicDBObject("$group", groupFilters);

pipeline.add(match);
pipeline.add(unwind);
pipeline.add(match2);
pipeline.add(group);

cmdBody.put("pipeline", pipeline);

// run the aggregate command directly (older driver without DBCollection.aggregate())
CommandResult res = getCollection().getDB().command(cmdBody);
System.out.println(res);

which outputs

{ "result" : [ { "_id" : null , "indexTokens" : [ "Paris"]}] , "ok" : 1.0}

This is genius!

Thanks a lot!

2012-09-06 azpublic

You can do this with the 2.2 aggregation framework. Something like this:

db.books.runCommand("aggregate", {
    pipeline: [
        { // find docs that contain Par*
            $match: { "indexTokens" : { "$regex" : "^Par" , "$options" : "i"}}
        },
        { // create a doc with a single array element for each indexToken entry
            $unwind: "$indexTokens"
        },
        { // now produce a list of index tokens
            $group: {
                _id: "$indexTokens"
            }
        }
    ]
})
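
As a rough illustration of what this first pipeline computes, here is a plain-JavaScript sketch run against made-up sample data (the three documents are hypothetical, and this is not how MongoDB implements the stages). Grouping by `_id: "$indexTokens"` yields one result per distinct token, and because nothing filters after the $unwind, non-matching tokens from matching documents, such as "Nightlife", are still included.

```javascript
// Hypothetical sample data, not taken from the question's collection.
const books = [
  { _id: 1, indexTokens: ["Nightlife", "Paris"] },
  { _id: 2, indexTokens: ["Paris", "Food"] },
  { _id: 3, indexTokens: ["Berlin"] },
];
const prefix = /^Par/i;

// $match: keep documents where ANY array element matches the regex.
const matched = books.filter((d) => d.indexTokens.some((t) => prefix.test(t)));

// $unwind: one entry per array element.
const unwound = matched.flatMap((d) => d.indexTokens);

// $group with _id: "$indexTokens": one output document per distinct token.
// (MongoDB does not guarantee the order of $group output; a Set keeps insertion order.)
const grouped = [...new Set(unwound)].map((t) => ({ _id: t }));

console.log(JSON.stringify(grouped));
```

Document 3 is dropped by the $match, but "Nightlife" and "Food" survive because they belong to documents that also contain a matching token.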

Or, if you really want the array without the documents, this might be even closer to what you're after:

db.books.runCommand("aggregate", {
    pipeline: [
        { // find docs that contain Par*
            $match: { "indexTokens" : { "$regex" : "^Par" , "$options" : "i"}}
        },
        { // create a doc with a single array element for each indexToken entry
            $unwind: "$indexTokens"
        },
        { // now throw out any unwound docs that DON'T contain Par*
            $match: { "indexTokens": { "$regex": "^Par", "$options": "i" } }
        },
        { // now produce the list of index tokens
            $group: {
                _id: null,
                indexTokens: { $push: "$indexTokens" }
            }
        }
    ]
})
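
To see what each of the four stages contributes, here is a minimal sketch that emulates them on an in-memory array in plain JavaScript. It only mirrors the observable behaviour, not MongoDB's implementation; the helper names (`matchStage`, `unwindStage`, `groupPush`) and the second and third sample documents are made up for illustration.

```javascript
// Plain-JavaScript emulation of $match -> $unwind -> $match -> $group.
// Helper names and sample documents 2 and 3 are hypothetical.
const books = [
  { _id: 1, indexTokens: ["Nightlife", "Paris"] },
  { _id: 2, indexTokens: ["Parma", "Food"] },
  { _id: 3, indexTokens: ["Berlin"] },
];
const prefix = /^Par/i;

// $match: on an array field a document matches if ANY element matches;
// after $unwind the field is a plain string and is tested directly.
function matchStage(docs, re) {
  return docs.filter((d) =>
    Array.isArray(d.indexTokens)
      ? d.indexTokens.some((t) => re.test(t))
      : re.test(d.indexTokens)
  );
}

// $unwind: one output document per array element.
function unwindStage(docs) {
  return docs.flatMap((d) =>
    d.indexTokens.map((t) => ({ ...d, indexTokens: t }))
  );
}

// $group with _id: null and $push: collect every value into a single array.
function groupPush(docs) {
  return [{ _id: null, indexTokens: docs.map((d) => d.indexTokens) }];
}

const result = groupPush(
  matchStage(unwindStage(matchStage(books, prefix)), prefix)
);
console.log(JSON.stringify(result));
```

The second `matchStage` call is what removes "Nightlife" and "Food", which the first call cannot do because it only selects whole documents.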

2012-09-06 10:43:42 cirrus

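You could add this as a second solution to your original answer. That way people won't wonder why there are two answers :) – Sammaye

OK. Done.. – cirrus

Thanks to both of you, it works like a charm. I added an answer to show how I did it in Java (I don't have the latest driver, so I can't use the aggregate() method on DBCollection). – azpublic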

Building on that, I suggest doing the $unwind first to avoid the redundant $match. For example:

db.books.aggregate(
    {$unwind:"$indexTokens"},
    {$match:{indexTokens:/^Par/}},
    {$group:{_id:null,indexTokens:{$push:"$indexTokens"}}}
)

How do you do this in Java? You can use the DBCollection.aggregate(...) method of the MongoDB Java driver v2.9.0. Each pipeline operator, e.g. $unwind or $match, corresponds to a DBObject.

2012-09-06 13:51:58 slee

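Actually, I don't think the $match is redundant. The thing about $unwind is that it has to create a large number of documents in RAM, and you want to reduce those as early as possible. The first $match ensures that we only process documents that contain a Par* indexToken at all before we unwind them; the second $match then narrows the result down to the tokens we actually want. Remember, you want to get your $match in early to reduce the pipeline volume. – cirrus

You're right: match the documents, unwind the arrays, then match again to weed out the elements that don't match the regex. – slee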
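
The effect of the early $match on pipeline volume can be sketched in plain JavaScript by counting how many documents $unwind would emit with and without it. The collection below is made-up sample data and `countUnwound` is a hypothetical helper; this only illustrates the argument and says nothing about MongoDB's actual memory use.

```javascript
// Hypothetical sample data: 1 matching document among 1000.
const docs = [];
for (let i = 0; i < 999; i++) {
  docs.push({ _id: i, indexTokens: ["Berlin", "Museums", "Food"] });
}
docs.push({ _id: 999, indexTokens: ["Nightlife", "Paris"] });

const prefix = /^Par/i;

// Number of documents $unwind would emit: one per array element.
function countUnwound(input) {
  return input.reduce((n, d) => n + d.indexTokens.length, 0);
}

// Without the first $match, every document in the collection is unwound.
const withoutEarlyMatch = countUnwound(docs);

// With the first $match, only documents holding a matching token are unwound.
const preFiltered = docs.filter((d) => d.indexTokens.some((t) => prefix.test(t)));
const withEarlyMatch = countUnwound(preFiltered);

console.log({ withoutEarlyMatch, withEarlyMatch });
```

Both orderings end with the same tokens after the final $match; the difference is only in how many intermediate documents the $unwind stage has to produce.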
