2012-02-01 66 views

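MongoDb MapReduce: group by key, NOT value

I want to write a mapReduce function to accumulate statistics from MongoDB. However, my teammate, who created the data structure, saved the data as follows: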

"statistics": {
    "20111206": {
        "CN": {
            "Beijing": {
                "cart": 1,
                "cart_users": [
                    { "$oid" : "4EDD73938EAD0E5420000000" }
                ],
                "downloads": {
                    "wmv": {
                        "mid": 1
                    }
                },
                "orders": {
                    "wmv": {
                        "mid": 1
                    }
                }
            }
        }
    }
}

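The problem is that many of the values I need are stored only as keys (for example, CN or Beijing). These can be country codes, video formats, and so on, so I don't want to hard-code any of them in the mapReduce function.

The forEach function I use in the reduce part only passes the value in as an argument.

So the question is: is there any way to run a MapReduce over this data and group by key, or do I first have to convert the data into a new structure that looks more or less like this: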

{
    "movie_id": "4edcd4f29a4e61c00c000059",
    "country": "CN",
    "city": "Beijing",
    "list": [
        {
            "user_id": { "$oid" : "4EDD75388EAD0E5720010000" },
            "downloads": {
                "cnt": 1,
                "list": [
                    {
                        "format": "wmv",
                        "quality": "high"
                    }
                ]
            },
            "orders": {
                "cnt": 1,
                "list": [
                    {
                        "format": "wmv",
                        "quality": "high"
                    }
                ]
            }
        }
    ]
}

Can you show us your `map` and `reduce` functions? It is not clear what output you expect. – 2012-02-01 20:02:28

Answer


Say your collection is set up with records like the following:

> db.test_col.findOne() 
{ 
    "_id" : ObjectId("4f90ed994d2246dd7996e042"), 
    "statistics" : { 
     "20111206" : { 
      "CN" : { 
       "Beijing" : { 
        "cart" : 1, 
        "cart_users" : [ 
         { 
          "oid" : "4EDD73938EAD0E5420000000" 
         } 
        ], 
        "downloads" : { 
         "wmv" : { 
          "mid" : 1 
         } 
        }, 
        "orders" : { 
         "wmv" : { 
          "mid" : 1 
         } 
        } 
       } 
      } 
     } 
    } 
} 

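Here is a command that groups by country and returns the list of cities along with the total count for that country. It should get you closer to what you are trying to do: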

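db.runCommand({
    mapreduce: "test_col",
    map: function () {
        // Object.keySet is a helper from the legacy mongo shell; on engines
        // without it, the standard Object.keys should be equivalent here.
        // Only the first date, country, and city of each document are read.
        var l0 = this.statistics,
            date = Object.keySet(l0)[0],
            l1 = l0[date],
            country = Object.keySet(l1)[0],
            l2 = l1[country],
            city = Object.keySet(l2)[0];
        // Emit the same shape that reduce returns, so re-reducing is safe.
        emit(country, { cities: [city], count: 1 });
    },
    reduce: function (country, values) {
        var r = { cities: [], count: 0 };
        values.forEach(function (v) {
            // v may be a map output or an earlier reduce result.
            v.cities.forEach(function (c) {
                if (r.cities.indexOf(c) == -1) r.cities.push(c);
            });
            r.count += v.count;
        });
        return r;
    },
    // The "reduce" output mode merges new results into any existing ones.
    out: { reduce: "test_col_reduce" }
});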
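
The map function above reads only the first date, country, and city key of each document. If a document can hold several of each, a map that loops over every key handles them without hard-coding any names. Below is a minimal sketch in plain JavaScript that runs outside MongoDB. The names `emitted`, `mapAllKeys`, and the sample `doc` are hypothetical, `emit` is a local stand-in for the one MongoDB provides, and the emitted value shape is an assumption.

```javascript
// Local stand-in for MongoDB's emit(): collects pairs so the sketch runs anywhere.
var emitted = [];
function emit(key, value) {
    emitted.push({ key: key, value: value });
}

// Hypothetical map: visits every date -> country -> city key in the document.
function mapAllKeys() {
    var stats = this.statistics || {};
    Object.keys(stats).forEach(function (date) {
        var countries = stats[date];
        Object.keys(countries).forEach(function (country) {
            var cities = countries[country];
            Object.keys(cities).forEach(function (city) {
                // One emit per city, keyed by country.
                emit(country, { cities: [city], count: 1 });
            });
        });
    });
}

// Sample document shaped like the record above, with a second city added.
var doc = {
    statistics: {
        "20111206": {
            CN: {
                Beijing: { cart: 1 },
                Shanghai: { cart: 2 }
            }
        }
    }
};

// MongoDB binds `this` to the current document; call() mimics that here.
mapAllKeys.call(doc);
console.log(JSON.stringify(emitted));
```

Inside MongoDB the function body would be passed as `map` and the local `emit` stand-in dropped.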

The output for my test data looks like this:

> db.test_col_reduce.find() 
{ "_id" : "AR", "value" : { "cities" : [ "San Juan", "Buenos Aires", "Cordoba", "Rosario" ], "count" : 18 } } 
{ "_id" : "BZ", "value" : { "cities" : [ "Morico", "San Ignacio", "Corozal" ], "count" : 15 } } 
{ "_id" : "CN", "value" : { "cities" : [ "Beijing", "Shanghai", "HongKong" ], "count" : 26 } } 
{ "_id" : "US", "value" : { "cities" : [ "San Diego", "Los Angeles", "San Francisco", "New York" ], "count" : 27 } }
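
MongoDB may call reduce more than once for the same key, feeding earlier reduce results back in as values. A reduce function is therefore only safe when its return value has the same shape as the values map emits. The following is a small sketch of how that property could be checked locally in plain JavaScript. The name `mergeCities`, the sample values, and the value shape `{ cities: [...], count: n }` are assumptions made for illustration.

```javascript
// Hypothetical reduce: merges values of shape { cities: [...], count: n }.
function mergeCities(country, values) {
    var result = { cities: [], count: 0 };
    values.forEach(function (v) {
        v.cities.forEach(function (city) {
            if (result.cities.indexOf(city) === -1) {
                result.cities.push(city);
            }
        });
        result.count += v.count;
    });
    return result;
}

// Made-up values, as a map function might emit them for one country.
var values = [
    { cities: ["Beijing"], count: 1 },
    { cities: ["Shanghai"], count: 1 },
    { cities: ["Beijing"], count: 1 }
];

// Reduce everything in one call...
var direct = mergeCities("CN", values);

// ...and in two steps, feeding a partial result back in as a value.
var partial = mergeCities("CN", values.slice(0, 2));
var twoStep = mergeCities("CN", [partial, values[2]]);

// A re-reduce-safe function gives the same answer both ways.
console.log(JSON.stringify(direct));
console.log(JSON.stringify(twoStep));
```

If the two printed objects differ, the reduce function is not safe to use with an output mode that re-reduces, such as `out: { reduce: ... }`.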