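SQL to MapReduce: counting unique keys in a many-to-one relationship?

Originally, I had a relationship where one order has many lineitems and each lineitem belongs to exactly one order, as usual.
Using MongoDB, I created this document to represent it: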
{
    "_id" : ObjectId("511b7d1b3daee1b1446ecdfe"),
    "l_order" : {
        "_id" : ObjectId("511b7d133daee1b1446eb54d"),
        "o_orderkey" : NumberLong(1),
        "o_totalprice" : 173665.47,
        "o_orderdate" : ISODate("1996-01-02T03:00:00Z"),
        "o_orderpriority" : "5-LOW",
        "o_shippriority" : 0,
    },
    "l_linenumber" : 1,
    "l_shipdate" : ISODate("1996-03-13T03:00:00Z"),
    "l_commitdate" : ISODate("1996-02-12T03:00:00Z"),
    "l_receiptdate" : ISODate("1996-03-22T03:00:00Z"),
}
My intention is to translate this SQL query:
select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1993-07-01'
    and o_orderdate < date '1993-07-01' + interval '3' month
    and exists (
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by
    o_orderpriority
order by
    o_orderpriority;
For this I use two MapReduce functions.
The first:
db.runCommand({
    mapreduce: "lineitem",
    query: {
        "l_order.o_orderdate": {'$gte': new Date("July 01, 1993"), '$lt': new Date("Oct 01, 1993")}
    },
    map: function Map() {
        if (this.l_commitdate < this.l_receiptdate) {
            emit(this.l_order.o_orderkey, this.l_order.o_orderpriority);
        }
    },
    out: 'query004a'
});
The second:
db.runCommand({
    mapreduce: "query004a",
    map: function Map() {
        /* Remember: the value here is this.l_order.o_orderpriority from the previous mapreduce function */
        emit(this.value, 1);
    },
    reduce: function(key, values) {
        return Array.sum(values);
    },
    out: 'query004b'
});
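In the first, I select the documents that fall within the date range and satisfy the comparison (l_commitdate < l_receiptdate), grouping them by order key to avoid counting the same order twice. In the second, I group by o_orderpriority and sum.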
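To make the intent concrete, here is a minimal in-memory sketch in plain JavaScript of what the two stages are meant to compute. It does not use MongoDB at all. The function name `countOrdersByPriority`, the sample documents, and the UTC date bounds are assumptions made up for illustration, not part of my real data set.

```javascript
// Hypothetical sample lineitems (not real data). Orders 1 and 2 qualify;
// order 3 has no late lineitem; order 4 is outside the date range.
const sampleLineitems = [
  { l_order: { o_orderkey: 1, o_orderpriority: "5-LOW", o_orderdate: new Date("1993-07-15T00:00:00Z") },
    l_commitdate: new Date("1993-08-01T00:00:00Z"), l_receiptdate: new Date("1993-08-05T00:00:00Z") },
  { l_order: { o_orderkey: 1, o_orderpriority: "5-LOW", o_orderdate: new Date("1993-07-15T00:00:00Z") },
    l_commitdate: new Date("1993-08-02T00:00:00Z"), l_receiptdate: new Date("1993-08-09T00:00:00Z") },
  { l_order: { o_orderkey: 2, o_orderpriority: "1-URGENT", o_orderdate: new Date("1993-08-10T00:00:00Z") },
    l_commitdate: new Date("1993-09-01T00:00:00Z"), l_receiptdate: new Date("1993-09-03T00:00:00Z") },
  { l_order: { o_orderkey: 3, o_orderpriority: "5-LOW", o_orderdate: new Date("1993-09-01T00:00:00Z") },
    l_commitdate: new Date("1993-09-20T00:00:00Z"), l_receiptdate: new Date("1993-09-10T00:00:00Z") },
  { l_order: { o_orderkey: 4, o_orderpriority: "1-URGENT", o_orderdate: new Date("1994-01-01T00:00:00Z") },
    l_commitdate: new Date("1994-01-10T00:00:00Z"), l_receiptdate: new Date("1994-01-20T00:00:00Z") },
];

function countOrdersByPriority(lineitems, from, to) {
  // Stage 1: keep lineitems whose order date is in [from, to) and whose
  // commit date is before the receipt date, keyed by order key so that
  // each order appears at most once.
  const priorityByOrderKey = new Map();
  for (const item of lineitems) {
    const orderDate = item.l_order.o_orderdate;
    const inRange = orderDate >= from && orderDate < to;
    if (inRange && item.l_commitdate < item.l_receiptdate) {
      priorityByOrderKey.set(item.l_order.o_orderkey, item.l_order.o_orderpriority);
    }
  }

  // Stage 2: count the distinct orders per priority.
  const counts = {};
  for (const priority of priorityByOrderKey.values()) {
    counts[priority] = (counts[priority] || 0) + 1;
  }
  return counts;
}

// Assumed UTC bounds for the third quarter of 1993.
const orderCounts = countOrdersByPriority(
  sampleLineitems,
  new Date("1993-07-01T00:00:00Z"),
  new Date("1993-10-01T00:00:00Z")
);
console.log(orderCounts);
```

With this sample data, order 1 is counted only once even though two of its lineitems qualify, which is the behavior I expect from the `exists` subquery.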
To my surprise, the result is larger than I expected. Why does this happen?