How to design a MongoDB schema for a Twitter article aggregator

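I'm new to MongoDB, and as an exercise I'm building an application that extracts links from tweets. The idea is to find the most-tweeted articles for a given topic. I'm having a hard time designing the schema for this application.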

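The application harvests tweets and saves them
The tweets are parsed for links
The links are saved with additional information (title, summary, etc.)
A tweet can contain more than one link
A link can have many tweets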

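How do I:

Save these collections? As embedded documents?
Get the top ten links, sorted by the number of tweets they have?
Get the most-tweeted link for a specific date?
Get the tweets for a link?
Get the ten latest tweets?

I'd love to get some input on this.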

2011-07-30 Sven

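Two general tips: 1) Don't be afraid of duplication. It is often a good idea to store the same data, formatted differently, in different collections. 2) If you want to sort and sum things up, it helps to keep count fields everywhere. MongoDB's atomic update operators, combined with upserts, make it easy to keep counts and to add fields to existing documents.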

The following is certainly flawed, because I typed it off the top of my head. But a bad example is better than no example, I think ;)

collection tweets:

{
    tweetid: 123,
    timeTweeted: 123123234, // exact time in milliseconds
    dayInMillis: 123412343, // the day of the tweet at 00:00:00, in milliseconds
    text: 'a tweet with a http://lin.k and an http://u.rl',
    links: [
        'http://lin.k',
        'http://u.rl'
    ],
    linkCount: 2
}
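
To illustrate how such a document could be put together from a raw tweet, here is a minimal sketch in plain JavaScript. The helper name `buildTweetDoc` and the simple URL regex are assumptions made for this example, not part of the answer above, and the day is taken as midnight UTC.

```javascript
// Hypothetical helper: builds a document shaped like the `tweets` example.
// The URL regex is deliberately simple (an assumption), not a full URL parser.
function buildTweetDoc(tweetid, text, timeTweeted) {
    var MILLIS_PER_DAY = 24 * 60 * 60 * 1000;
    // every run of non-whitespace characters starting with http:// or https://
    var links = text.match(/https?:\/\/\S+/g) || [];
    return {
        tweetid: tweetid,
        timeTweeted: timeTweeted, // exact time in milliseconds
        // midnight UTC of the day the tweet was posted
        dayInMillis: timeTweeted - (timeTweeted % MILLIS_PER_DAY),
        text: text,
        links: links,
        linkCount: links.length
    };
}

var doc = buildTweetDoc(
    123,
    'a tweet with a http://lin.k and an http://u.rl',
    Date.UTC(2011, 6, 30, 15, 2, 25)
);
```

Storing `dayInMillis` next to the exact timestamp duplicates information, but it means the per-day key used in the links collection is available without recomputing it.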

collection links: 

{
    url: 'http://lin.k',
    totalCount: 17,
    daycounts: {
        1232345543354: 5, // key: the day of the tweet at 00:00:00, in milliseconds
        1234123423442: 2,
        1234354534535: 10
    }
}

Adding a new tweet:

db.x.tweets.insert({...}) //simply insert new document with all fields 

//for each found link: 
var upsert = true; 
var toFind = { url: '...'}; 
var updateObj = {'$inc': {'totalCount': 1, 'daycounts.12342342': 1 } }; //12342342 is the day of the tweet 
db.x.links.update(toFind, updateObj, upsert);
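
The day in the `daycounts.<day>` key changes from tweet to tweet, so the `$inc` object has to be built with a computed key rather than written as a literal. A small sketch of that step, assuming a hypothetical helper named `buildLinkUpdate` (not part of the answer above):

```javascript
// Hypothetical helper: returns the query and update objects for the upsert
// on the links collection, for one link found in one tweet.
function buildLinkUpdate(url, dayInMillis) {
    var inc = { totalCount: 1 };
    // the field path depends on the day, so it is assigned as a computed key
    inc['daycounts.' + dayInMillis] = 1;
    return { query: { url: url }, update: { '$inc': inc } };
}

var op = buildLinkUpdate('http://lin.k', 1311984000000);

// In the mongo shell, for each link found in the tweet:
// db.x.links.update(op.query, op.update, true);
```

Because the update is an upsert, the first tweet for a URL creates the link document and later tweets only increment its counters.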

Get the top ten links sorted by the number of tweets they have?

db.x.links.find().sort({'totalCount': -1}).limit(10);

Get the most-tweeted link for a specific date?

db.x.links.find({'daycounts.123413453': {'$gt': 0}}).sort({'daycounts.123413453': -1}).limit(1); //123413453 is the day you're after

Get the tweets for a link?

db.x.tweets.find({'links': 'http://lin.k'});

Get the ten latest tweets?

db.x.tweets.find().sort({'timeTweeted': -1}).limit(10);

2011-07-30 15:02:25 rompetroll

It's hard not to normalize when you're used to it. Your example is very helpful, thank you very much! :) – Sven
