Modeling a timeline in C* for fun

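To get a better understanding of C*, I have been looking around, and all the suggested C* schemas I have seen use more or less the same modeling technique. The issue is that I have doubts about the scalability of modeling the Twitter timeline this way.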

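Problem: What happens if I have a userA (a rock star), or several, who is very popular and is followed by 10k+ users? Each time userA publishes a tweet, we have to insert 10k+ rows into the timeline table, one for each of his followers.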
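
To put a rough number on this, here is a small Python sketch of the write amplification. The function names are made up for illustration, and the four fixed rows are an assumption taken from the batch shown further down (tweets, userline, the author's own timeline, and the '#PUBLIC' timeline).

```python
# Rough estimate of rows written per tweet under fan-out on write.
# Both helpers are hypothetical; they only count inserts.

FIXED_ROWS = 4  # tweets + userline + author's timeline + '#PUBLIC' timeline


def writes_per_tweet(follower_count: int) -> int:
    """One timeline row per follower, plus the fixed rows for the author."""
    return FIXED_ROWS + follower_count


def writes_per_day(follower_count: int, tweets_per_day: int) -> int:
    """Total rows written per day by a single author."""
    return writes_per_tweet(follower_count) * tweets_per_day


print(writes_per_tweet(10_000))
print(writes_per_day(10_000, 20))
```

The cost grows linearly with the follower count, so a handful of very popular accounts dominates the total write load.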

Question: Will this model really scale? Can anyone suggest an alternative way of modeling the timeline that really scales?

C* schema:

CREATE TABLE users (
  uname text,             -- UserA
  followers set<text>,    -- Users who follow userA
  following set<text>,    -- Users that userA is following
  PRIMARY KEY (uname)
);

-- View of tweets created by user
CREATE TABLE userline (
  tweetid timeuuid,
  uname text,
  body text,
  PRIMARY KEY (uname, tweetid)
);

-- View of tweets created by user, and users he/she follows
CREATE TABLE timeline (
  uname text,
  tweetid timeuuid,
  posted_by text,
  body text,
  PRIMARY KEY (uname, tweetid)
);


-- Example of UserA posting a tweet:
-- Note: each now() call generates a different timeuuid, so a real application
-- would generate one tweetid and reuse it in every statement.
-- BATCH START
-- Store the tweet in the tweets table (not defined in the schema above)
INSERT INTO tweets (tweetid, uname, body) VALUES (now(), 'userA', 'Test tweet #1');

-- Store the tweet in this user's userline
INSERT INTO userline (uname, tweetid, body) VALUES ('userA', now(), 'Test tweet #1');

-- Store the tweet in this user's timeline
INSERT INTO timeline (uname, tweetid, posted_by, body) VALUES ('userA', now(), 'userA', 'Test tweet #1');

-- Store the tweet in the public timeline
INSERT INTO timeline (uname, tweetid, posted_by, body) VALUES ('#PUBLIC', now(), 'userA', 'Test tweet #1');

-- Insert the tweet into follower timelines (application-side loop, pseudocode):
-- findUserFollowers = SELECT followers FROM users WHERE uname = 'userA';
-- for (String follower : findUserFollowers('userA')) {
--   INSERT INTO timeline (uname, tweetid, posted_by, body) VALUES (follower, now(), 'userA', 'Test tweet #1');
-- }
-- BATCH END

Thanks in advance for any suggestions.

2014-02-07 syepes

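In my opinion, the schema you have outlined, or something similar, is the best fit for the use case (see the latest tweets from everyone user X subscribes to, plus see my own tweets).

There are two things to note, though.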

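The first is that I don't think Twitter uses Cassandra to store tweets, probably for the same reasons you are starting to think about. Running the feed on Cassandra doesn't seem like a great idea, because you don't want to persist those countless copies of other people's tweets forever. What you want is some kind of sliding window that is kept up to date for each user (most users won't read thousands of tweets down from the top of their feed, I'm guessing). So we are talking about a queue, and in some cases a queue that is updated essentially in real time. Cassandra can only support this pattern with some coercion. I don't think it was designed for massive churn.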
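
To make the sliding-window idea concrete, here is a minimal in-memory sketch in Python. It is not tied to any database; `FEED_LIMIT`, `feeds` and `push_to_feed` are names invented for the example, and the tiny window size is only there to make the trimming visible.

```python
from collections import defaultdict, deque

FEED_LIMIT = 3  # assumed window size; a real feed would keep far more entries

# uname -> bounded feed; once full, the oldest entry falls off the far end
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))


def push_to_feed(uname, posted_by, body):
    """Put the newest tweet at the top of the user's feed."""
    feeds[uname].appendleft((posted_by, body))


# userA posts five tweets; userB's feed only ever holds the latest three
for i in range(1, 6):
    push_to_feed("userB", "userA", f"Test tweet #{i}")

print(list(feeds["userB"]))
```

Old entries disappear as a side effect of new writes, so nothing has to be stored forever or deleted later.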

In production, you would probably pick a database with better support for queues, perhaps something like sharded Redis with its list support.
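The second is that, for the example you gave, the problem is not as bad as it seems, because you don't need to do this update in a synchronous batch. You can post to the author's own lists and return quickly, then do all the other updates with asynchronous workers running in the cluster, pushing them out with best-effort QoS.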
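
As a rough illustration of that split, here is a single-process Python sketch that uses a thread and an in-memory queue in place of cluster workers. All names and the sample follower data are made up, and the dictionaries only stand in for the userline and timeline tables.

```python
import queue
import threading
from collections import defaultdict

followers = {"userA": {"userB", "userC"}}  # made-up sample data
userline = defaultdict(list)               # tweets written by each user
timeline = defaultdict(list)               # tweets each user sees
fanout_jobs = queue.Queue()                # work handed to the background worker


def post_tweet(author, body):
    """Synchronous part: write the author's own rows and return."""
    userline[author].append(body)
    timeline[author].append((author, body))
    fanout_jobs.put((author, body))  # follower timelines are filled in later


def fanout_worker():
    """Background part: copy each tweet into every follower's timeline."""
    while True:
        job = fanout_jobs.get()
        if job is None:  # sentinel that stops the worker
            fanout_jobs.task_done()
            break
        author, body = job
        for follower in followers.get(author, ()):
            timeline[follower].append((author, body))
        fanout_jobs.task_done()


worker = threading.Thread(target=fanout_worker, daemon=True)
worker.start()

post_tweet("userA", "Test tweet #1")

fanout_jobs.join()      # demo only: wait until the fan-out has finished
fanout_jobs.put(None)
worker.join()
print(dict(timeline))
```

The sketch has no retries or failure handling, which is roughly what best effort means here: a follower may see the tweet late, but the author never waits for 10k+ inserts.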

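Finally, since you asked about alternatives, here is a variation I can think of. It may be conceptually closer to the queue I mentioned, but under the hood it will run into many of the problems that come with heavy data churn.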

CREATE TABLE users (
  uname text,
  mru_timeline_slot int,
  followers set<text>,
  following set<text>,
  PRIMARY KEY (uname)
);

// circular buffer: keep at most X slots for every user.
CREATE TABLE timeline_most_recent (
  uname text,
  timeline_slot int,
  tweeted timeuuid,
  posted_by text,
  body text,
  PRIMARY KEY (uname, timeline_slot)
);
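
How the two tables work together is not spelled out above, so here is one possible reading as an in-memory Python sketch. It assumes that `mru_timeline_slot` holds the most recently written slot, that writes wrap around modulo X, and that -1 means nothing has been written yet; `MAX_SLOTS` and the function names are invented for the example.

```python
MAX_SLOTS = 3  # the "X" from the schema comment; value assumed for the demo

# Stand-ins for the two tables
users = {"userB": {"mru_timeline_slot": -1}}  # -1: nothing written yet (assumed)
timeline_most_recent = {}                     # (uname, timeline_slot) -> row


def add_to_timeline(uname, posted_by, body):
    """Write into the next slot, overwriting the oldest entry once full."""
    slot = (users[uname]["mru_timeline_slot"] + 1) % MAX_SLOTS
    timeline_most_recent[(uname, slot)] = {"posted_by": posted_by, "body": body}
    users[uname]["mru_timeline_slot"] = slot
    return slot


def read_timeline(uname):
    """Walk backwards from the most recent slot, newest first."""
    mru = users[uname]["mru_timeline_slot"]
    bodies = []
    for back in range(MAX_SLOTS):
        row = timeline_most_recent.get((uname, (mru - back) % MAX_SLOTS))
        if row is not None:
            bodies.append(row["body"])
    return bodies


slots = [add_to_timeline("userB", "userA", f"Test tweet #{i}") for i in range(1, 6)]
print(slots)
print(read_timeline("userB"))
```

In Cassandra, an insert on an existing (uname, timeline_slot) key overwrites the row, so the table never grows past X rows per user. Reading and then updating mru_timeline_slot is not atomic, though, so concurrent writers would need some extra coordination.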

2014-02-19 04:01:50
