2016-04-19 42 views
0

BigQuery - New to BigQuery: reddit comment data analysis

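I am trying to find pairs of users who both commented in the top 10 subreddits, along with the count of subreddits they commented on in common, using the BigQuery reddit data.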

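I have just started using BQ and am also a beginner in SQL, and I am finding it hard to write this query. Can someone give me some pointers to get started?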

+1

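As Philip pointed out (implicitly), the best way to start is to show what you have done so far, so we can narrow our efforts to help you. Otherwise the question is too broad and hard to jump into.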

+0

If an answer helped solve your problem, you should consider accepting it.

Answer

2

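I never really had a need to play with the reddit data, so the below is just to throw at least something your way to get you started, since it seems no one else is willing to.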

Quick logic:

Step - 1: Identify top 10 most commented subreddits 

SELECT subreddit 
FROM [fh-bigquery:reddit_comments.subr_rank_201505] 
ORDER BY comments DESC 
LIMIT 10 

Step - 2: For each subreddit, identify [solid] users (those with more than 50 comments)


SELECT author, subreddit, COUNT(1) AS comments 
FROM [fh-bigquery:reddit_comments.2016_01] 
WHERE subreddit IN (
    SELECT subreddit 
    FROM [fh-bigquery:reddit_comments.subr_rank_201505] 
    ORDER BY comments DESC 
    LIMIT 10) 
AND author NOT IN ('AutoModerator', '[deleted]') 
GROUP BY author, subreddit 
HAVING comments > 50 
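
If it helps to see the logic of Steps 1 and 2 outside of BigQuery, here is a minimal Python sketch on made-up data. The `COMMENTS` list, the function names, and the thresholds are all hypothetical. Unlike the queries above, it ranks subreddits from the toy comments themselves rather than reading a precomputed rank table.

```python
from collections import Counter

# Hypothetical toy data: one (author, subreddit) tuple per comment.
COMMENTS = (
    [("alice", "pics")] * 3 + [("alice", "funny")] * 2
    + [("bob", "pics")] * 2 + [("bob", "funny")] * 3
    + [("AutoModerator", "pics")] * 5 + [("carol", "news")]
)

# Same exclusions as the NOT IN (...) clause in the query.
EXCLUDED_AUTHORS = {"AutoModerator", "[deleted]"}


def top_subreddits(comments, n):
    """Step 1: return the set of the n subreddits with the most comments."""
    counts = Counter(subreddit for _, subreddit in comments)
    return {subreddit for subreddit, _ in counts.most_common(n)}


def solid_users(comments, subreddits, min_comments):
    """Step 2: count comments per (author, subreddit) within the given
    subreddits and keep only those with more than min_comments."""
    counts = Counter(
        (author, subreddit)
        for author, subreddit in comments
        if subreddit in subreddits and author not in EXCLUDED_AUTHORS
    )
    return {key: c for key, c in counts.items() if c > min_comments}


top = top_subreddits(COMMENTS, 2)
solid = solid_users(COMMENTS, top, 1)
```

The final filter on the counts corresponds to `HAVING comments > 50` in the query, just with a much smaller threshold so the toy data produces something.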

Step - 3: For each subreddit, identify pairs of users it has in common (via JOIN)

Step - 4: Finally, for each pair of users, count the number of common subreddits


SELECT usera, userb, COUNT(1) AS subreddits 
FROM (
    SELECT 
        a.author AS usera, 
        b.author AS userb, 
        a.subreddit AS subreddit 
    FROM (
        SELECT author, subreddit, COUNT(1) AS comments 
        FROM [fh-bigquery:reddit_comments.2016_01] 
        WHERE subreddit IN (SELECT subreddit FROM [fh-bigquery:reddit_comments.subr_rank_201505] ORDER BY comments DESC LIMIT 10) 
        AND author NOT IN ('AutoModerator', '[deleted]') 
        GROUP BY author, subreddit 
        HAVING comments > 50) AS a 
    JOIN (
        SELECT author, subreddit, COUNT(1) AS comments 
        FROM [fh-bigquery:reddit_comments.2016_01] 
        WHERE subreddit IN (SELECT subreddit FROM [fh-bigquery:reddit_comments.subr_rank_201505] ORDER BY comments DESC LIMIT 10) 
        AND author NOT IN ('AutoModerator', '[deleted]') 
        GROUP BY author, subreddit 
        HAVING comments > 50) AS b 
    ON a.subreddit = b.subreddit 
    WHERE a.author < b.author 
) 
GROUP BY usera, userb 
HAVING subreddits > 3 
ORDER BY subreddits DESC, usera, userb 
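
The self-join and the final count can be sketched the same way in plain Python, in case that makes the pairing easier to follow. This is only an illustration on made-up data: `SOLID` and `common_subreddit_counts` are hypothetical names, and the input is assumed to be the (author, subreddit) rows that survive the "solid user" filter.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy input: (author, subreddit) rows that passed the Step 2 filter.
SOLID = [
    ("alice", "pics"), ("bob", "pics"),
    ("alice", "funny"), ("bob", "funny"), ("carol", "funny"),
    ("alice", "news"), ("carol", "news"),
]


def common_subreddit_counts(solid_pairs, min_common=0):
    """Steps 3 and 4: pair up authors within each subreddit, then count
    how many subreddits each pair shares. Returns (usera, userb, n) rows
    with n > min_common, ordered by n descending, then usera, userb."""
    authors_by_subreddit = defaultdict(set)
    for author, subreddit in solid_pairs:
        authors_by_subreddit[subreddit].add(author)

    pair_counts = Counter()
    for authors in authors_by_subreddit.values():
        # Sorting first guarantees usera < userb, so each pair is emitted
        # once and nobody is paired with themselves.
        for usera, userb in combinations(sorted(authors), 2):
            pair_counts[(usera, userb)] += 1

    rows = [(a, b, n) for (a, b), n in pair_counts.items() if n > min_common]
    rows.sort(key=lambda row: (-row[2], row[0], row[1]))
    return rows


rows = common_subreddit_counts(SOLID, min_common=1)
```

Sorting the authors before pairing plays the role of `WHERE a.author < b.author` in the query, and `min_common` stands in for the `HAVING subreddits > 3` threshold.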

Hope this helps.