BigQuery - Newbie: analyzing Reddit comment data in BigQuery
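Using the BigQuery reddit data, I'm trying to get pairs of users who both commented in the top 10 subreddits, along with the count of subreddits they have in common.
I've just started using BQ and am also a beginner at SQL, and I'm finding it hard to put this query together. Can someone give me some pointers to get started?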
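I've never really needed to play with the reddit data, so the following is just to throw at least something at you to get started, since nobody else seems willing to.
Quick logic: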
Step - 1: Identify the top 10 most-commented subreddits
SELECT subreddit
FROM [fh-bigquery:reddit_comments.subr_rank_201505]
ORDER BY comments DESC
LIMIT 10
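To try the Step 1 logic locally without querying the full dataset, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for BigQuery. The table name subr_rank and every row are hypothetical toy data, not the real fh-bigquery ranking, and TOP_N is 2 instead of 10 to keep the output small.

```python
import sqlite3

# Hypothetical local stand-in for [fh-bigquery:reddit_comments.subr_rank_201505]:
# one row per subreddit with its total comment count (made-up numbers).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subr_rank (subreddit TEXT, comments INTEGER)")
conn.executemany(
    "INSERT INTO subr_rank VALUES (?, ?)",
    [("sub_a", 900), ("sub_b", 500), ("sub_c", 700), ("sub_d", 300)],
)

TOP_N = 2  # the answer uses 10

# Same shape as the Step 1 query: sort by comment count, keep the first N.
top_subreddits = [
    row[0]
    for row in conn.execute(
        "SELECT subreddit FROM subr_rank ORDER BY comments DESC LIMIT ?", (TOP_N,)
    )
]
print(top_subreddits)  # ['sub_a', 'sub_c']
```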
Step - 2: For each subreddit, identify its "solid" users (those with more than 50 comments)
SELECT author, subreddit, COUNT(1) AS comments
FROM [fh-bigquery:reddit_comments.2016_01]
WHERE subreddit IN (
SELECT subreddit
FROM [fh-bigquery:reddit_comments.subr_rank_201505]
ORDER BY comments DESC
LIMIT 10)
AND author NOT IN ('AutoModerator', '[deleted]')
GROUP BY author, subreddit
HAVING comments > 50
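The Step 2 filter can be sketched the same way. This assumes a hypothetical local comments table with one row per comment (author, subreddit), substitutes a hard-coded list for the Step 1 result, and lowers the 50-comment threshold to 2 so the toy data stays small. Grouping by (author, subreddit) and keeping only groups that pass HAVING COUNT(*) > threshold is what turns raw comments into "solid" users; the NOT IN list drops the bot and deleted accounts before counting.

```python
import sqlite3

# Hypothetical local stand-in for [fh-bigquery:reddit_comments.2016_01]:
# one row per comment, only the two columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, subreddit TEXT)")
rows = (
    [("alice", "sub_a")] * 3
    + [("alice", "sub_b")] * 1
    + [("bob", "sub_a")] * 4
    + [("AutoModerator", "sub_a")] * 10  # bot account, filtered out
    + [("[deleted]", "sub_b")] * 5       # deleted authors, filtered out
    + [("carol", "sub_c")] * 9           # sub_c is not a top subreddit
)
conn.executemany("INSERT INTO comments VALUES (?, ?)", rows)

TOP_SUBREDDITS = ["sub_a", "sub_b"]  # pretend result of Step 1
MIN_COMMENTS = 2                     # the answer uses 50

placeholders = ", ".join("?" for _ in TOP_SUBREDDITS)
solid_users = conn.execute(
    f"""
    SELECT author, subreddit, COUNT(*) AS comments
    FROM comments
    WHERE subreddit IN ({placeholders})
      AND author NOT IN ('AutoModerator', '[deleted]')
    GROUP BY author, subreddit
    HAVING COUNT(*) > ?
    ORDER BY author, subreddit
    """,
    [*TOP_SUBREDDITS, MIN_COMMENTS],
).fetchall()
print(solid_users)  # [('alice', 'sub_a', 3), ('bob', 'sub_a', 4)]
```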
Step - 3: For each subreddit, identify pairs of common users (via a self-JOIN)
Step - 4: Finally, for each pair of users, count the subreddits they have in common
SELECT usera, userb, COUNT(1) AS subreddits
FROM (
SELECT
a.author AS usera,
b.author AS userb,
a.subreddit AS subreddit
FROM (
SELECT author, subreddit, COUNT(1) AS comments FROM [fh-bigquery:reddit_comments.2016_01]
WHERE subreddit IN (SELECT subreddit FROM [fh-bigquery:reddit_comments.subr_rank_201505] ORDER BY comments DESC LIMIT 10)
AND author NOT IN ('AutoModerator', '[deleted]')
GROUP BY author, subreddit HAVING comments > 50) AS a
JOIN (
SELECT author, subreddit, COUNT(1) AS comments FROM [fh-bigquery:reddit_comments.2016_01]
WHERE subreddit IN (SELECT subreddit FROM [fh-bigquery:reddit_comments.subr_rank_201505] ORDER BY comments DESC LIMIT 10)
AND author NOT IN ('AutoModerator', '[deleted]')
GROUP BY author, subreddit HAVING comments > 50) AS b
ON a.subreddit = b.subreddit
WHERE a.author < b.author
)
GROUP BY usera, userb
HAVING subreddits > 3
ORDER BY subreddits DESC, usera, userb
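The query above repeats the Step 2 subquery verbatim on both sides of the JOIN. A WITH clause (common table expression) lets you define it once and join it to itself. The sketch below runs that version in SQLite against hypothetical toy tables; all names and rows are made up, and the thresholds are scaled down (top 3 subreddits, more than 2 comments, more than 1 shared subreddit). The a.author < b.author condition keeps each unordered pair exactly once and drops self-pairs; with <> instead, every pair would show up twice (alice/bob and bob/alice). The same WITH structure should carry over to BigQuery Standard SQL, where tables are referenced in backticks (e.g. `fh-bigquery.reddit_comments.2016_01`) instead of the legacy [fh-bigquery:reddit_comments.2016_01] syntax, assuming those tables are still available.

```python
import sqlite3

# Hypothetical toy stand-ins for the two BigQuery tables used in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE subr_rank (subreddit TEXT, comments INTEGER);
    CREATE TABLE comments (author TEXT, subreddit TEXT);
    """
)
conn.executemany(
    "INSERT INTO subr_rank VALUES (?, ?)",
    [("sub_a", 90), ("sub_b", 80), ("sub_c", 70), ("sub_d", 10)],
)
# Each user gets n comments in each listed subreddit.
activity = {
    "alice": (["sub_a", "sub_b", "sub_c"], 3),
    "bob": (["sub_a", "sub_b", "sub_c"], 3),
    "carol": (["sub_a", "sub_d"], 3),                # sub_d is not in the top 3
    "dave": (["sub_a", "sub_b"], 1),                 # too few comments to be "solid"
    "AutoModerator": (["sub_a", "sub_b", "sub_c"], 9),  # excluded by name
}
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [(user, sub) for user, (subs, n) in activity.items() for sub in subs for _ in range(n)],
)

QUERY = """
WITH top_subreddits AS (          -- Step 1
  SELECT subreddit FROM subr_rank ORDER BY comments DESC LIMIT :top_n
),
solid_users AS (                  -- Step 2, defined once
  SELECT author, subreddit, COUNT(*) AS comments
  FROM comments
  WHERE subreddit IN (SELECT subreddit FROM top_subreddits)
    AND author NOT IN ('AutoModerator', '[deleted]')
  GROUP BY author, subreddit
  HAVING COUNT(*) > :min_comments
)
SELECT a.author AS usera, b.author AS userb, COUNT(*) AS subreddits  -- Steps 3-4
FROM solid_users AS a
JOIN solid_users AS b ON a.subreddit = b.subreddit
WHERE a.author < b.author         -- one row per unordered pair, no self-pairs
GROUP BY a.author, b.author
HAVING COUNT(*) > :min_shared
ORDER BY subreddits DESC, usera, userb
"""

result = conn.execute(QUERY, {"top_n": 3, "min_comments": 2, "min_shared": 1}).fetchall()
print(result)  # [('alice', 'bob', 3)]
```

Setting min_shared to 0 shows every pair, including the two that share only sub_a.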
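Hope this helps
As Felipe (implicitly) pointed out, the best way to start is to show what you have done so far, so we can narrow down our effort to help you. Otherwise it is too broad and hard to jump into.
If the answer helped solve your problem, you should consider accepting it.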