I just wrote a query using the newly released "full Hacker News dataset on BigQuery".
The problem is that some comments point to their parent story, while others are replies to parent comments, and walking up through them to find the original story is hard (it's a recursive operation). This query works around it:
SELECT p0.id, s.id, s.title, level
FROM (
SELECT p0.id, p0.parent, p2.id, p3.id, p4.id,
       COALESCE(p7.parent, p6.parent, p5.parent, p4.parent, p3.parent, p2.parent, p1.parent, p0.parent) story_id,
       GREATEST(IF(p7.parent IS null, -1, 7), IF(p6.parent IS null, -1, 6), IF(p5.parent IS null, -1, 5), IF(p4.parent IS null, -1, 4),
                IF(p3.parent IS null, -1, 3), IF(p2.parent IS null, -1, 2), IF(p1.parent IS null, -1, 1), 0) level
FROM [fh-bigquery:hackernews.comments] p0
LEFT JOIN EACH [fh-bigquery:hackernews.comments] p1 ON p1.id=p0.parent
LEFT JOIN EACH [fh-bigquery:hackernews.comments] p2 ON p2.id=p1.parent
LEFT JOIN EACH [fh-bigquery:hackernews.comments] p3 ON p3.id=p2.parent
LEFT JOIN EACH [fh-bigquery:hackernews.comments] p4 ON p4.id=p3.parent
LEFT JOIN EACH [fh-bigquery:hackernews.comments] p5 ON p5.id=p4.parent
LEFT JOIN EACH [fh-bigquery:hackernews.comments] p6 ON p6.id=p5.parent
LEFT JOIN EACH [fh-bigquery:hackernews.comments] p7 ON p7.id=p6.parent
HAVING level=0
LIMIT 100
) a
LEFT JOIN EACH [fh-bigquery:hackernews.stories] s
ON s.id=a.story_id
(All these LEFT JOINs consume a lot of resources, so to run this at scale I would look for a different strategy.)
https://news.ycombinator.com/item?id=10440502
This requires a predefined depth... could it somehow be made more general, so that each individual depth is handled? – fsociety
Since the tree-traversal algorithm is recursive, you would either have to run the same query recursively seven times, or write a query that goes seven levels deep in a single run. But if you need more than x levels, you are back to recursion. At least that is my understanding so far. –
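For arbitrary depth, the recursion has to live either in the client (re-running the query until every comment resolves to a story) or in the SQL dialect itself. The query above is legacy BigQuery SQL, which has no recursion, but BigQuery's Standard SQL later added WITH RECURSIVE, which removes the fixed seven-level limit. A minimal sketch, assuming the same fh-bigquery.hackernews tables are still reachable under Standard SQL naming (the public dataset may be organized differently today):

WITH RECURSIVE chain AS (
  -- Start from every comment, pointing at its immediate parent (a comment or a story).
  SELECT id AS comment_id, parent AS ancestor, 0 AS level
  FROM `fh-bigquery.hackernews.comments`
  UNION ALL
  -- Keep following parent links while the ancestor is itself a comment;
  -- recursion stops once the ancestor id is no longer found among comments.
  SELECT c.comment_id, p.parent AS ancestor, c.level + 1
  FROM chain c
  JOIN `fh-bigquery.hackernews.comments` p ON p.id = c.ancestor
)
SELECT chain.comment_id, s.id AS story_id, s.title, chain.level
FROM chain
JOIN `fh-bigquery.hackernews.stories` s ON s.id = chain.ancestor
LIMIT 100

Starting from every comment makes the intermediate result large, so in practice you would restrict the base term with a WHERE clause; BigQuery also caps the number of recursive iterations, which is more than enough for Hacker News thread depths.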