2015-11-25

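Constructing a BigQuery query on trigrams with multiple inputs

Thanks to the help of the answerer of this question, I can currently query a single word and get a list of its most popular follow-on words. For example, using the word "great", I can get a list of up to 10 words in the following format: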

SELECT second, SUM(cell.page_count) total
FROM [publicdata:samples.trigrams]
WHERE first = "great"
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10

With the output:

second  total  
------------------ 
deal  3048832 
and  1689911 
,   1576341 
a   1019511 
number  984993  
many  875974  
importance 805215  
part  739409  
.   700694  
as   628978 

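What I am having trouble figuring out is how to run this query for multiple words at once (rather than issuing a separate query for each word), so that I get an output such as: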

"great"  total  "new_word_1"   new_total_1 ... "new_word_N"  new_total_N 
----------------------------------------------------------------------------------------- 
deal  3048832 "new_follow_on_word1" 123456  ... "follow_on_N1" 234567 
and  1689911 "new_follow_on_word2" 12345  ... "follow_on_N2" 123456 

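Basically, I would like to pass N words in a single query (for example, new_word_1 is a completely different word such as "baseball", with no relation to "great") and get the total counts associated with each word in separate columns.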

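Also, after learning about BigQuery's pricing, I am having trouble figuring out how to limit the total amount of data queried as much as possible. I can think of using only the most recent data (say, from 2010 onward) and limiting each output word to 2 alphanumeric characters, but I may be missing more obvious limiters.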

Answer


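You can put multiple first words in the same query, but the top 10 follow-on words have to be computed separately for each first word, and the results then joined together. Here is an example for "great" and "baseball":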

SELECT word1, total1, word2, total2 FROM
  (SELECT ROW_NUMBER() OVER() rowid1, word1, total1 FROM (
    SELECT second AS word1, SUM(cell.page_count) total1
    FROM [publicdata:samples.trigrams]
    WHERE first = "great"
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 10)) a1
JOIN
  (SELECT ROW_NUMBER() OVER() rowid2, word2, total2 FROM (
    SELECT second AS word2, SUM(cell.page_count) total2
    FROM [publicdata:samples.trigrams]
    WHERE first = "baseball"
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 10)) a2
ON a1.rowid1 = a2.rowid2
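
To extend the same pattern to N first words without writing each subquery by hand, the query text can be generated. The following is a minimal Python sketch, not something from the answer itself: the function name `build_query` and its `limit` parameter are hypothetical, it only builds the query string (it does not call BigQuery), and the generated multi-way join has not been run against the service. Each additional word is joined back to the first word's row numbers, mirroring the two-word query above.

```python
def build_query(words, limit=10):
    """Build a legacy-SQL query string with one (word, total) column pair per first word.

    Hypothetical helper: mirrors the two-word query in the answer, one subquery per word.
    """
    if not words:
        raise ValueError("need at least one word")
    limit = int(limit)

    columns = []
    parts = []
    for i, word in enumerate(words, start=1):
        # Keep the sketch simple: refuse characters that would break the string literal.
        if '"' in word or "\\" in word:
            raise ValueError("unsupported character in word: %r" % word)

        columns.append(f"word{i}, total{i}")
        subquery = (
            f"(SELECT ROW_NUMBER() OVER() rowid{i}, word{i}, total{i} FROM (\n"
            f"  SELECT second AS word{i}, SUM(cell.page_count) total{i}\n"
            f"  FROM [publicdata:samples.trigrams]\n"
            f'  WHERE first = "{word}"\n'
            f"  GROUP BY 1\n"
            f"  ORDER BY 2 DESC\n"
            f"  LIMIT {limit})) a{i}"
        )
        if i > 1:
            # Every later subquery is joined to the first one on its row number.
            subquery = f"JOIN\n{subquery}\nON a1.rowid1 = a{i}.rowid{i}"
        parts.append(subquery)

    return "SELECT " + ", ".join(columns) + " FROM\n" + "\n".join(parts)


if __name__ == "__main__":
    print(build_query(["great", "baseball"]))
```

Because the subqueries are combined with an inner join on the row number, a word with fewer than `limit` follow-on words would shorten the whole result to that word's row count.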

I just deleted my answer because I realized I had missed the top-10 requirement.