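Google BigQuery: How to get distinct rows for a value in the query results
I'm trying to use Google BigQuery on the GitHub Archive (http://www.githubarchive.org/) data to get repository statistics as of each repository's most recent event, and I want this for the repositories with the most watchers. I realize this is a lot to ask, but I feel like I'm really close to getting it in one query.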
This is the query I have now:
SELECT repository_name, repository_owner, repository_organization, repository_size, repository_watchers as watchers, repository_forks as forks, repository_language, MAX(PARSE_UTC_USEC(created_at)) as time
FROM [githubarchive:github.timeline]
GROUP EACH BY repository_name, repository_owner, repository_organization, repository_size, watchers, forks, repository_language
ORDER BY watchers DESC, time DESC
LIMIT 1000
The only problem is that I get every event from the most-watched repository (Twitter Bootstrap):
Results:
Row repository_name repository_owner repository_organization repository_size watchers forks repository_language time
1 bootstrap twbs twbs 83875 61191 21602 JavaScript 1384991582000000
2 bootstrap twbs twbs 83875 61190 21602 JavaScript 1384991337000000
3 bootstrap twbs twbs 83875 61190 21603 JavaScript 1384989683000000
...
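The extra rows come from the GROUP BY itself. watchers and forks are grouping keys, so every distinct combination of those values (which change over time) becomes its own group, each with its own MAX(time). Below is a minimal Python sketch of that grouping behavior. It uses the three rows shown above as toy input. The tuple layout and the helper `group_max_time` are hypothetical and exist only for illustration:

```python
# Toy events for one repository, using the watcher/fork values from the
# results above; each tuple is (name, owner, watchers, forks, time_usec).
events = [
    ("bootstrap", "twbs", 61191, 21602, 1384991582000000),
    ("bootstrap", "twbs", 61190, 21602, 1384991337000000),
    ("bootstrap", "twbs", 61190, 21603, 1384989683000000),
]

def group_max_time(rows, key):
    """Mimic GROUP BY <key columns> with MAX(time): one entry per distinct key."""
    groups = {}
    for row in rows:
        k = key(row)
        groups[k] = max(groups.get(k, 0), row[4])
    return groups

# Grouping by name, owner, watchers and forks (as the query does): the
# changing watcher/fork counts split one repository into several groups.
by_stats = group_max_time(events, key=lambda r: (r[0], r[1], r[2], r[3]))

# Grouping by repository identity only: a single group per repository.
by_repo = group_max_time(events, key=lambda r: (r[0], r[1]))

print(len(by_stats))  # 3
print(len(by_repo))   # 1
```

Grouping by identity alone gives one row per repository. It also discards the watcher and fork values that belong to the latest event, and recovering those values typically needs a second step such as a join back to the table.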
How can I get this to return a single result per repository_name (the most recent one, i.e. MAX(time))?
I have already tried:
SELECT repository_name, repository_owner, repository_organization, repository_size, repository_watchers as watchers, repository_forks as forks, repository_language, MAX(PARSE_UTC_USEC(created_at)) as time
FROM [githubarchive:github.timeline]
WHERE PARSE_UTC_USEC(created_at) IN (SELECT MAX(PARSE_UTC_USEC(created_at)) FROM [githubarchive:github.timeline])
GROUP EACH BY repository_name, repository_owner, repository_organization, repository_size, watchers, forks, repository_language
ORDER BY watchers DESC, time DESC
LIMIT 1000
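Even without the error, the subquery in this attempt is uncorrelated. It computes a single MAX over the whole table rather than one maximum per repository, so the IN filter would keep only events that happened at the very latest instant in the entire dataset. Below is a small sketch of that difference. It uses SQLite via Python's `sqlite3` module on a hypothetical toy `timeline` table with integer timestamps. It is not BigQuery's legacy SQL, and the `demo-repo`/`alice` row is made up:

```python
import sqlite3

# Hypothetical toy table; created_at is an integer timestamp for simplicity.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timeline (repository_name TEXT, repository_owner TEXT, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO timeline VALUES (?, ?, ?)",
    [
        ("bootstrap", "twbs", 1384991582),
        ("bootstrap", "twbs", 1384991337),
        ("demo-repo", "alice", 1384990000),  # hypothetical second repository
    ],
)

# Uncorrelated subquery: MAX over the WHOLE table, i.e. a single value.
global_filter = conn.execute(
    "SELECT repository_name, created_at FROM timeline"
    " WHERE created_at IN (SELECT MAX(created_at) FROM timeline)"
).fetchall()

# Per-repository maximum: one value per (name, owner) group.
per_repo = conn.execute(
    "SELECT repository_name, MAX(created_at) FROM timeline"
    " GROUP BY repository_name, repository_owner ORDER BY repository_name"
).fetchall()

print(global_filter)  # [('bootstrap', 1384991582)]
print(per_repo)       # [('bootstrap', 1384991582), ('demo-repo', 1384990000)]
```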
I'm not sure whether that would even work, but it doesn't matter, because I get this error message:
Error: Join attribute is not defined: PARSE_UTC_USEC
Any help would be great, thanks.
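One common pattern for "latest row per group" is a self-join. First compute the latest event time per repository in a subquery (b), then join it back to the full table (a) to recover that event's other columns. The sketch below runs the pattern in SQLite through Python's `sqlite3` module. The `timeline` table, its integer `created_at`, and the `demo-repo`/`alice` rows are hypothetical stand-ins for [githubarchive:github.timeline]. Adapting the query to BigQuery's dialect, including timestamp parsing and any EACH hints, is left as an assumption:

```python
import sqlite3

# Hypothetical toy table standing in for [githubarchive:github.timeline].
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timeline (repository_name TEXT, repository_owner TEXT,"
    " repository_watchers INTEGER, repository_forks INTEGER, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO timeline VALUES (?, ?, ?, ?, ?)",
    [
        ("bootstrap", "twbs", 61191, 21602, 1384991582),
        ("bootstrap", "twbs", 61190, 21602, 1384991337),
        ("bootstrap", "twbs", 61190, 21603, 1384989683),
        ("demo-repo", "alice", 100, 50, 1384990000),  # made-up values
        ("demo-repo", "alice", 99, 50, 1384980000),   # made-up values
    ],
)

# b: latest event time per repository (identified by name AND owner).
# a: the full table, joined on that identity plus the latest time, so only
#    each repository's most recent event survives.
query = """
SELECT a.repository_name, a.repository_owner,
       a.repository_watchers AS watchers, a.repository_forks AS forks,
       a.created_at AS time
FROM timeline AS a
JOIN (
    SELECT repository_name, repository_owner, MAX(created_at) AS max_time
    FROM timeline
    GROUP BY repository_name, repository_owner
) AS b
  ON  a.repository_name  = b.repository_name
  AND a.repository_owner = b.repository_owner
  AND a.created_at       = b.max_time
ORDER BY watchers DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If a repository has two events with exactly the same latest timestamp, this join returns both, so a tie-breaker may still be needed.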
Yes, thank you very much. This is exactly what I needed. – brycek
Just one comment on this solution: it causes problems if repositories have the same name, so I added: 'b.repository_owner = a.repository_owner'. – brycek
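To illustrate the comment above, here is a sketch of one way a name-only key can go wrong. When two owners have a repository with the same name, MAX(created_at) is taken across both. The other owner's latest event then matches nothing and silently drops out. The sketch uses SQLite via Python's `sqlite3` module. The `alice/bootstrap` repository is hypothetical, and the helper `latest` is made up for this comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timeline (repository_name TEXT, repository_owner TEXT, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO timeline VALUES (?, ?, ?)",
    [
        ("bootstrap", "twbs", 1384991582),
        ("bootstrap", "twbs", 1384991337),
        ("bootstrap", "alice", 1384980000),  # hypothetical same-named repository
    ],
)

def latest(keys):
    """Self-join on the per-group MAX(created_at), grouping and joining on `keys`."""
    cols = ", ".join(keys)
    on = " AND ".join(f"a.{k} = b.{k}" for k in keys)
    sql = f"""
        SELECT a.repository_owner, a.repository_name, a.created_at
        FROM timeline AS a
        JOIN (SELECT {cols}, MAX(created_at) AS max_time
              FROM timeline GROUP BY {cols}) AS b
          ON {on} AND a.created_at = b.max_time
        ORDER BY a.repository_owner
    """
    return conn.execute(sql).fetchall()

# Name only: alice/bootstrap is lost, because the group's max belongs to twbs.
print(latest(["repository_name"]))
# Name + owner: one row per actual repository.
print(latest(["repository_name", "repository_owner"]))
```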