2015-04-01 69 views
-1

Is there a way to run a query repeatedly on Google BigQuery from a Python script? Using BigQuery with Python

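I want to query a dataset on the Google BigQuery platform one week of data at a time, and I want to do this for a whole year. Running the query by hand 52 times is too tedious. I would rather write a Python script instead (Python being the language I know).

I hope someone can point me in the right direction on this.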

+1

[A quick internet search](https://duckduckgo.com/?q=Google%20BigQuery%20Python) shows that this is possible... Python is also mentioned on the BigQuery home page... Not sure what your confusion/problem is here? – Carpetsmoker 2015-04-01 14:02:38

+0

A cron job plus Python code hosted on App Engine? Still, you need to be more specific in your question. – Patrice 2015-04-01 14:05:51

Answers

2

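BigQuery offers client libraries for several languages - see https://cloud.google.com/bigquery/client-libraries - and in particular for Python, with docs at https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/python/latest/?_ga=1.176926572.834714677.1415848949 (you need to follow the hyperlinks to get to the docs).

https://cloud.google.com/bigquery/bigquery-api-quickstart gives an example of a command-line program that uses the Google BigQuery API to run a query on one of the available sample datasets and display the results, in either Java or Python. After the imports and setting a few constants, the Python script boils down to: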

    # The imports and the constants FLOW and PROJECT_NUMBER are set up earlier in the script.
    storage = Storage('bigquery_credentials.dat')
    credentials = storage.get()

    if credentials is None or credentials.invalid:
        # Run oauth2 flow with default arguments.
        credentials = tools.run_flow(FLOW, storage, tools.argparser.parse_args([]))

    # Wrap the HTTP client so every request carries the OAuth2 credentials.
    http = httplib2.Http()
    http = credentials.authorize(http)

    bigquery_service = build('bigquery', 'v2', http=http)

    try:
        query_request = bigquery_service.jobs()
        query_data = {'query': 'SELECT TOP(title, 10) as title, COUNT(*) as revision_count FROM [publicdata:samples.wikipedia] WHERE wp_namespace = 0;'}

        query_response = query_request.query(projectId=PROJECT_NUMBER,
                                             body=query_data).execute()
        print('Query Results:')
        for row in query_response['rows']:
            # Each row holds its cells under 'f'; each cell's value is under 'v'.
            result_row = []
            for field in row['f']:
                result_row.append(field['v'])
            print('\t'.join(result_row))

    except HttpError as err:
        print('Error:')
        pprint.pprint(err.content)

    except AccessTokenRefreshError:
        print("Credentials have been revoked or expired, please re-run "
              "the application to re-authorize")

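As you can see, it is only about 30 lines, most of them concerned with obtaining and checking authorization and with handling errors. The "core" part, net of such considerations, is really just half of those lines: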

    bigquery_service = build('bigquery', 'v2', http=http)
    query_request = bigquery_service.jobs()
    query_data = {'query': 'SELECT TOP(title, 10) as title, COUNT(*) as revision_count FROM [publicdata:samples.wikipedia] WHERE wp_namespace = 0;'}

    query_response = query_request.query(projectId=PROJECT_NUMBER,
                                         body=query_data).execute()
    print('Query Results:')
    for row in query_response['rows']:
        result_row = []
        for field in row['f']:
            result_row.append(field['v'])
        print('\t'.join(result_row))
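
The response layout used in the loop above (`rows`, then `f`, then `v`) can be wrapped in a small helper, which is convenient once the same query is run many times. This is only a sketch: `rows_to_lines` is a hypothetical name, not part of any BigQuery library, and it assumes the response has the same shape as in the snippet above. It also turns NULL cells, which arrive as `None`, into empty strings so that `join` does not fail on them. The `sample` dictionary is hand-written for illustration and is not real query output.

```python
def rows_to_lines(query_response, sep='\t'):
    """Flatten a jobs().query() response into one string per row.

    Hypothetical helper; assumes the {'rows': [{'f': [{'v': ...}]}]} layout.
    """
    lines = []
    # Be defensive in case 'rows' is absent (for example, an empty result).
    for row in query_response.get('rows', []):
        cells = ['' if field['v'] is None else str(field['v'])
                 for field in row['f']]
        lines.append(sep.join(cells))
    return lines


# Hand-written sample in the same shape as a response (not real data).
sample = {'rows': [{'f': [{'v': 'Example title'}, {'v': '42'}]},
                   {'f': [{'v': None}, {'v': '7'}]}]}
for line in rows_to_lines(sample):
    print(line)
```

With this in place, the printing loop reduces to iterating over `rows_to_lines(query_response)`.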
0

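You can use Google Dataflow for Python; if it is a one-off thing, run it from your terminal or equivalent. Or you can have a shell script in App Engine cron that loops through the code 52 times to get the data. See also Google Dataflow scheduling.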
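
For the one-off case, the "loop 52 times" idea can also be done directly in Python by generating one query string per week. The following is a minimal sketch under stated assumptions: the table `[my_dataset.my_table]`, the timestamp column `ts`, the start date, and the function names are all hypothetical placeholders, and the query uses the same bracketed table syntax as the quickstart sample.

```python
from datetime import date, timedelta


def weekly_windows(start, weeks=52):
    """Yield (week_start, week_end) date pairs; week_end is exclusive."""
    for i in range(weeks):
        week_start = start + timedelta(weeks=i)
        yield week_start, week_start + timedelta(weeks=1)


# Hypothetical table and column names; replace them with your own.
QUERY_TEMPLATE = (
    "SELECT COUNT(*) AS n FROM [my_dataset.my_table] "
    "WHERE ts >= TIMESTAMP('{start}') AND ts < TIMESTAMP('{end}');"
)


def build_weekly_queries(start, weeks=52):
    """Return one query string per week, beginning at `start`."""
    return [QUERY_TEMPLATE.format(start=s.isoformat(), end=e.isoformat())
            for s, e in weekly_windows(start, weeks)]


# Arbitrary example start date (a Monday).
queries = build_weekly_queries(date(2014, 1, 6))
print(len(queries))
print(queries[0])
```

Each string in `queries` can then be sent as `{'query': q}` in the `body` argument of the `query_request.query(...)` call shown in the first answer. Using a half-open window (start inclusive, end exclusive) keeps adjacent weeks from counting the same row twice.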