Iterate over all BigQuery jobs using Python

I'm using the Google Python API to work with BigQuery.

I'm trying to page through all of the jobs in my project using jobs().list() and jobs().list_next(). I'm using the following generator code:

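def iter_jobs(service, project_id):
    # Yield every finished job in the project, following pages via list_next().
    request = service.jobs().list(projectId=project_id,
                                  allUsers=True,
                                  stateFilter="done",
                                  # or maxResults=500,
                                  # or maxResults=1000,
                                  # or maxResults=64000,
                                  )
    while request is not None:
        response = request.execute()
        # A page can omit "jobs" entirely, so default to an empty list.
        for x in response.get("jobs", []):
            yield x
        request = service.jobs().list_next(request, response)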

The problem is that I get a different list of jobs depending on how I use maxResults.

With no maxResults argument, I see 9986 jobs.
With maxResults=500, I see 8596 jobs.
With maxResults=1000, I see 6743 jobs.
With maxResults=64000, I see 6743 jobs.

I expected to get the same number of jobs every time, so I'm not sure whether I'm using the API correctly.

What is the correct way to loop through all the jobs in a project?

(Update Wed Aug 14 15:30:29 CDT 2013)

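Still trying to figure this out. I ran the code @Michael Manoochehri kindly provided three times, each with a different maxResults. The number of jobs reported each time, and how the resulting sets relate to each other, are below: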

s1 -> no maxResults 
s2 -> maxResults=500 
s3 -> maxResults=1000 

|s1| -> 10112 
|s2| -> 8579 
|s3| -> 6556 

|s1 intersection s2| -> 8578 
|s2 difference s1| -> 1 
|s1 difference s2| -> 1534 

|s1 intersection s3| -> 6556 
|s3 difference s1| -> 0 
|s1 difference s3| -> 3556 

|s3 intersection s2| -> 6398 
|s2 difference s3| -> 2181 
|s3 difference s2| -> 158
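The set relationships above can be computed directly from the job IDs each run returns. A minimal sketch, assuming each run has already been collected into a Python set of job ID strings (the function name `compare_runs` and the toy IDs are hypothetical):

```python
def compare_runs(a, b):
    """Return the sizes of the overlap and both differences of two job-ID sets."""
    a, b = set(a), set(b)
    stats = {
        "|a|": len(a),
        "|b|": len(b),
        "|a intersection b|": len(a & b),
        "|a difference b|": len(a - b),
        "|b difference a|": len(b - a),
    }
    # Sanity check: every ID is either shared or unique to exactly one run.
    assert stats["|a|"] == stats["|a intersection b|"] + stats["|a difference b|"]
    assert stats["|b|"] == stats["|a intersection b|"] + stats["|b difference a|"]
    return stats


# Usage with toy data (hypothetical job IDs):
s1 = {"job_1", "job_2", "job_3"}
s2 = {"job_2", "job_3", "job_4"}
print(compare_runs(s1, s2))
```

The reported numbers pass the same check (for example, 8578 + 1534 = 10112 and 8578 + 1 = 8579), and s3 is a subset of s1, so the counting itself is consistent; the runs differ in which jobs the paging returned.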

I still can't get a sense of why I don't see a consistent total number of jobs regardless of the maxResults value.


2013-07-25 Eric Kamm

First, the [bigquery_client.py Python module][1] is a good way to access the API from Python; it is built on top of the raw client lib, with additional error handling, paging, etc.

I'm not sure you're using the page token correctly. Can you confirm that you're checking nextPageToken? Here's an example I've used before:

import httplib2 
import pprint 
import sys 

from apiclient.discovery import build 
from apiclient.errors import HttpError 

from oauth2client.client import AccessTokenRefreshError 
from oauth2client.client import OAuth2WebServerFlow 
from oauth2client.client import flow_from_clientsecrets 
from oauth2client.file import Storage 
from oauth2client.tools import run 


# Enter your Google Developer Project number 
PROJECT_NUMBER = 'XXXXXXXXXXXXX' 

FLOW = flow_from_clientsecrets('client_secrets.json',
                               scope='https://www.googleapis.com/auth/bigquery')



def main(): 

    # Load cached credentials, or run the OAuth flow if there are none.
    storage = Storage('bigquery_credentials.dat')
    credentials = storage.get()

    if credentials is None or credentials.invalid:
        credentials = run(FLOW, storage)

    http = httplib2.Http()
    http = credentials.authorize(http)

    bigquery_service = build('bigquery', 'v2', http=http)
    jobs = bigquery_service.jobs()

    page_token = None
    count = 0

    while True:
        response = list_jobs_page(jobs, page_token)
        if response is None:
            # The request failed and the error was already printed.
            break
        # A page may omit the 'jobs' key entirely when it is empty.
        for job in response.get('jobs', []):
            count += 1
            print('%d. %s\t%s\t%s' % (count,
                                      job['jobReference']['jobId'],
                                      job['state'],
                                      job['errorResult']['reason'] if job.get('errorResult') else ''))
        # Keep paging until the server stops returning a nextPageToken.
        if response.get('nextPageToken'):
            page_token = response['nextPageToken']
        else:
            break




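def list_jobs_page(jobs, page_token=None):
    """Fetch one page of jobs; return the response dict, or None on error."""
    try:
        jobs_list = jobs.list(projectId=PROJECT_NUMBER,
                              projection='minimal',
                              allUsers=True,
                              # You can set a custom maxResults here:
                              # maxResults=500,
                              pageToken=page_token).execute()
        return jobs_list
    except HttpError as err:
        # pprint.pprint() prints and returns None, so call it on its own line.
        print('Error:')
        pprint.pprint(err.content)
        return None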



if __name__ == '__main__': 
    main() 


[1]: https://code.google.com/p/google-bigquery-tools/source/browse/bq/bigquery_client.py#1078
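The same nextPageToken loop can be factored into a generator that does not depend on the HTTP client, which makes it easier to test and to add a guard against a job ID showing up on two pages. A minimal sketch, assuming `fetch_page` is a hypothetical callable that takes a page token (or None) and returns a dict shaped like a jobs.list() response; `iter_all_jobs` is also a hypothetical name:

```python
def iter_all_jobs(fetch_page):
    """Yield each job once, following nextPageToken until it is absent.

    fetch_page is a hypothetical callable: fetch_page(page_token) -> dict
    with an optional 'jobs' list and an optional 'nextPageToken'.
    """
    seen_ids = set()
    seen_tokens = set()
    page_token = None
    while True:
        response = fetch_page(page_token)
        # Empty pages may omit the 'jobs' key entirely.
        for job in response.get("jobs", []):
            job_id = job["jobReference"]["jobId"]
            # Skip jobs already returned on an earlier page.
            if job_id in seen_ids:
                continue
            seen_ids.add(job_id)
            yield job
        page_token = response.get("nextPageToken")
        # Stop when there is no next page, or if a token repeats (avoids looping).
        if not page_token or page_token in seen_tokens:
            break
        seen_tokens.add(page_token)


# Usage with a fake two-page result (hypothetical data):
pages = {
    None: {"jobs": [{"jobReference": {"jobId": "a"}},
                    {"jobReference": {"jobId": "b"}}],
           "nextPageToken": "t1"},
    "t1": {"jobs": [{"jobReference": {"jobId": "b"}},
                    {"jobReference": {"jobId": "c"}}]},
}
ids = [j["jobReference"]["jobId"] for j in iter_all_jobs(pages.__getitem__)]
print(ids)  # → ['a', 'b', 'c']
```

With the client from the answer, a real `fetch_page` might be `lambda token: jobs.list(projectId=PROJECT_NUMBER, allUsers=True, pageToken=token).execute()`. De-duplicating hides jobs that appear on two pages, but it cannot recover jobs the server never returned.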


2013-07-26 20:59:33

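I guess I assumed the list_next() method would take care of the pageToken for me, although I couldn't find any documentation for it. I'll try it the way you've shown. Thanks!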
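As far as I know, list_next() in the Google API Python client reads nextPageToken from the previous response and returns None when it is missing; otherwise it returns a copy of the previous request with the pageToken query parameter replaced. The sketch below emulates that on a plain dict of query parameters. It is a simplified approximation for illustration, not the library's code, and `next_params` is a hypothetical name:

```python
def next_params(previous_params, previous_response):
    """Approximate list_next(): return the next page's query params, or None.

    Simplified illustration only; the real method copies an HttpRequest and
    rewrites the pageToken parameter in its URI.
    """
    token = previous_response.get("nextPageToken")
    if not token:
        return None  # No more pages: the caller's while-loop ends here.
    params = dict(previous_params)  # Copy so the previous request is unchanged.
    params["pageToken"] = token
    return params


first = {"projectId": "my-project", "allUsers": True, "maxResults": 500}
second = next_params(first, {"jobs": [], "nextPageToken": "abc"})
print(second)
print(next_params(second, {"jobs": []}))
```

If list_next() works this way, the generator in the question and the explicit loop in the answer request the same sequence of pages, which fits with both showing totals that depend on maxResults.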

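I've tried your code with different values of maxResults, and I still see a different total number of jobs depending on the maxResults value I send. In fact, without maxResults I now see hundreds more jobs (as expected, since several days have passed and we have kept running queries), but with maxResults=500 and maxResults=1000 I see the same jobs as on July 25. Is this a bug?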
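Because new jobs keep arriving between runs, one way to make runs with different maxResults values comparable is to keep only jobs created before a fixed cutoff. A minimal sketch, assuming each listed job carries `statistics.creationTime` as a string of milliseconds since the epoch (an assumption about the response shape; jobs without it are skipped), with `jobs_created_before` as a hypothetical name:

```python
def jobs_created_before(jobs, cutoff_ms):
    """Return the IDs of jobs whose creationTime is earlier than cutoff_ms.

    Assumes each job dict looks like a jobs.list() item with
    statistics.creationTime in milliseconds since the epoch (as a string).
    """
    ids = set()
    for job in jobs:
        created = job.get("statistics", {}).get("creationTime")
        if created is None:
            continue  # Assumed field missing: cannot place this job in time.
        if int(created) < cutoff_ms:
            ids.add(job["jobReference"]["jobId"])
    return ids


# Usage with hypothetical jobs; the cutoff is 2013-07-25 00:00:00 UTC.
cutoff = 1374710400000
sample = [
    {"jobReference": {"jobId": "old"}, "statistics": {"creationTime": "1374000000000"}},
    {"jobReference": {"jobId": "new"}, "statistics": {"creationTime": "1375000000000"}},
]
print(jobs_created_before(sample, cutoff))  # → {'old'}
```

Comparing only these filtered ID sets across maxResults values separates jobs that are genuinely missing from jobs that simply did not exist yet when the earlier run happened.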
