2017-05-10

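How to call a UDF from a Python script using the BigQuery API

I am trying to run a SQL statement that calls a UDF against a BigQuery table. The statement is executed from a Python script and sent through the BigQuery API.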

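When I run a simple SQL statement without a UDF, it works fine. But when I try to use a UDF script (stored locally or in a GCS bucket), I always get the same error. This is what I get in my local terminal (I launch the script with Python):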

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/googleapiclient/http.py", line 840, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/[projectId]/queries?alt=json returned "Required parameter is missing">

Here is my Python script:

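from googleapiclient.discovery import build
# SignedJwtAssertionCredentials only exists in oauth2client versions before 2.0
from oauth2client.client import SignedJwtAssertionCredentials

# SERVICE_ACCOUNT_EMAIL, key, PROJECT_NUMBER and sql are defined elsewhere in the script
credentials = SignedJwtAssertionCredentials(
    SERVICE_ACCOUNT_EMAIL,
    key,
    scope='https://www.googleapis.com/auth/bigquery')

# Build the BigQuery v2 client and get the jobs collection
aservice = build('bigquery', 'v2', credentials=credentials)
query_requestb = aservice.jobs()

# Request body: the UDF file stored in GCS plus the SQL that calls it
query_data = {
    'configuration': {
        'query': {
            'userDefinedFunctionResources': [
                {
                    'resourceUri': 'gs://[bucketName]/[fileName].js'
                }
            ],
            'query': sql
        }
    },
    'timeoutMs': 100000
}

query_response = query_requestb.query(projectId=PROJECT_NUMBER, body=query_data).execute(num_retries=0)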

Any idea why I get "Required parameter is missing", or how I can get this to run?

Answers

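Instead of specifying userDefinedFunctionResources, use CREATE TEMP FUNCTION in the 'query' body and reference the library as part of the OPTIONS clause. This requires standard SQL; see also the documentation on user-defined functions. Your query would look something like this: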

#standardSQL 
CREATE TEMP FUNCTION MyJsFunction(x FLOAT64) RETURNS FLOAT64 LANGUAGE js AS """ 
    return my_js_function(x); 
""" 
OPTIONS (library='gs://[bucketName]/[fileName].js'); 

SELECT MyJsFunction(x) 
FROM MyTable; 
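A minimal Python sketch of the same idea, assuming the v2 client from the question. The helper names build_temp_js_function and build_query_body are hypothetical, and the gs:// path is the placeholder from the query above. The point is that the UDF declaration becomes part of the SQL text itself, so the jobs().query() request body only needs a top-level query field plus useLegacySql set to False.

```python
# Sketch: build a jobs().query() request body that declares a JavaScript
# UDF in standard SQL. Helper names here are hypothetical, not part of
# any Google library.


def build_temp_js_function(name, params, returns, body, libraries):
    """Return a CREATE TEMP FUNCTION statement for a JavaScript UDF.

    params is a list of (name, type) pairs, e.g. [("x", "FLOAT64")];
    libraries is a list of gs:// URIs passed through OPTIONS(library=...).
    """
    param_list = ", ".join("{} {}".format(p, t) for p, t in params)
    lib_list = ", ".join('"{}"'.format(uri) for uri in libraries)
    return (
        "CREATE TEMP FUNCTION {name}({params}) RETURNS {returns} "
        'LANGUAGE js AS """\n{body}\n"""\n'
        "OPTIONS (library=[{libs}]);"
    ).format(name=name, params=param_list, returns=returns,
             body=body, libs=lib_list)


def build_query_body(function_sql, select_sql, timeout_ms=100000):
    """Combine the UDF declaration and the SELECT into one request body."""
    return {
        # jobs().query() reads the SQL from the top-level 'query' field,
        # with no 'configuration' wrapper.
        "query": function_sql + "\n" + select_sql,
        # Temporary JavaScript functions require standard SQL.
        "useLegacySql": False,
        "timeoutMs": timeout_ms,
    }
```

Usage would be something like `udf = build_temp_js_function("MyJsFunction", [("x", "FLOAT64")], "FLOAT64", "return my_js_function(x);", ["gs://[bucketName]/[fileName].js"])`, then `body = build_query_body(udf, "SELECT MyJsFunction(x) FROM MyTable;")` and `aservice.jobs().query(projectId=PROJECT_NUMBER, body=body).execute()`.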

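Thanks, Elliott, for the quick response. Using standard SQL worked, and I call my functions from two separate UDF files through the OPTIONS clause. Note that I had to lowercase the 'if' in the IF statements of my JS functions, because it is case-sensitive here (unlike when using UDFs with legacy SQL). This solution works, but I'm still keen to understand how to make it work with legacy SQL. – kekchoze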
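On the legacy SQL question: userDefinedFunctionResources is a field of the query job configuration (configuration.query in the Job resource), which is what jobs().insert() accepts. jobs().query() takes a flat request body with no configuration wrapper, which is likely why the question's request reported a missing parameter. A sketch, assuming legacy SQL and the same service object as in the question; build_legacy_udf_job and run_legacy_udf_query are hypothetical helper names. Also note that legacy SQL UDFs use a different model (the JavaScript registers itself via bigquery.defineFunction and the function is applied to a subquery in the FROM clause), so the JS files may need changes as well.

```python
# Sketch: legacy SQL with UDF code attached through jobs().insert().
# Helper names are hypothetical; `service` is assumed to be the object
# returned by build('bigquery', 'v2', ...).


def build_legacy_udf_job(sql, gcs_uris=(), inline_sources=()):
    """Return a Job resource for jobs().insert() that attaches UDF code."""
    # Files already in GCS are referenced by URI...
    resources = [{"resourceUri": uri} for uri in gcs_uris]
    # ...while local .js files can be sent as inline source text.
    resources += [{"inlineCode": src} for src in inline_sources]
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": True,
                "userDefinedFunctionResources": resources,
            }
        }
    }


def run_legacy_udf_query(service, project_id, job_body, max_polls=30):
    """Insert the job, then poll getQueryResults until it completes."""
    jobs = service.jobs()
    job = jobs.insert(projectId=project_id, body=job_body).execute()
    job_id = job["jobReference"]["jobId"]
    for _ in range(max_polls):
        # getQueryResults waits server-side up to timeoutMs per call.
        result = jobs.getQueryResults(
            projectId=project_id, jobId=job_id, timeoutMs=10000
        ).execute()
        if result.get("jobComplete"):
            return result
    raise RuntimeError("query job %s did not finish in time" % job_id)
```

For a local UDF file, read it (e.g. `with open('channelgrouping.js') as f: src = f.read()`) and pass it through inline_sources; for a file in a bucket, pass its gs:// URI through gcs_uris.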

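The query I wanted to run classifies traffic and sales by marketing channel, which I normally do with a UDF. This is the query I run using standard SQL. It is stored in a file, which I read into the variable sql: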

CREATE TEMPORARY FUNCTION 
    mktchannels(source STRING, 
    medium STRING, 
    campaign STRING) 
    RETURNS STRING 
    LANGUAGE js AS """ 
return channelGrouping(source,medium,campaign) // where channelGrouping is the function in my channelgrouping.js file which contains the attribution rules 
    """ OPTIONS (library=["gs://[bucket]/[path]/regex.js", 
    "gs://[bucket]/[path]/channelgrouping.js"]); 
WITH 
    traffic AS ( -- select fields from the BigQuery table
    SELECT 
    device.deviceCategory AS device, 
    trafficSource.source AS source, 
    trafficSource.medium AS medium, 
    trafficSource.campaign AS campaign, 
    SUM(totals.visits) AS sessions, 
    SUM(totals.transactionRevenue)/1e6 as revenue, 
    SUM(totals.transactions) as transactions 
    FROM 
    `[datasetId].[table]` 
    GROUP BY 
    device, 
    source, 
    medium, 
    campaign) 
SELECT 
    mktchannels(source, 
    medium, 
    campaign) AS channel, -- call the temp function defined above
    device, 
    SUM(sessions) AS sessions, 
    SUM(transactions) as transactions, 
    ROUND(SUM(revenue),2) as revenue 
FROM 
    traffic 
GROUP BY 
    device, 
    channel 
ORDER BY 
    channel, 
    device; 

Then, in the Python script:

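# Read the standard SQL query (UDF declaration included) from disk
with open('myquery.sql') as fd:
    sql = fd.read()

# jobs().query() takes a flat request body: the SQL goes in the top-level
# 'query' field, and 'useLegacySql': False switches to standard SQL
query_data = {
    'query': sql,
    'maximumBillingTier': 10,
    'useLegacySql': False,
    'timeoutMs': 300000
}

query_response = aservice.jobs().query(projectId=PROJECT_NUMBER, body=query_data).execute()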
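To actually read the results, a sketch assuming the aservice client and PROJECT_NUMBER from the question: jobs().query() returns each row as {"f": [{"v": ...}]} with column names in schema.fields, can return before the job finishes (jobComplete is false) when timeoutMs is exceeded, and pages large results through pageToken. rows_to_dicts and fetch_all_rows are hypothetical helpers; values are left as strings, and nested RECORD columns are not flattened (the channel query above only returns flat columns).

```python
# Sketch: turn BigQuery v2 query responses into a list of dicts.
# Helper names are hypothetical, not part of googleapiclient.


def rows_to_dicts(response):
    """Convert a jobs().query() / getQueryResults() response into dicts.

    Column names come from response["schema"]["fields"]; each row's cells
    are in row["f"], and each cell's value (a string or None) in cell["v"].
    """
    names = [field["name"] for field in response["schema"]["fields"]]
    return [
        dict(zip(names, (cell["v"] for cell in row["f"])))
        for row in response.get("rows", [])
    ]


def fetch_all_rows(service, project_id, query_body):
    """Run the query, wait for completion, and read every result page."""
    jobs = service.jobs()
    response = jobs.query(projectId=project_id, body=query_body).execute()
    job_id = response["jobReference"]["jobId"]
    # If timeoutMs ran out first, there is no schema yet: keep polling.
    while not response.get("jobComplete"):
        response = jobs.getQueryResults(
            projectId=project_id, jobId=job_id).execute()
    rows = rows_to_dicts(response)
    # Follow pageToken until the last page has been read.
    while response.get("pageToken"):
        response = jobs.getQueryResults(
            projectId=project_id, jobId=job_id,
            pageToken=response["pageToken"]).execute()
        rows.extend(rows_to_dicts(response))
    return rows
```

With the request body above, this would be called as `rows = fetch_all_rows(aservice, PROJECT_NUMBER, query_data)`, giving one dict per (channel, device) row.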

Hope this helps someone in the future!