2016-03-04
3

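BigQuery: pivot data rows into columns

I am currently processing data in BigQuery and then exporting it to Excel to build the final pivot table. I would like to create the same pivot table directly in BigQuery, using some kind of PIVOT option.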

My data set in BigQuery looks like this:

Transaction_Month || ConsumerId || CUST_createdMonth 
01/01/2015  || 1   || 01/01/2015 
01/01/2015  || 1   || 01/01/2015 
01/02/2015  || 1   || 01/01/2015 
01/01/2015  || 2   || 01/01/2015 
01/02/2015  || 3   || 01/02/2015 
01/02/2015  || 4   || 01/02/2015 
01/02/2015  || 5   || 01/02/2015 
01/03/2015  || 5   || 01/02/2015 
01/03/2015  || 6   || 01/03/2015 
01/04/2015  || 6   || 01/03/2015 
01/06/2015  || 6   || 01/03/2015 
01/03/2015  || 7   || 01/03/2015 
01/04/2015  || 8   || 01/04/2015 
01/05/2015  || 8   || 01/04/2015 
01/04/2015  || 9   || 01/04/2015 

It is essentially an order table with additional customer information appended.

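When I bring this data into Excel, I add it to a pivot table with CUST_createdMonth as the rows, Transaction_Month as the columns, and a distinct count of ConsumerId as the values.

The output looks like the following: [image]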

Is this kind of pivot possible in BigQuery?

Answers

3

There is no nice way of doing this in BigQuery, but you can do it along the following lines.

Step 1

Run the query below (it uses BigQuery legacy SQL syntax):

SELECT 'SELECT CUST_createdMonth, ' + 
    GROUP_CONCAT_UNQUOTED(
     'EXACT_COUNT_DISTINCT(IF(Transaction_Month = "' + Transaction_Month + '", ConsumerId, NULL)) as [m_' + REPLACE(Transaction_Month, '/', '_') + ']' 
    ) 
    + ' FROM yourTable GROUP BY CUST_createdMonth ORDER BY CUST_createdMonth' 
FROM (
    SELECT Transaction_Month 
    FROM yourTable 
    GROUP BY Transaction_Month 
    ORDER BY Transaction_Month 
) 
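If you would rather compose the query string on the client side, the same idea can be sketched in Python. This is only an illustration of Step 1, not part of the original answer: the function name `build_pivot_query` and its `table` parameter are hypothetical, and the list of months is assumed to have been fetched already (for example with a `SELECT Transaction_Month ... GROUP BY Transaction_Month` query).

```python
def build_pivot_query(months, table="yourTable"):
    """Compose the legacy SQL pivot query with one column per transaction month.

    `months` is an iterable of Transaction_Month strings such as "01/02/2015".
    Months are de-duplicated and sorted as plain strings, which mirrors the
    GROUP BY / ORDER BY of the SQL version (it is not a chronological sort).
    """
    columns = [
        'EXACT_COUNT_DISTINCT(IF(Transaction_Month = "{m}", ConsumerId, NULL)) '
        "AS [m_{alias}]".format(m=m, alias=m.replace("/", "_"))
        for m in sorted(set(months))
    ]
    return (
        "SELECT CUST_createdMonth, "
        + ", ".join(columns)
        + " FROM {t} GROUP BY CUST_createdMonth ORDER BY CUST_createdMonth".format(t=table)
    )


if __name__ == "__main__":
    # Hypothetical usage with two of the months from the question
    print(build_pivot_query(["01/01/2015", "01/02/2015"]))
```

The string this returns is what you would then run in Step 2. The values are interpolated without escaping, so this is only suitable for trusted month strings.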

As a result, you will get a string like the one below (formatted here for readability):

SELECT 
    CUST_createdMonth, 
    EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/01/2015", ConsumerId, NULL)) AS [m_01_01_2015], 
    EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/02/2015", ConsumerId, NULL)) AS [m_01_02_2015], 
    EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/03/2015", ConsumerId, NULL)) AS [m_01_03_2015], 
    EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/04/2015", ConsumerId, NULL)) AS [m_01_04_2015], 
    EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/05/2015", ConsumerId, NULL)) AS [m_01_05_2015], 
    EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/06/2015", ConsumerId, NULL)) AS [m_01_06_2015] 
FROM yourTable 
GROUP BY 
    CUST_createdMonth 
ORDER BY 
    CUST_createdMonth 

Step 2

Just run the query composed above.

The result will look like this:

CUST_createdMonth m_01_01_2015 m_01_02_2015 m_01_03_2015 m_01_04_2015 m_01_05_2015 m_01_06_2015  
01/01/2015   2    1    0    0    0    0  
01/02/2015   0    3    1    0    0    0  
01/03/2015   0    0    2    1    0    1  
01/04/2015   0    0    0    2    1    0 
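To sanity-check the table above outside BigQuery, the same distinct-count pivot can be reproduced with a few lines of plain Python over the sample rows from the question. This is a small sketch for verification only; the names `ROWS` and `pivot_distinct_counts` are made up for the example.

```python
from collections import defaultdict

# (Transaction_Month, ConsumerId, CUST_createdMonth) rows copied from the question
ROWS = [
    ("01/01/2015", 1, "01/01/2015"), ("01/01/2015", 1, "01/01/2015"),
    ("01/02/2015", 1, "01/01/2015"), ("01/01/2015", 2, "01/01/2015"),
    ("01/02/2015", 3, "01/02/2015"), ("01/02/2015", 4, "01/02/2015"),
    ("01/02/2015", 5, "01/02/2015"), ("01/03/2015", 5, "01/02/2015"),
    ("01/03/2015", 6, "01/03/2015"), ("01/04/2015", 6, "01/03/2015"),
    ("01/06/2015", 6, "01/03/2015"), ("01/03/2015", 7, "01/03/2015"),
    ("01/04/2015", 8, "01/04/2015"), ("01/05/2015", 8, "01/04/2015"),
    ("01/04/2015", 9, "01/04/2015"),
]


def pivot_distinct_counts(rows):
    """Return (column months, {created month: distinct ConsumerId count per column})."""
    consumers = defaultdict(set)  # (created month, transaction month) -> ConsumerIds
    for transaction_month, consumer_id, created_month in rows:
        consumers[(created_month, transaction_month)].add(consumer_id)
    columns = sorted({t for t, _, _ in rows})
    created = sorted({c for _, _, c in rows})
    # Missing combinations count as 0, like the NULL branch of IF(...) in the query
    table = {c: [len(consumers.get((c, t), ())) for t in columns] for c in created}
    return columns, table


if __name__ == "__main__":
    columns, table = pivot_distinct_counts(ROWS)
    print("CUST_createdMonth", *columns)
    for created_month, counts in table.items():
        print(created_month, *counts)
```

Collecting the ConsumerId values in a set per cell is what makes this a distinct count: consumer 1 appears twice for 01/01/2015 but is counted once.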

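Step 1 is helpful if you have many months to work with, where writing the query by hand would be too much manual work.
In that case, Step 1 generates the query for you.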

You can read more about pivoting in my other posts.

How to scale Pivoting in BigQuery?
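Please note: there is a limit of 10K columns per table, so you are limited to 10K organizations.
You can also see the posts below for simplified examples (in case the one above is too complex or verbose):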
How to transpose rows to columns with large amount of the data in BigQuery/SQL?
How to create dummy variable columns for thousands of categories in Google BigQuery?
Pivot Repeated fields in BigQuery

+0

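1. The code in my answer is tailored to the example in your question, where the dates are apparently strings. 2. Check whether your actual data matches the example you provided. 3. If you still have trouble troubleshooting and fixing the problem, show the line that produces the error, or better, three lines (including the one before and the one after).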

+0

Hi, I deleted my comment. I think I looked at this for too long yesterday; when I tried it today it worked perfectly. Thanks for all your help.

+0

Glad it worked for you!

1

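Actually, Mikhail, there is another way to transpose the rows of an EAV-type schema into columns: use the logging tables and query the last CREATE TABLE entry to determine the latest table schema.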

CREATE TEMP FUNCTION jsonSchemaStringToArray(jsonSchema STRING)
RETURNS ARRAY<STRING> AS ((
  -- Strip the JSON wrapper, reduce each field object to "<name>,", then split on commas
  SELECT
    SPLIT(
      REGEXP_REPLACE(
        REPLACE(LTRIM(jsonSchema, '{ '), '"fields": [', ''),
        r'{[^{]+"name": "([^\"]+)"[^}]+}[, ]*',
        '\\1,'),
      ',')
));

WITH valid_schema_columns AS (
  WITH array_output AS (
    SELECT
      jsonSchemaStringToArray(jsonSchema) AS column_names
    FROM (
      SELECT
        protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.schemaJson AS jsonSchema,
        ROW_NUMBER() OVER (ORDER BY metadata.timestamp DESC) AS record_count
      FROM `realself-main.bigquery_logging.cloudaudit_googleapis_com_data_access_20170101`
      WHERE
        protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.tableId = '<table_name>'
        AND protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.datasetId = '<schema_name>'
        AND protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.createDisposition = 'CREATE_IF_NEEDED'
    ) AS t
    WHERE
      t.record_count = 1 -- grab the latest entry
  )
  -- this is actually what UNNESTs the array into standard rows
  SELECT
    valid_column_name
  FROM array_output
  LEFT JOIN UNNEST(column_names) AS valid_column_name
)
-- Final SELECT added here so the WITH clause forms a complete statement
SELECT valid_column_name
FROM valid_schema_columns
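If the schema JSON is available on the client side, the column names can also be pulled out with a JSON parser instead of a regular expression. The sketch below is an illustration only, not part of the answer above. It assumes the logged `schemaJson` value is a JSON object with a top-level `"fields"` array whose entries each carry a `"name"` key (the shape the regular expression above expects), and the function name `schema_json_to_column_names` is made up for the example.

```python
import json


def schema_json_to_column_names(schema_json):
    """Return the top-level column names from a schema JSON string.

    Assumed input shape: {"fields": [{"name": "...", "type": "..."}, ...]}
    Nested RECORD fields are not expanded; only top-level names are returned.
    """
    schema = json.loads(schema_json)
    return [field["name"] for field in schema.get("fields", [])]


if __name__ == "__main__":
    # Hypothetical schema string, made up for illustration
    sample = (
        '{"fields": [{"name": "ConsumerId", "type": "INTEGER"}, '
        '{"name": "Transaction_Month", "type": "STRING"}]}'
    )
    print(schema_json_to_column_names(sample))
```

Parsing the JSON avoids depending on the exact whitespace and key order that the regular expression relies on.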