2015-11-25

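I am using BigQuery on exported GA data (see the schema here) and I am getting flattening that I do not want.

Looking at the documentation, I see that when I select a field that is inside a record, it automatically flattens that record and duplicates the surrounding columns.

So I tried to create a denormalized table that I could query with more of a plain SQL mindset: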

SELECT 
    CONCAT(date, " ", if (hits.hour < 10, 
     CONCAT("0", STRING(hits.hour)), 
     STRING(hits.hour)), ":", IF(hits.minute < 10, CONCAT("0", STRING(hits.minute)), STRING(hits.minute))) AS hits.date__STRING, 
    CONCAT(fullVisitorId, STRING(visitId)) AS session_id__STRING, 
    fullVisitorId AS google_identity__STRING, 
    MAX(IF(hits.customDimensions.index=7, hits.customDimensions.value,NULL)) WITHIN RECORD AS customer_id__LONG, 
    hits.hitNumber AS hit_number__INT, 
    hits.type AS hit_type__STRING, 
    hits.isInteraction AS hit_is_interaction__BOOLEAN, 
    hits.isEntrance AS hit_is_entrance__BOOLEAN, 
    hits.isExit AS hit_is_exit__BOOLEAN, 
    hits.promotion.promoId AS promotion_id__STRING, 
    hits.promotion.promoName AS promotion_name__STRING, 
    hits.promotion.promoCreative AS promotion_creative__STRING, 
    hits.promotion.promoPosition AS promotion_position__STRING, 
    hits.eventInfo.eventCategory AS event_category__STRING, 
    hits.eventInfo.eventAction AS event_action__STRING, 
    hits.eventInfo.eventLabel AS event_label__STRING, 
    hits.eventInfo.eventValue AS event_value__INT, 
    device.language AS device_language__STRING, 
    device.screenResolution AS device_resolution__STRING, 
    device.deviceCategory AS device_category__STRING, 
    device.operatingSystem AS device_os__STRING, 
    geoNetwork.country AS geo_country__STRING, 
    geoNetwork.region AS geo_region__STRING, 
    hits.page.searchKeyword AS hit_search_keyword__STRING, 
    hits.page.searchCategory AS hits_search_category__STRING, 
    hits.page.pageTitle AS hits_page_title__STRING, 
    hits.page.pagePath AS page_path__STRING, 
    hits.page.hostname AS page_hostname__STRING, 
    hits.eCommerceAction.action_type AS commerce_action_type__INT, 
    hits.eCommerceAction.step AS commerce_action_step__INT, 
    hits.eCommerceAction.option AS commerce_action_option__STRING, 
    hits.product.productSKU AS product_sku__STRING, 
    hits.product.v2ProductName AS product_name__STRING, 
    hits.product.productRevenue AS product_revenue__INT, 
    hits.product.productPrice AS product_price__INT, 
    hits.product.productQuantity AS product_quantity__INT, 
    hits.product.productRefundAmount AS hits.product.product_refund_amount__INT, 
    hits.product.v2ProductCategory AS product_category__STRING, 
    hits.transaction.transactionId AS transaction_id__STRING, 
    hits.transaction.transactionCoupon AS transaction_coupon__STRING, 
    hits.transaction.transactionRevenue AS transaction_revenue__INT, 
    hits.transaction.transactionTax AS transaction_tax__INT, 
    hits.transaction.transactionShipping AS transaction_shipping__INT, 
    hits.transaction.affiliation AS transaction_affiliation__STRING, 
    hits.appInfo.screenName AS app_current_name__STRING, 
    hits.appInfo.screenDepth AS app_screen_depth__INT, 
    hits.appInfo.landingScreenName AS app_landing_screen__STRING, 
    hits.appInfo.exitScreenName AS app_exit_screen__STRING, 
    hits.exceptionInfo.description AS exception_description__STRING, 
    hits.exceptionInfo.isFatal AS exception_is_fatal__BOOLEAN 
FROM 
    [98513938.ga_sessions_20151112] 
HAVING 
    customer_id__LONG IS NOT NULL 
    AND customer_id__LONG != 'NA' 
    AND customer_id__LONG != '' 

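I wrote the result of this query to another, denormalized table (flatten on, large dataset on).

I get different results when I query the denormalized table with the clause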

WHERE session_id__STRING = "100001897901013346771447300813" 

compared with wrapping the query above (which produces the expected results):

SELECT * FROM (_above query_) AS foo WHERE session_id__STRING = "100001897901013346771447300813" 

I am sure this is by design, but it would be very helpful if someone could explain the difference between these two approaches.

Answers

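I believe you are saying that you checked "Flatten Results" when you created the output table? And I infer from your question that session_id__STRING is a repeated field?

If those assumptions are correct, then what you are seeing is exactly the behavior from the documentation you mentioned above. You asked BigQuery to "flatten results", so it turned your repeated field into a non-repeated field and duplicated all the fields around it, so that you end up with a flat table (that is, one with no repeated data).

If the behavior you want is the one you see when the query runs as a subquery, then you should uncheck that box when you create the table.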
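
For anyone creating the table through the API rather than the web UI, the checkbox appears to correspond to the `flattenResults` field of the query job configuration. The following is only a sketch that builds the request body as a plain dict and sends nothing; the helper name `build_query_job`, the project, dataset and table names, and the shortened query are placeholders, and the field names should be checked against the current BigQuery jobs reference.

```python
import json


def build_query_job(sql, project_id, dataset_id, table_id, flatten=False):
    """Build a jobs.insert request body as a plain dict (nothing is sent)."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                # The query in the question uses legacy SQL syntax.
                "useLegacySql": True,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Assumption: unflattened legacy SQL output needs a
                # destination table with large results allowed.
                "allowLargeResults": True,
                # False keeps repeated fields nested in the output table.
                "flattenResults": flatten,
            }
        }
    }


body = build_query_job(
    "SELECT fullVisitorId, hits.hitNumber FROM [98513938.ga_sessions_20151112]",
    project_id="my-project",    # placeholder
    dataset_id="my_dataset",    # placeholder
    table_id="ga_hits_nested",  # placeholder
)
print(json.dumps(body, indent=2))
```

With `flatten=False` the destination table should keep `hits` as a repeated record, so filters on it behave like the subquery version in the question.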

sessionId is just 'CONCAT(fullVisitorId, STRING(visitId)) AS session_id__STRING'; neither fullVisitorId nor visitId is repeated. – djoanes

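Looking at the documentation, I see that when I select a field that is inside a record, it automatically flattens that record and duplicates the surrounding columns.

This is not correct. By the way, can you point to that documentation? It needs to be improved.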

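Selecting a field does not flatten the record. So if you have a table T with a single record {a = 1, b = (2, 2, 3)} and then run

SELECT * FROM T WHERE b = 2 

you still get a single record, {a = 1, b = (2, 2)}. SELECT COUNT(a) from this subquery will return 1.

However, once you write the results of this query out with flatten = on, you get two records: {a = 1, b = 2} and {a = 1, b = 2}. SELECT COUNT(a) from the flattened table will return 2.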
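
The difference between the two cases can be mimicked with a toy Python model. This is only a sketch of the semantics described here, not how BigQuery works internally; the helper names `where_repeated` and `flatten` are made up for illustration.

```python
# Hypothetical in-memory model of table T: one record with a repeated field b.
T = [{"a": 1, "b": [2, 2, 3]}]


def where_repeated(rows, field, value):
    """Filter a repeated field inside each record, keeping the record nested."""
    out = []
    for row in rows:
        kept = [v for v in row[field] if v == value]
        if kept:  # the record survives only if some repeated value matched
            out.append({**row, field: kept})
    return out


def flatten(rows, field):
    """Emit one flat row per repeated value, copying the surrounding fields."""
    return [{**row, field: v} for row in rows for v in row[field]]


nested = where_repeated(T, "b", 2)  # like SELECT * FROM T WHERE b = 2
flat = flatten(nested, "b")         # like writing that result with flatten = on

print(len(nested))  # 1 -> COUNT(a) over the nested result
print(len(flat))    # 2 -> COUNT(a) over the flattened table
```

The nested result is still one record whose `b` holds two values, while the flattened table has two rows that each repeat `a`.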

Sorry, I meant that flattening happens when you hit a repeated field '[]'; what I described is not just nested objects '{}'. – djoanes

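In my answer, b is a repeated field, not a nested one. No flattening happens in the SELECT statement you showed. Flattening only happens because you chose to write the query results with the flatten = on option. If you do not want flattening, just uncheck that option. – Michael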