2016-05-26 44 views
0

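Group by on 3 tables joined using left joins

Below is my query, which fetches fields from 3 tables combined with joins. My requirement is to get all fields based on the most recent SystemDateTime in table Debug.T. For example, if I run it for HardwareId = 550803413, it returns 2 records with different SystemDateTime values. I need to filter it so that I get exactly 1 record per HardwareId, chosen by the most recent SystemDateTime. The data is stored in Google BigQuery.

Any help would be greatly appreciated.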

SELECT HardwareId, e.Carrier, max(d.SystemDateTime) as DateTime, 
CASE 
    WHEN lower(DebugData) LIKE 'veri%' THEN 'Verizon' 
    WHEN REGEXP_MATCH(lower(DebugData),'\\d+') THEN c.Network 
END 
AS ActualData 
FROM (
SELECT 
HardwareId, SystemDateTime, max(SystemDateTime) as max_date, 
INTEGER(RTRIM(SUBSTR(REGEXP_REPLACE(REGEXP_REPLACE(DebugData,'\\"',' '), '\\?',' ') ,0,3))) AS d1, 
INTEGER(RTRIM(SUBSTR(REGEXP_REPLACE(DebugData,'[^a-zA-Z0-9]',' '),4,LENGTH(DebugData)-3))) AS d2 
FROM TABLE_DATE_RANGE([Debug.T],TIMESTAMP('2016-05-16'),TIMESTAMP('2016-05-16')) 
GROUP BY HardwareId, DebugReason, DebugData, SystemDateTime 
HAVING DebugReason = 31) AS d 
LEFT JOIN 
(
    SELECT Mcc, Mnc as Mnc, Network from [Debug.Carrier] 
) As c 
ON c.Mcc = d.d1 and c.Mnc = d.d2 
INNER JOIN 
(
    SELECT VehicleId, APNCarrier FROM [Info_20160516] 
) As e 
ON d.HardwareId = e.VehicleId 
GROUP BY HardwareId, ActualData, e.Carrier 
HAVING HardwareId = 550803413 

Current output:

HardwareId  DebugReason  DebugData  e_APNCarrier  DateTime                    ActualDebugData
550473814   50013        23430"?    Unknown       2016-05-16 08:09:09.534597  Everyth. Ev.wh./T-Mobile
550473814   50013        23410"?    Unknown       2016-05-16 07:50:48.526288  O2 Ltd.
550473814   50013        23415"?    Unknown       2016-05-16 23:54:37.487154  Vodafone

Expected output:

The most recent SystemDateTime is 23:54:37.487154, so the query should keep only the record with the most recent SystemDateTime and return that row.

HardwareId  DebugReason  DebugData  e_APNCarrier  DateTime                    ActualDebugData
550473814   50013        23415"?    Unknown       2016-05-16 23:54:37.487154  Vodafone
+1

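Please include the db schema, sample data, and expected output. Please read [**How to Ask**](http://stackoverflow.com/help/how-to-ask). Here is a good place to [**START**](http://spaghettidba.com/2015/04/24/how-to-post-at-sql-question-on-a-public-forum/) to learn how to improve the quality of your question and get better answers. –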

+0

Thanks, I have edited the question. – user3447653

+0

I don't see the columns 'd1, d2' in your alias 'd'. How are you joining c using 'd.d1 and d.d2'? –

Answer

0

So you just want the latest record per HardwareId, based on DateTime? Try this:

SELECT * FROM (
SELECT HardwareId, e.Carrier, d.SystemDateTime as DateTime, 
CASE 
    WHEN lower(DebugData) LIKE 'veri%' THEN 'Verizon' 
    WHEN REGEXP_MATCH(lower(DebugData),'\\d+') THEN c.Network 
END 
AS ActualData, 
ROW_NUMBER() OVER (PARTITION BY HARDWAREID ORDER BY d.SystemDateTime desc) RN 
FROM (
SELECT 
HardwareId, SystemDateTime, max(SystemDateTime) as max_date, 
INTEGER(RTRIM(SUBSTR(REGEXP_REPLACE(REGEXP_REPLACE(DebugData,'\\"',' '), '\\?',' ') ,0,3))) AS d1, 
INTEGER(RTRIM(SUBSTR(REGEXP_REPLACE(DebugData,'[^a-zA-Z0-9]',' '),4,LENGTH(DebugData)-3))) AS d2 
FROM TABLE_DATE_RANGE([Debug.T],TIMESTAMP('2016-05-16'),TIMESTAMP('2016-05-16')) 
GROUP BY HardwareId, DebugReason, DebugData, SystemDateTime 
HAVING DebugReason = 31) AS d 
LEFT JOIN 
(
    SELECT Mcc, Mnc as Mnc, Network from [Debug.Carrier] 
) As c 
ON c.Mcc = d.d1 and c.Mnc = d.d2 
INNER JOIN 
(
    SELECT VehicleId, APNCarrier FROM [Info_20160516] 
) As e 
ON d.HardwareId = e.VehicleId 
HAVING HardwareId = 550803413 
) 
WHERE RN = 1 
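
The ranking pattern used above (number the rows within each HardwareId by SystemDateTime descending, then keep RN = 1) can be tried locally without BigQuery. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or newer is needed for window functions), and the table name `joined` and the function `latest_per_hardware` are hypothetical stand-ins for the already-joined result. The rows are the three shown in the question's current output.

```python
import sqlite3

# Rows copied from the question's "current output"; `joined` below is a
# hypothetical stand-in for the result of the three-table join.
rows = [
    (550473814, "2016-05-16 08:09:09.534597", "Everyth. Ev.wh./T-Mobile"),
    (550473814, "2016-05-16 07:50:48.526288", "O2 Ltd."),
    (550473814, "2016-05-16 23:54:37.487154", "Vodafone"),
]


def latest_per_hardware(rows):
    """Return one row per HardwareId: the one with the latest SystemDateTime."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE joined ("
        "HardwareId INTEGER, SystemDateTime TEXT, ActualData TEXT)"
    )
    conn.executemany("INSERT INTO joined VALUES (?, ?, ?)", rows)
    # Timestamps are stored as ISO-style text of equal width, so text
    # ordering matches chronological ordering.
    query = """
        SELECT HardwareId, SystemDateTime, ActualData
        FROM (
            SELECT HardwareId, SystemDateTime, ActualData,
                   ROW_NUMBER() OVER (
                       PARTITION BY HardwareId
                       ORDER BY SystemDateTime DESC
                   ) AS RN
            FROM joined
        )
        WHERE RN = 1
        ORDER BY HardwareId
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_per_hardware(rows):
        print(row)
```

Because the row number restarts for every HardwareId, the same query returns one row per device when the filter on a single HardwareId is removed.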
+0

I get an error: Field HardwareId not found on either side of the JOIN. – user3447653

+0

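Just noticed I had a typo in 'ROW_NUMBER()'. I have fixed it in the code above. FYI, I simply copied and pasted your query as the inner query, removed the 'group by' and the 'max' on the SystemDateTime field, and added the following column to rank the data: 'ROW_NUMBER() OVER (PARTITION BY HARDWAREID ORDER BY d.SystemDateTime desc) RN'. – mo2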

+1

Yes, I noticed. Thanks, it works very well. – user3447653
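
As a footnote to the change mo2 describes, here is a plain-Python sketch of why the original GROUP BY HardwareId, ActualData still returned several rows for one device, while keeping whole rows ranked per HardwareId returns one. It is only an illustration, not BigQuery code: the helper names `max_by_group` and `latest_row_per_hardware` are hypothetical, and the data is the three rows from the question's current output.

```python
# Sample rows from the question's "current output".
rows = [
    {"HardwareId": 550473814, "SystemDateTime": "2016-05-16 08:09:09.534597",
     "ActualData": "Everyth. Ev.wh./T-Mobile"},
    {"HardwareId": 550473814, "SystemDateTime": "2016-05-16 07:50:48.526288",
     "ActualData": "O2 Ltd."},
    {"HardwareId": 550473814, "SystemDateTime": "2016-05-16 23:54:37.487154",
     "ActualData": "Vodafone"},
]


def max_by_group(rows, key_fields):
    """Emulate SELECT <keys>, MAX(SystemDateTime) ... GROUP BY <keys>."""
    groups = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        ts = row["SystemDateTime"]
        if key not in groups or ts > groups[key]:
            groups[key] = ts
    return groups


def latest_row_per_hardware(rows):
    """Emulate ROW_NUMBER() ... PARTITION BY HardwareId, keeping RN = 1."""
    latest = {}
    for row in rows:
        hid = row["HardwareId"]
        if hid not in latest or row["SystemDateTime"] > latest[hid]["SystemDateTime"]:
            latest[hid] = row
    return list(latest.values())


if __name__ == "__main__":
    # One group per distinct (HardwareId, ActualData) pair: three rows remain.
    print(len(max_by_group(rows, ["HardwareId", "ActualData"])))
    # One complete row per HardwareId: only the most recent one remains.
    print(latest_row_per_hardware(rows))
```

Grouping on ActualData makes every distinct network its own group, so MAX is computed inside each group rather than across the device. Ranking within HardwareId keeps the other columns attached to the winning row.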