2015-11-02 115 views
0

編輯: 經過一些更多的故障排除之後,我發現了以下內容 - 我將lag(event_time)添加到查詢中,查看查詢收集的日期,結果:Spark SQL窗口函數/ lag()給出了意想不到的結果

SELECT device_id, 
     event_time, 
     unix_time, 
     event_id, 
     lag(event_time) OVER (PARTITION BY device_id ORDER BY unix_time,event_id) AS lag_time 
FROM ios_d_events; 

對於一些device_ids這將返回如預期,但一些device_ids返回如下:

   device_id    |  event_time  | unix_time |    event_id    |   lag_time 
--------------------------------------+---------------------+-----------------+--------------------------------------+---------------------------- 
111111111111111111111111111111111111 | 2015-03-01 10:41:47 | 1425206507.0000 | 4C48BE67-31EB-4432-96EF-0F30B4191340 | 
111111111111111111111111111111111111 | 2015-03-01 10:41:48 | 1425206508.0000 | A4AE33A2-6CDC-480C-A8D7-C4024810D236 | 2015-03-01 10:41:47 
111111111111111111111111111111111111 | 2015-03-01 10:41:51 | 1425206511.0000 | 997614AE-CE7E-45F6-BCE3-93E70CD46609 | 2015-03-01 10:41:48 
111111111111111111111111111111111111 | 2015-03-01 10:41:53 | 1425206513.0000 | 202DA38C-1823-4100-85AB-3DE139FB3CE3 | 2015-03-01 10:41:51 
111111111111111111111111111111111111 | 2015-03-01 10:42:11 | 1425206531.0000 | 8DFA7938-2123-4978-A89D-6C92404B504D | 2015-03-01 10:41:53 
111111111111111111111111111111111111 | 2015-03-01 10:42:12 | 1425206532.0000 | 463E9833-9526-4E76-A907-4651C13991A0 | 2015-03-01 10:42:11 
111111111111111111111111111111111111 | 2015-03-01 10:42:14 | 1425206534.0000 | 8204696E-3DAA-423E-9031-EC80BFA1157E | 2015-03-01 10:42:12 
111111111111111111111111111111111111 | 2015-03-01 10:42:20 | 1425206540.0000 | 10DB02E2-2D4F-4611-98D7-966074FBD398 | 2015-03-01 10:42:14 
111111111111111111111111111111111111 | 2015-03-01 10:42:20 | 1425206540.0000 | 50535667-A02D-47F7-8320-86EAC4638964 | 2015-03-01 10:42:20 
111111111111111111111111111111111111 | 2015-03-01 10:42:27 | 1425206547.0000 | 6C8BC79D-CB3E-4FE2-8EDD-B8421EA237FF | 2015-03-01 10:42:20 
111111111111111111111111111111111111 | 2015-03-01 10:42:27 | 1425206547.0000 | A732E59D-2EEE-44AE-BB5B-D016E6E6C597 | 2015-03-01 10:42:27 
111111111111111111111111111111111111 | 2015-03-01 10:42:27 | 1425206547.0000 | ABBE4184-65C6-41C1-AC0B-74828DDD8DE8 | 2015-03-01 10:42:27 
111111111111111111111111111111111111 | 2015-03-01 10:42:40 | 1425206560.0000 | 03D0B5FF-9E4D-4F14-8169-7D1C93C617B3 | 2015-03-01 10:42:27 
111111111111111111111111111111111111 | 2015-03-01 10:42:40 | 1425206560.0000 | 5C1DFE08-8081-4C84-9D8F-E29EBEB1C9D5 | 2015-03-01 10:42:40 
111111111111111111111111111111111111 | 2015-03-01 10:42:40 | 1425206560.0000 | 82C50F8E-9790-4C27-8979-9484954934B4 | 2015-03-01 10:42:40 
111111111111111111111111111111111111 | 2015-03-01 10:42:42 | 1425206562.0000 | 2D30722E-FB37-4563-B2CD-FA545D95AAB4 | 2015-03-01 10:42:40 
111111111111111111111111111111111111 | 2015-03-01 10:42:49 | 1425206569.0000 | 7613F856-763E-4792-904E-8D89E2502710 | 2015-03-01 10:42:42 
111111111111111111111111111111111111 | 2015-03-01 10:43:01 | 1425206581.0000 | DB39294A-E133-4A05-B367-210944965FA3 | 2015-03-01 10:42:49 
111111111111111111111111111111111111 | 2015-03-01 10:43:02 | 1425206582.0000 | 61B8AE48-D5C8-4809-9C4F-56EEA45E3626 | 2015-03-01 10:43:01 
111111111111111111111111111111111111 | 2015-03-01 10:43:02 | 1425206582.0000 | 82870AB0-08F1-403F-B805-836CC1454D1A | 2015-03-01 10:43:02 
111111111111111111111111111111111111 | 2015-03-01 10:43:04 | 1425206584.0000 | BA24E540-0F5B-4BC4-B59B-29D729FA5C35 | 2015-03-01 10:43:02 
999999999999999999999999999999999999 | 2015-05-13 16:40:19 | 1431535219.0000 | 25E20777-508D-4194-9324-BE8A44CE7B59 | 1970-01-03 09:15:58.430244 
999999999999999999999999999999999999 | 2015-05-13 16:40:19 | 1431535219.0000 | 72DCEE64-CB3A-43CD-A949-FBE35C0A8873 | 1970-01-03 09:15:58.430244 
999999999999999999999999999999999999 | 2015-05-13 16:40:21 | 1431535221.0000 | A0062926-C8A0-47ED-A7F0-65716893DDC0 | 1970-01-03 09:15:58.430244 
999999999999999999999999999999999999 | 2015-05-13 16:40:25 | 1431535225.0000 | 0BCEABD0-DCEB-431C-A6FE-6E1243FC9890 | 1970-01-03 09:15:58.430244 
999999999999999999999999999999999999 | 2015-05-13 16:40:34 | 1431535234.0000 | 8F72E8D1-A167-460B-9034-C4D1423CAF15 | 1970-01-03 09:15:58.430244 
999999999999999999999999999999999999 | 2015-05-13 16:41:19 | 1431535279.0000 | 44C0214C-E13B-4CBB-92BB-4809EAD5DF15 | 1970-01-03 09:15:58.430244 
999999999999999999999999999999999999 | 2015-05-13 16:41:20 | 1431535280.0000 | C8CBB801-E8F3-47FD-B9C9-50BD0DDFA356 | 1970-01-03 09:15:58.430244 
999999999999999999999999999999999999 | 2015-05-13 16:41:22 | 1431535282.0000 | C3173090-8BB8-407F-BFF5-07B58BCFCA7B | 1970-01-03 09:15:58.430244 
999999999999999999999999999999999999 | 2015-05-13 16:41:48 | 1431535308.0000 | 91290A91-2394-4C5F-8687-6A74593F9FFB | 1970-01-03 09:15:58.430244 
999999999999999999999999999999999999 | 2015-05-13 16:41:48 | 1431535308.0000 | EF138C8C-93BC-4C29-8438-EB0214B3CFC4 | 1970-01-03 09:15:58.430244 
999999999999999999999999999999999999 | 2015-05-13 16:41:52 | 1431535312.0000 | 76AA328F-D46D-4235-B780-2B05FCE02855 | 1970-01-03 09:15:58.430244 
999999999999999999999999999999999999 | 2015-05-13 16:41:53 | 1431535313.0000 | 563E04CE-7DDB-4543-8F1D-C81625E839F8 | 1970-01-03 09:15:58.430244 

它看起來像這樣當DEVICE_ID前兩個事件具有相同的時間戳發生。希望能給某人一個線索?


OP

我試圖使用滯後()函數中的星火SQL確定在表中的兩個後續的事件之間的時間長度。重要的列是device_id,文本列,unix_time,數字時間戳和event_id,每行都是唯一的。

我正在查詢:

SELECT device_id, 
     unix_time, 
     event_id, 
     unix_time - lag(unix_time) 
      OVER 
     (PARTITION BY device_id ORDER BY unix_time,event_id) 
     AS seconds_since_last_event 
FROM ios_d_events; 

在Postgres裏,這給了預期的結果 - 但是在Pyspark運行時,任何時候都有相同的時間戳兩個事件,seconds_since_last_event被計算爲大量,即-1435151676846888或-1431583545415023或25534 - 我無法弄清楚這些數字來自哪裏。

我嘗試添加一個if()語句來查詢,因爲

if(unix_time = lag(unix_time) 
    OVER (PARTITION BY device_id ORDER BY unix_time,event_id), 
    0, 
    unix_time - lag(unix_time) 
    OVER (PARTITION BY device_id ORDER BY unix_time,event_id)) 
AS seconds_since_last_event 

但我得到了相同的結果。任何想法可能會造成這種情況?

回答

1

事實證明這是Spark 1.5.0中的一個錯誤 - 該問題並未出現在1.5.1中。

相關問題