Given your sample data, I believe the following should do the trick.
SELECT *
FROM testtable
QUALIFY
(
event_type = 'Event A'
AND
min(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
) OR
(
event_type = 'Event B'
AND
max(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
)
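Here we use window functions to test the records immediately before and after the current record in the result set. We do this in the QUALIFY clause, which works like a WHERE clause but for window functions.
Breaking down this QUALIFY clause: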
event_type = 'Event A'
AND
min(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
says: "If the current record is 'Event A' and the next record for this id, when ordered by timestamp, is 'Event B', then keep the record."
event_type = 'Event B'
AND
max(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
says: "If the current record is 'Event B' and the previous record for this id, when ordered by timestamp, is 'Event A', then keep the record."
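You may need to get more creative with the QUALIFY clause to cover edge cases, but once you wrap your head around how it all works, you can get pretty creative there.
Example: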
CREATE MULTISET VOLATILE TABLE testtable
(
id int,
ts varchar(20),
location varchar(20),
event_type varchar(20)
) PRIMARY INDEX (id) ON COMMIT PRESERVE ROWS;
INSERT INTO testtable VALUES (1111,'20160601-0112','Detroit','Event A');
INSERT INTO testtable VALUES (1111,'20160602-0954','Brooklyn','Event B');
INSERT INTO testtable VALUES (1111,'20160602-1123','Brooklyn','Event A');
INSERT INTO testtable VALUES (1112,'20160912-1420','Minneapolis','Event B');
INSERT INTO testtable VALUES (1113,'20161123-1742','New Orleans','Event A');
INSERT INTO testtable VALUES (1113,'20161124-1841','New Orleans','Event A');
INSERT INTO testtable VALUES (1113,'20161124-2100','New Orleans','Event B');
INSERT INTO testtable VALUES (1114,'20170201-0959','Detroit','Event A');
INSERT INTO testtable VALUES (1114,'20170201-2350','Detroit','Event A');
SELECT *
FROM testtable
QUALIFY
(
event_type = 'Event A'
AND
min(event_type) OVER (PARTITION BY id ORDER BY ts ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
) OR
(
event_type = 'Event B'
AND
max(event_type) OVER (PARTITION BY id ORDER BY ts ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
);
+------+---------------+-------------+------------+
| id | ts | location | event_type |
+------+---------------+-------------+------------+
| 1111 | 20160601-0112 | Detroit | Event A |
| 1111 | 20160602-0954 | Brooklyn | Event B |
| 1113 | 20161124-1841 | New Orleans | Event A |
| 1113 | 20161124-2100 | New Orleans | Event B |
+------+---------------+-------------+------------+
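QUALIFY is not available in every database. If you need the same filter elsewhere, one option is to compute the two window expressions in a derived table and filter them with an ordinary WHERE. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Teradata (window functions need SQLite 3.25 or later), and the names next_event, prev_event and matching_rows are mine, not part of the original query.

```python
import sqlite3

# Sample rows copied from the INSERT statements in the example above.
ROWS = [
    (1111, '20160601-0112', 'Detroit', 'Event A'),
    (1111, '20160602-0954', 'Brooklyn', 'Event B'),
    (1111, '20160602-1123', 'Brooklyn', 'Event A'),
    (1112, '20160912-1420', 'Minneapolis', 'Event B'),
    (1113, '20161123-1742', 'New Orleans', 'Event A'),
    (1113, '20161124-1841', 'New Orleans', 'Event A'),
    (1113, '20161124-2100', 'New Orleans', 'Event B'),
    (1114, '20170201-0959', 'Detroit', 'Event A'),
    (1114, '20170201-2350', 'Detroit', 'Event A'),
]

# Same window frames as the QUALIFY version, but evaluated in a derived
# table so that a plain WHERE can filter on them.
QUERY = """
SELECT id, ts, location, event_type
FROM (
    SELECT t.*,
           min(event_type) OVER (PARTITION BY id ORDER BY ts
               ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS next_event,
           max(event_type) OVER (PARTITION BY id ORDER BY ts
               ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_event
    FROM testtable t
)
WHERE (event_type = 'Event A' AND next_event = 'Event B')
   OR (event_type = 'Event B' AND prev_event = 'Event A')
ORDER BY id, ts
"""


def matching_rows():
    """Load the sample data into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testtable "
        "(id INTEGER, ts TEXT, location TEXT, event_type TEXT)"
    )
    conn.executemany("INSERT INTO testtable VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in matching_rows():
        print(row)
```

The first and last row of each id have no neighbour on one side, so the corresponding window expression is NULL there and the comparison fails. That is why the trailing 'Event A' for id 1111 and the lone 'Event B' for id 1112 drop out.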
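There might be *multiple* 'B's per ID, and then you only want the first one? – dnoeth
Yes, in theory, but I could query the data and create a temp table with only the first Event B record (and all Event A records), if that would be easier. – slim88
Why isn't ID 1112 in the list? Is it because it has no Event A? – JNevill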