我已經從一些渠道獲得溫度採樣的表隨着時間的推移,我想找到的最小,最大和平均溫度在所有數據源在設定的時間間隔。乍一看,這是很容易,像這樣做:自聯接,交叉聯接和分組
SELECT MIN(temp), MAX(temp), AVG(temp) FROM samples GROUP BY time;
然而,事情變得更加複雜(給我難倒點在哪裏!)如果源下降進出而非期間忽略丟失的來源有問題的間隔我想使用來源的最後知道的溫度爲缺失的樣本。使用日期時間和建設的時間間隔(比如每分鐘)跨分佈不均隨着時間的推移進一步樣品複雜的事情。
我認爲應該可以通過在樣本表上進行自聯接來創建結果,其中第一個表的時間大於或等於第二個表的時間,然後計算聚合值對於按源分組的行。然而,我很難理解如何真正做到這一點。
這裏是我的測試表:
+------+------+------+
| time | source | temp |
+------+------+------+
| 1 | a | 20 |
| 1 | b | 18 |
| 1 | c | 23 |
| 2 | b | 21 |
| 2 | c | 20 |
| 2 | a | 18 |
| 3 | a | 16 |
| 3 | c | 13 |
| 4 | c | 15 |
| 4 | a | 4 |
| 4 | b | 31 |
| 5 | b | 10 |
| 5 | c | 16 |
| 5 | a | 22 |
| 6 | a | 18 |
| 6 | b | 17 |
| 7 | a | 20 |
| 7 | b | 19 |
+------+------+------+
INSERT INTO samples (time, source, temp) VALUES (1, 'a', 20), (1, 'b', 18), (1, 'c', 23), (2, 'b', 21), (2, 'c', 20), (2, 'a', 18), (3, 'a', 16), (3, 'c', 13), (4, 'c', 15), (4, 'a', 4), (4, 'b', 31), (5, 'b', 10), (5, 'c', 16), (5, 'a', 22), (6, 'a', 18), (6, 'b', 17), (7, 'a', 20), (7, 'b', 19);
要盡我的最大,最小和平均計算,我想在中間表看起來像這樣:
+------+------+------+
| time | source | temp |
+------+------+------+
| 1 | a | 20 |
| 1 | b | 18 |
| 1 | c | 23 |
| 2 | b | 21 |
| 2 | c | 20 |
| 2 | a | 18 |
| 3 | a | 16 |
| 3 | b | 21 |
| 3 | c | 13 |
| 4 | c | 15 |
| 4 | a | 4 |
| 4 | b | 31 |
| 5 | b | 10 |
| 5 | c | 16 |
| 5 | a | 22 |
| 6 | a | 18 |
| 6 | b | 17 |
| 6 | c | 16 |
| 7 | a | 20 |
| 7 | b | 19 |
| 7 | c | 16 |
+------+------+------+
下面的查詢讓我靠近我想要什麼,但它需要源的第一個結果的溫度值,而不是在給定的時間間隔最近的一個:
SELECT s.dt as sdt, s.mac, ss.temp, MAX(ss.dt) as maxdt FROM (SELECT DISTINCT dt FROM samples) AS s CROSS JOIN samples AS ss WHERE s.dt >= ss.dt GROUP BY sdt, mac HAVING maxdt <= s.dt ORDER BY sdt ASC, maxdt ASC;
+------+------+------+-------+
| sdt | mac | temp | maxdt |
+------+------+------+-------+
| 1 | a | 20 | 1 |
| 1 | c | 23 | 1 |
| 1 | b | 18 | 1 |
| 2 | a | 20 | 2 |
| 2 | c | 23 | 2 |
| 2 | b | 18 | 2 |
| 3 | b | 18 | 2 |
| 3 | a | 20 | 3 |
| 3 | c | 23 | 3 |
| 4 | a | 20 | 4 |
| 4 | c | 23 | 4 |
| 4 | b | 18 | 4 |
| 5 | a | 20 | 5 |
| 5 | c | 23 | 5 |
| 5 | b | 18 | 5 |
| 6 | c | 23 | 5 |
| 6 | a | 20 | 6 |
| 6 | b | 18 | 6 |
| 7 | c | 23 | 5 |
| 7 | b | 18 | 7 |
| 7 | a | 20 | 7 |
+------+------+------+-------+
更新:(!偉大的名字,順便說一句) chadhoc給出了一個很好的解決方案,遺憾的是沒有在MySQL的工作,因爲它不支持他所使用的FULL JOIN
。幸運的是,我相信一個簡單的UNION
是一種有效的替代:
-- Unify the original samples with the missing values that we've calculated
(
SELECT time, source, temp
FROM samples
)
UNION
(-- Pull all the time/source combinations that we are missing from the sample set, along with the temp
-- from the last sampled interval for the same time/source combination if we do not have one
SELECT a.time, a.source, (SELECT t2.temp FROM samples AS t2 WHERE t2.time < a.time AND t2.source = a.source ORDER BY t2.time DESC LIMIT 1) AS temp
FROM
(-- All values we want to get should be a cross of time/temp
SELECT t1.time, s1.source
FROM
(SELECT DISTINCT time FROM samples) AS t1
CROSS JOIN
(SELECT DISTINCT source FROM samples) AS s1
) AS a
LEFT JOIN samples s
ON a.time = s.time
AND a.source = s.source
WHERE s.source IS NULL
)
ORDER BY time, source;
更新2:的MySQL提供了以下EXPLAIN
輸出chadhoc代碼:
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+
| 1 | PRIMARY | temp | ALL | NULL | NULL | NULL | NULL | 18 | |
| 2 | UNION | <derived4> | ALL | NULL | NULL | NULL | NULL | 21 | |
| 2 | UNION | s | ALL | NULL | NULL | NULL | NULL | 18 | Using where |
| 4 | DERIVED | <derived6> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 4 | DERIVED | <derived5> | ALL | NULL | NULL | NULL | NULL | 7 | |
| 6 | DERIVED | temp | ALL | NULL | NULL | NULL | NULL | 18 | Using temporary |
| 5 | DERIVED | temp | ALL | NULL | NULL | NULL | NULL | 18 | Using temporary |
| 3 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 18 | Using where; Using filesort |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | Using filesort |
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+
我能得到查爾斯的代碼工作像這樣:
SELECT T.time, S.source,
COALESCE(
D.temp,
(
SELECT temp FROM samples
WHERE source = S.source AND time = (
SELECT MAX(time)
FROM samples
WHERE
source = S.source
AND time < T.time
)
)
) AS temp
FROM (SELECT DISTINCT time FROM samples) AS T
CROSS JOIN (SELECT DISTINCT source FROM samples) AS S
LEFT JOIN samples AS D
ON D.source = S.source AND D.time = T.time
它的解釋是:
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+
| 1 | PRIMARY | <derived5> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 7 | |
| 1 | PRIMARY | D | ALL | NULL | NULL | NULL | NULL | 18 | |
| 5 | DERIVED | temp | ALL | NULL | NULL | NULL | NULL | 18 | Using temporary |
| 4 | DERIVED | temp | ALL | NULL | NULL | NULL | NULL | 18 | Using temporary |
| 2 | DEPENDENT SUBQUERY | temp | ALL | NULL | NULL | NULL | NULL | 18 | Using where |
| 3 | DEPENDENT SUBQUERY | temp | ALL | NULL | NULL | NULL | NULL | 18 | Using where |
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+
謝謝,查爾斯,但您的解決方案假定所有來源都提前知道。當他們不知道時你有什麼建議嗎? – pr1001 2009-11-11 22:28:29
如果您不知道源文件,則添加另一個sql查詢... – 2009-11-12 00:50:57
將IsNull更改爲COALESCE後,我能夠使查詢在我的MySQL數據庫上工作。謝謝。 – pr1001 2009-11-12 01:24:02