So you want to get the row with the highest OrderField per group? I'd do it this way:
SELECT t1.*
FROM `Table` AS t1
LEFT OUTER JOIN `Table` AS t2
ON t1.GroupId = t2.GroupId AND t1.OrderField < t2.OrderField
WHERE t2.GroupId IS NULL
ORDER BY t1.OrderField; -- not needed! (note by Tomas)
(Edit by Tomas: if several records in the same group share the same OrderField and you need exactly one of them, you may want to extend the join condition:
SELECT t1.*
FROM `Table` AS t1
LEFT OUTER JOIN `Table` AS t2
ON t1.GroupId = t2.GroupId
AND (t1.OrderField < t2.OrderField
OR (t1.OrderField = t2.OrderField AND t1.Id < t2.Id))
WHERE t2.GroupId IS NULL
end of edit.)
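The effect of the extended condition can be checked on a small example. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL; the table name `t`, the `Id` values, and the four rows are made up for illustration, with two rows in group 10 tied at OrderField = 30.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (Id INTEGER PRIMARY KEY, GroupId INTEGER, OrderField INTEGER)"
)
# Hypothetical rows: Id 2 and Id 3 are tied for the maximum in group 10.
conn.executemany(
    "INSERT INTO t (Id, GroupId, OrderField) VALUES (?, ?, ?)",
    [(1, 10, 10), (2, 10, 30), (3, 10, 30), (4, 20, 60)],
)

# Original condition: both tied rows survive the filter.
without_tiebreak = conn.execute("""
    SELECT t1.Id, t1.GroupId, t1.OrderField
    FROM t AS t1
    LEFT OUTER JOIN t AS t2
      ON t1.GroupId = t2.GroupId AND t1.OrderField < t2.OrderField
    WHERE t2.GroupId IS NULL
    ORDER BY t1.Id
""").fetchall()

# Extended condition: among tied rows, only the one with the highest Id survives.
with_tiebreak = conn.execute("""
    SELECT t1.Id, t1.GroupId, t1.OrderField
    FROM t AS t1
    LEFT OUTER JOIN t AS t2
      ON t1.GroupId = t2.GroupId
     AND (t1.OrderField < t2.OrderField
          OR (t1.OrderField = t2.OrderField AND t1.Id < t2.Id))
    WHERE t2.GroupId IS NULL
    ORDER BY t1.Id
""").fetchall()

print(without_tiebreak)  # [(2, 10, 30), (3, 10, 30), (4, 20, 60)]
print(with_tiebreak)     # [(3, 10, 30), (4, 20, 60)]
```

The tie-breaker assumes `Id` is unique within the table; any other unique column would serve the same purpose.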
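In other words, the query returns each row t1 for which no other row t2 exists with the same GroupId and a greater OrderField. When t2.* is NULL, the left outer join found no such match, and therefore t1 has the greatest value of OrderField in its group.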
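To see why the `IS NULL` filter picks the maximum, it can help to count, for every row, how many rows in the same group have a greater OrderField. The sketch below does that with Python's sqlite3 module as a stand-in for MySQL; the table name `t` and the six (GroupId, OrderField) pairs are assumptions chosen for illustration, and the `GROUP BY` is used only to display the counts, not as part of the technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (GroupId INTEGER, OrderField INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(10, 10), (10, 20), (10, 30), (20, 40), (20, 50), (20, 60)],
)

# COUNT(t2.GroupId) skips the NULLs that an unmatched left join produces,
# so a count of 0 means "no row in this group has a greater OrderField".
match_counts = conn.execute("""
    SELECT t1.GroupId, t1.OrderField, COUNT(t2.GroupId) AS greater_rows
    FROM t AS t1
    LEFT OUTER JOIN t AS t2
      ON t1.GroupId = t2.GroupId AND t1.OrderField < t2.OrderField
    GROUP BY t1.GroupId, t1.OrderField
    ORDER BY t1.GroupId, t1.OrderField
""").fetchall()

# Rows with zero matches are exactly the rows kept by WHERE t2.GroupId IS NULL.
maxima = [(g, o) for g, o, greater in match_counts if greater == 0]

for row in match_counts:
    print(row)
print(maxima)  # [(10, 30), (20, 60)]
```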
No ranks, no subqueries. If you have a compound index on (GroupId, OrderField), this should run fast and optimize the access to t2 with "Using index".
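As a rough way to check that such a compound index is picked up for the self-join, the sketch below creates one and prints the query plan. It uses SQLite through Python's sqlite3 module rather than MySQL, so the plan format differs from MySQL's EXPLAIN and the exact wording depends on the SQLite version; the table name `t` and the index name `idx_group_order` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (Id INTEGER PRIMARY KEY, GroupId INTEGER, "
    "OrderField INTEGER, foo TEXT)"
)
# Compound index covering both columns used in the join condition.
conn.execute("CREATE INDEX idx_group_order ON t (GroupId, OrderField)")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT t1.*
    FROM t AS t1
    LEFT OUTER JOIN t AS t2
      ON t1.GroupId = t2.GroupId AND t1.OrderField < t2.OrderField
    WHERE t2.GroupId IS NULL
""").fetchall()

# The fourth column of each plan row is the human-readable description.
plan_details = [row[3] for row in plan]
for detail in plan_details:
    print(detail)
```

In SQLite, a line mentioning a covering index for t2 plays roughly the role that "Using index" plays in MySQL's Extra column: the lookup is answered from the index alone.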
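Regarding performance, see my answer to Retrieving the last record in each group. I tried a subquery method and the join method using the Stack Overflow data dump. The difference is remarkable: the join method ran 278 times faster in my test.
It's important that you have the right index to get the best results!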
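Regarding your method using the @Rank variable, it won't work as you've written it, because the value of @Rank is not reset to zero after the query has processed the first table. I'll show you an example.
I inserted some dummy data, with an extra field that is null except on the row we know to be the greatest in each group: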
select * from `Table`;
+---------+------------+------+
| GroupId | OrderField | foo  |
+---------+------------+------+
|      10 |         10 | NULL |
|      10 |         20 | NULL |
|      10 |         30 | foo  |
|      20 |         40 | NULL |
|      20 |         50 | NULL |
|      20 |         60 | foo  |
+---------+------------+------+
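We can show that the rank increases to three for the first group and six for the second group, and that the inner query returns these correctly: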
select GroupId, max(Rank) AS MaxRank
from (
  select GroupId, @Rank := @Rank + 1 AS Rank
  from `Table`
  order by OrderField) as t
group by GroupId
+---------+---------+
| GroupId | MaxRank |
+---------+---------+
|      10 |       3 |
|      20 |       6 |
+---------+---------+
Now run the query with no join condition, to force a Cartesian product of all rows, and fetch all columns as well:
select s.*, t.*
from (select GroupId, max(Rank) AS MaxRank
      from (select GroupId, @Rank := @Rank + 1 AS Rank
            from `Table`
            order by OrderField
           ) as t
      group by GroupId) as t
join (
  select *, @Rank := @Rank + 1 AS Rank
  from `Table`
  order by OrderField
  ) as s
-- on t.GroupId = s.GroupId and t.MaxRank = s.Rank
order by OrderField;
+---------+---------+---------+------------+------+------+
| GroupId | MaxRank | GroupId | OrderField | foo  | Rank |
+---------+---------+---------+------------+------+------+
|      10 |       3 |      10 |         10 | NULL |    7 |
|      20 |       6 |      10 |         10 | NULL |    7 |
|      10 |       3 |      10 |         20 | NULL |    8 |
|      20 |       6 |      10 |         20 | NULL |    8 |
|      20 |       6 |      10 |         30 | foo  |    9 |
|      10 |       3 |      10 |         30 | foo  |    9 |
|      10 |       3 |      20 |         40 | NULL |   10 |
|      20 |       6 |      20 |         40 | NULL |   10 |
|      10 |       3 |      20 |         50 | NULL |   11 |
|      20 |       6 |      20 |         50 | NULL |   11 |
|      20 |       6 |      20 |         60 | foo  |   12 |
|      10 |       3 |      20 |         60 | foo  |   12 |
+---------+---------+---------+------------+------+------+
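We can see from the above that the max rank per group is correct, but @Rank then keeps increasing as the second derived table is processed, to 7 and higher. So the ranks from the second derived table will never overlap with the ranks from the first derived table at all.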
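The carry-over described above can be mimicked outside MySQL. The following is only a simulation in plain Python, not MySQL's actual evaluation: the hypothetical `Session` class stands in for a connection holding the user variable, and it assumes @Rank was set to 0 before the query ran.

```python
# (GroupId, OrderField) pairs matching the dummy data in the example.
rows = [(10, 10), (10, 20), (10, 30), (20, 40), (20, 50), (20, 60)]


class Session:
    """Hypothetical stand-in for a MySQL session holding the variable @Rank."""

    def __init__(self):
        self.rank = 0  # assumption: SET @Rank := 0 was issued beforehand

    def ranked(self, table):
        # Mimics: select *, @Rank := @Rank + 1 AS Rank from table order by OrderField
        out = []
        for group_id, order_field in sorted(table, key=lambda r: r[1]):
            self.rank += 1
            out.append((group_id, order_field, self.rank))
        return out


session = Session()

# First derived table: the highest rank seen in each group.
first_pass = session.ranked(rows)
max_rank = {}
for group_id, _, rank in first_pass:
    max_rank[group_id] = max(rank, max_rank.get(group_id, 0))

# Second derived table: the counter was never reset, so it continues from 7.
second_pass = session.ranked(rows)
matches = [r for r in second_pass if max_rank[r[0]] == r[2]]

print(max_rank)                        # {10: 3, 20: 6}
print([r[2] for r in second_pass])     # [7, 8, 9, 10, 11, 12]
print(matches)                         # []
```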
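You'd have to add another derived table to force @Rank to reset to zero between processing the two tables (and hope the optimizer doesn't change the order in which it evaluates the tables, or else use STRAIGHT_JOIN to prevent that):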
select s.*
from (select GroupId, max(Rank) AS MaxRank
      from (select GroupId, @Rank := @Rank + 1 AS Rank
            from `Table`
            order by OrderField
           ) as t
      group by GroupId) as t
join (select @Rank := 0) r -- RESET @Rank TO ZERO HERE
join (
  select *, @Rank := @Rank + 1 AS Rank
  from `Table`
  order by OrderField
  ) as s
on t.GroupId = s.GroupId and t.MaxRank = s.Rank
order by OrderField;
+---------+------------+------+------+
| GroupId | OrderField | foo  | Rank |
+---------+------------+------+------+
|      10 |         30 | foo  |    3 |
|      20 |         60 | foo  |    6 |
+---------+------------+------+------+
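The reason the reset makes the join succeed can be sketched in plain Python. This is a simulation under assumptions, not MySQL itself: the hypothetical `rank_rows` helper numbers rows by OrderField starting from a given counter value, and it takes for granted that the tables are evaluated in the order written.

```python
# (GroupId, OrderField) pairs matching the dummy data in the example.
rows = [(10, 10), (10, 20), (10, 30), (20, 40), (20, 50), (20, 60)]


def rank_rows(table, start):
    """Number rows by OrderField, continuing from `start`.

    Returns the ranked rows and the final counter value.
    """
    ranked = []
    rank = start
    for group_id, order_field in sorted(table, key=lambda r: r[1]):
        rank += 1
        ranked.append((group_id, order_field, rank))
    return ranked, rank


# First derived table: highest rank per group.
first_pass, counter = rank_rows(rows, 0)
max_rank = {}
for group_id, _, rank in first_pass:
    max_rank[group_id] = max(rank, max_rank.get(group_id, 0))

# Equivalent of the (select @Rank := 0) derived table between the two passes.
counter = 0

# Second derived table: ranks start at 1 again, so they line up with max_rank.
second_pass, counter = rank_rows(rows, counter)
result = [(g, o, r) for g, o, r in second_pass if max_rank[g] == r]

print(result)  # [(10, 30, 3), (20, 60, 6)]
```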
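But the optimization of this query is terrible. It can't use any indexes, it creates two temporary tables, sorts them the hard way, and even uses a join buffer, because it can't use an index when joining temporary tables either. This is example output from EXPLAIN: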
+----+-------------+------------+--------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table      | type   | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+------------+--------+---------------+------+---------+------+------+---------------------------------+
|  1 | PRIMARY     | <derived4> | system | NULL          | NULL | NULL    | NULL |    1 | Using temporary; Using filesort |
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL | NULL    | NULL |    2 |                                 |
|  1 | PRIMARY     | <derived5> | ALL    | NULL          | NULL | NULL    | NULL |    6 | Using where; Using join buffer  |
|  5 | DERIVED     | Table      | ALL    | NULL          | NULL | NULL    | NULL |    6 | Using filesort                  |
|  4 | DERIVED     | NULL       | NULL   | NULL          | NULL | NULL    | NULL | NULL | No tables used                  |
|  2 | DERIVED     | <derived3> | ALL    | NULL          | NULL | NULL    | NULL |    6 | Using temporary; Using filesort |
|  3 | DERIVED     | Table      | ALL    | NULL          | NULL | NULL    | NULL |    6 | Using filesort                  |
+----+-------------+------------+--------+---------------+------+---------+------+------+---------------------------------+
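My solution using the left outer join, by contrast, optimizes much better. It uses no temporary table and even reports "Using index", which means it can resolve the join using only the index, without touching the data.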
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref             | rows | Extra                    |
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL    | NULL    | NULL            |    6 | Using filesort           |
|  1 | SIMPLE      | t2    | ref  | GroupId       | GroupId | 5       | test.t1.GroupId |    1 | Using where; Using index |
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------+
You'll probably read people claiming on their blogs that "joins make SQL slow", but that's nonsense. Poor optimization makes SQL slow.
A more advanced question is here: http://stackoverflow.com/questions/9841093/how-to-writegreatest-n-per-group-type-query-but-with-additional-conditions/9845109#9845109 – TMS 2012-03-25 10:56:05