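Most efficient way to select one row in a one-to-many pair of tables in MySQL
Say I have the following data in the tables city and person, which have a one-to-many relationship: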
SELECT city.*, person.* FROM city, person WHERE city.city_id = person.person_city_id;
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 1 | chicago | 1 | charles | 1 |
| 1 | chicago | 2 | celia | 1 |
| 1 | chicago | 3 | curtis | 1 |
| 1 | chicago | 4 | chauncey | 1 |
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 3 | los angeles | 7 | louise | 3 |
| 3 | los angeles | 8 | lucy | 3 |
| 3 | los angeles | 9 | larry | 3 |
+---------+-------------+-----------+-------------+----------------+
9 rows in set (0.00 sec)
And I want to select a single record from the people in each unique city, using some specific logic. For example:
SELECT city.*, person.* FROM city, person WHERE city.city_id = person.person_city_id
GROUP BY city_id ORDER BY person_name DESC
;
The intent here is that, within each city, I want the row with the lexicographically greatest person_name, i.e.:
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 1 | chicago | 3 | curtis | 1 |
+---------+-------------+-----------+-------------+----------------+
The output I actually get, however, is:
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 1 | chicago | 1 | charles | 1 |
+---------+-------------+-----------+-------------+----------------+
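As far as I can tell, the cause of this discrepancy is that MySQL performs the GROUP BY first and only then the ORDER BY. That is unfortunate for me, because I want the GROUP BY to apply my selection logic when it picks which record to keep for each group.
I can work around this with some nested SELECT statements: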
SELECT c.*, p.* FROM city c,
(SELECT p_inner.* FROM
(SELECT * FROM person ORDER BY person_city_id, person_name DESC) p_inner
GROUP BY person_city_id) p
WHERE c.city_id = p.person_city_id;
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 1 | chicago | 3 | curtis | 1 |
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
+---------+-------------+-----------+-------------+----------------+
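This looks like it will become very inefficient as the person table grows arbitrarily large. I assume the inner SELECT statements are not aware of the outermost WHERE filter. Is that true?
What is the best way to do what is effectively an ORDER BY before the GROUP BY?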
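If there is a filter, say 'WHERE city_id = 3', can that filter make the join more efficient? Or should an index be added to 'person' on 'person_city_id, person_name'? – user655321 2012-02-06 00:49:08
I think it would make the join more efficient if it were added to the join condition, but not so much in the "WHERE" clause. As for indexes, hopefully someone here with more MySQL index-fu can help - I know the left-join method is widely accepted as more efficient than the subquery method (for "greatest-n-per-group"), but when it comes to the practicalities of indexing I don't know much at all. – 2012-02-06 01:05:25
This is nice because it doesn't limit me to selecting only the maximum value (e.g. of 'person_name') - I can get the entire row associated with that maximum. It is simpler than having a bunch of nested SELECTs, and having thought about it for a while, I believe this would not benefit from an additional WHERE clause. – user655321 2012-02-06 05:20:55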
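The answer these comments refer to is not reproduced in this thread, so here is only a minimal sketch of what a left-join "greatest-n-per-group" query could look like for the city/person data in the question. Some assumptions: SQLite (through Python's sqlite3 module) stands in for MySQL purely so the example runs on its own, the column types are guesses, and the index name idx_person_city_name is hypothetical. The SELECT itself uses only standard SQL, so it should carry over to MySQL, but whether the index actually helps there is something to confirm with EXPLAIN on the real table.

```python
import sqlite3

# SQLite stands in for MySQL here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    person_name TEXT,
    person_city_id INTEGER REFERENCES city(city_id)
);
INSERT INTO city VALUES (1, 'chicago'), (2, 'new york'), (3, 'los angeles');
INSERT INTO person VALUES
    (1, 'charles', 1), (2, 'celia', 1), (3, 'curtis', 1), (4, 'chauncey', 1),
    (5, 'nathan', 2),
    (6, 'luke', 3), (7, 'louise', 3), (8, 'lucy', 3), (9, 'larry', 3);
-- Composite index of the kind asked about in the comments. That it helps
-- is an assumption; the index name is hypothetical.
CREATE INDEX idx_person_city_name ON person (person_city_id, person_name);
""")

# Left-join approach: p2 matches any person in the same city with a greater
# name. A row of p survives the WHERE only when no such p2 exists, i.e. when
# p already holds the greatest person_name in its city.
GREATEST_PER_CITY = """
SELECT c.city_id, c.city_name, p.person_id, p.person_name, p.person_city_id
FROM city c
JOIN person p
  ON p.person_city_id = c.city_id
LEFT JOIN person p2
  ON p2.person_city_id = p.person_city_id
 AND p2.person_name > p.person_name
WHERE p2.person_id IS NULL
ORDER BY p.person_name DESC
"""

rows = conn.execute(GREATEST_PER_CITY).fetchall()
for row in rows:
    print(row)
```

On the sample data this yields the nathan, luke and curtis rows, each with its full person record. Note that if two people in one city share the greatest name, both rows come back, so a tie-breaker (for example on person_id) would be needed to guarantee exactly one row per city.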