2

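Most efficient way to select one row in a one-to-many pair of tables in MySQL

Say I have the following data in two tables with a one-to-many relationship, city and person: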

SELECT city.*, person.* FROM city, person WHERE city.city_id = person.person_city_id; 
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name   | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
|       1 | chicago     |         1 | charles     |              1 |
|       1 | chicago     |         2 | celia       |              1 |
|       1 | chicago     |         3 | curtis      |              1 |
|       1 | chicago     |         4 | chauncey    |              1 |
|       2 | new york    |         5 | nathan      |              2 |
|       3 | los angeles |         6 | luke        |              3 |
|       3 | los angeles |         7 | louise      |              3 |
|       3 | los angeles |         8 | lucy        |              3 |
|       3 | los angeles |         9 | larry       |              3 |
+---------+-------------+-----------+-------------+----------------+
9 rows in set (0.00 sec)

I'd like to select a single record from person for each unique city, using some particular logic. For example:

SELECT city.*, person.* FROM city, person WHERE city.city_id = person.person_city_id 
GROUP BY city_id ORDER BY person_name DESC 
; 

The implication here is that within each city I want the row with the lexicographically greatest person_name, e.g.:

+---------+-------------+-----------+-------------+----------------+
| city_id | city_name   | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
|       2 | new york    |         5 | nathan      |              2 |
|       3 | los angeles |         6 | luke        |              3 |
|       1 | chicago     |         3 | curtis      |              1 |
+---------+-------------+-----------+-------------+----------------+

The actual output I get, however, is:

+---------+-------------+-----------+-------------+----------------+
| city_id | city_name   | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
|       2 | new york    |         5 | nathan      |              2 |
|       3 | los angeles |         6 | luke        |              3 |
|       1 | chicago     |         1 | charles     |              1 |
+---------+-------------+-----------+-------------+----------------+

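As far as I understand, the reason for this discrepancy is that MySQL performs the GROUP BY first and the ORDER BY afterwards. That is unfortunate for me, because I want the GROUP BY to apply my selection logic when it chooses which record to keep.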

I can work around this with some nested SELECT statements:

SELECT c.*, p.* FROM city c, 
    (SELECT p_inner.* FROM 
     (SELECT * FROM person ORDER BY person_city_id, person_name DESC) p_inner 
     GROUP BY person_city_id) p 
    WHERE c.city_id = p.person_city_id; 
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name   | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
|       1 | chicago     |         3 | curtis      |              1 |
|       2 | new york    |         5 | nathan      |              2 |
|       3 | los angeles |         6 | luke        |              3 |
+---------+-------------+-----------+-------------+----------------+

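This looks like it would be terribly inefficient once the person table grows arbitrarily large. I assume the inner SELECT statements are not aware of the outermost WHERE filter. Is that true?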

What is the best way to do what is effectively an ORDER BY before the GROUP BY?

Answers

1

The usual way to do this (in MySQL) is to join the table to itself.

First, to get the greatest person_name per city (i.e. per person_city_id in person):

SELECT p.* 
FROM person p 
LEFT JOIN person p2 
ON p.person_city_id = p2.person_city_id 
AND p.person_name < p2.person_name 
WHERE p2.person_name IS NULL 

This joins person to itself within each person_city_id (your GROUP BY variable), and pairs each row p only with rows p2 whose person_name is greater than p's person_name.

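Because it is a left join, if there is a p.person_name for which no greater p2.person_name exists (within the same city), then p2.person_name will be NULL. Those rows are precisely the "greatest" person_names per city.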
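To make the mechanism concrete, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database (used only as a convenient stand-in for MySQL; the queries themselves are plain SQL) and counts, for every person, how many same-city rows carry a greater name. Only rows with a count of zero survive the IS NULL filter. The helper name `load_person` is made up for this illustration.

```python
import sqlite3

# Sample rows from the question: (person_id, person_name, person_city_id).
PEOPLE = [
    (1, "charles", 1), (2, "celia", 1), (3, "curtis", 1), (4, "chauncey", 1),
    (5, "nathan", 2),
    (6, "luke", 3), (7, "louise", 3), (8, "lucy", 3), (9, "larry", 3),
]

def load_person():
    """Hypothetical helper: build an in-memory person table (SQLite, not MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (person_id INTEGER PRIMARY KEY,"
        " person_name TEXT, person_city_id INTEGER)"
    )
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", PEOPLE)
    return conn

conn = load_person()

# COUNT(p2.person_id) counts matched p2 rows only; an unmatched LEFT JOIN
# row contributes NULLs and therefore counts as 0.
greater_counts = conn.execute(
    """
    SELECT p.person_name, COUNT(p2.person_id) AS greater_names
    FROM person p
    LEFT JOIN person p2
      ON p.person_city_id = p2.person_city_id
     AND p.person_name < p2.person_name
    GROUP BY p.person_id
    ORDER BY p.person_id
    """
).fetchall()

# The answer's query: keep only rows for which no greater name exists.
winners = conn.execute(
    """
    SELECT p.person_city_id, p.person_name
    FROM person p
    LEFT JOIN person p2
      ON p.person_city_id = p2.person_city_id
     AND p.person_name < p2.person_name
    WHERE p2.person_name IS NULL
    ORDER BY p.person_city_id
    """
).fetchall()

for name, n in greater_counts:
    print(f"{name}: {n} greater name(s) in the same city")
print(winners)
```

On the sample data, curtis, nathan and luke are the only rows with no greater name in their city, so they are the rows the IS NULL filter keeps.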

To attach your other information (from city), just add another join:

SELECT c.*,p.* 
FROM person p 
LEFT JOIN person p2 
ON p.person_city_id = p2.person_city_id 
AND p.person_name < p2.person_name 
LEFT JOIN city c       -- add in city table 
ON p.person_city_id = c.city_id   -- add in city table 
WHERE p2.person_name IS NULL    -- ORDER BY c.city_id if you like 

If there were a filter, say 'WHERE city_id = 3', could that filter be used to make the join more efficient? Or should an index be added to 'person' on 'person_city_id, person_name'? – user655321 2012-02-06 00:49:08

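I think it would make the join more efficient if you added it to the join condition, but not so much in the 'WHERE' clause. Regarding indexing, hopefully someone here with more MySQL index-fu can help. I know the left-join method is widely accepted as more efficient than the subquery method (for "greatest per group"), but when it comes to the practicalities of indexing I don't know much at all. – 2012-02-06 01:05:25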

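This is nice because it doesn't restrict me to selecting only the max value (e.g. of 'person_name'); I can get the entire row associated with that max value. It's simpler than having a bunch of nested SELECTs, and having thought about it for a while, I believe this would not benefit from an additional WHERE clause. – user655321 2012-02-06 05:20:55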

0

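Your "solution" is not valid SQL, but it works in MySQL. You cannot be sure it will keep working, though, if the query optimizer code changes in the future. It could be slightly improved to use only one level of nesting (still not valid SQL):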

--- Option 1 --- 
SELECT 
     c.* 
    , p.* 
FROM 
     city AS c 
    JOIN 
     (SELECT * 
     FROM person 
     ORDER BY person_city_id 
       , person_name DESC 
    ) AS p 
    ON c.city_id = p.person_city_id 
GROUP BY p.person_city_id 

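Another way (valid SQL syntax that works in other DBMSs, too) is to write a subquery that selects the alphabetically last name for each city and then join to it: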

--- Option 2 --- 
SELECT 
     c.* 
    , p.* 
FROM 
     city AS c 
    JOIN 
     (SELECT person_city_id 
      , MAX(person_name) AS person_name 
     FROM person 
     GROUP BY person_city_id 
    ) AS pmax 
    ON c.city_id = pmax.person_city_id 
    JOIN 
     person AS p 
    ON p.person_city_id = pmax.person_city_id 
    AND p.person_name = pmax.person_name 

Another way is the self-join (of table person) with the < trick that @mathematical_coffee describes.

--- Option 3 --- 
    see @mathematical-coffee's answer 

Another way is to use a LIMIT 1 subquery in the join of city with person:

--- Option 4 --- 
SELECT 
     c.* 
    , p.* 
FROM 
     city AS c 
    JOIN 
     person AS p 
    ON 
     p.person_id = 
     (SELECT person_id 
     FROM person AS pm 
     WHERE pm.person_city_id = c.city_id 
     ORDER BY person_name DESC 
     LIMIT 1 
    ) 

This runs one subquery (on table person) for every city, and it will be efficient if you have an index on (person_city_id, person_name) with the InnoDB engine, or on (person_city_id, person_name, person_id) with MyISAM.
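As a rough illustration of the index advice, the sketch below builds the sample tables in an in-memory SQLite database (a stand-in only; it is not MySQL, and its planner behaves differently), creates the composite index, runs a trimmed version of Option 4, and prints the query plan. The index name `idx_person_city_name` is an assumption made for this example. In MySQL you would put `EXPLAIN` in front of the SELECT instead of SQLite's `EXPLAIN QUERY PLAN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY,
                         person_name TEXT, person_city_id INTEGER);
    INSERT INTO city VALUES (1, 'chicago'), (2, 'new york'), (3, 'los angeles');
    INSERT INTO person VALUES
        (1, 'charles', 1), (2, 'celia', 1), (3, 'curtis', 1), (4, 'chauncey', 1),
        (5, 'nathan', 2),
        (6, 'luke', 3), (7, 'louise', 3), (8, 'lucy', 3), (9, 'larry', 3);
    -- Composite index suggested above; the name is an assumption.
    CREATE INDEX idx_person_city_name ON person (person_city_id, person_name);
    """
)

# Option 4, selecting two columns only to keep the output short.
OPTION_4 = """
    SELECT c.city_name, p.person_name
    FROM city AS c
    JOIN person AS p
      ON p.person_id = (SELECT pm.person_id
                        FROM person AS pm
                        WHERE pm.person_city_id = c.city_id
                        ORDER BY pm.person_name DESC
                        LIMIT 1)
    ORDER BY c.city_id
"""

rows = conn.execute(OPTION_4).fetchall()
print(rows)

# SQLite's counterpart of MySQL's EXPLAIN; the plan text differs by engine
# and version, so it is printed for inspection rather than asserted on.
for step in conn.execute("EXPLAIN QUERY PLAN " + OPTION_4):
    print(step[-1])
```

Whether the index is actually chosen depends on the engine, its version and the table statistics, so treat the printed plan as something to inspect rather than a guarantee.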


There is one important difference between these options:

Options 2 and 3 will return all tied results (if two or more people in a city share the name that is alphabetically last, then both, or all of them, will be shown).

Options 1 and 4 will return one result per city, even when there are ties. You can choose which row is returned by altering the ORDER BY clause.
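The tie behaviour can be checked with a small sketch. It again uses an in-memory SQLite database as a stand-in for MySQL, leaves out the city table for brevity, and adds an invented duplicate row (10, 'curtis', 1) to force a tie. The extra `person_id ASC` sort key in the second query is also an addition for this illustration, showing how the ORDER BY decides which tied row is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (person_id INTEGER PRIMARY KEY,
                         person_name TEXT, person_city_id INTEGER);
    INSERT INTO person VALUES
        (1, 'charles', 1), (3, 'curtis', 1),
        (10, 'curtis', 1),   -- invented duplicate to force a tie
        (5, 'nathan', 2);
    """
)

# Option 2 style: join back on the per-city MAX(person_name); every tied row matches.
option_2 = conn.execute(
    """
    SELECT p.person_city_id, p.person_id, p.person_name
    FROM (SELECT person_city_id, MAX(person_name) AS person_name
          FROM person
          GROUP BY person_city_id) AS pmax
    JOIN person AS p
      ON p.person_city_id = pmax.person_city_id
     AND p.person_name = pmax.person_name
    ORDER BY p.person_city_id, p.person_id
    """
).fetchall()

# Option 4 style: LIMIT 1 keeps exactly one row per city; the secondary
# sort key picks the tied row with the lowest person_id.
option_4 = conn.execute(
    """
    SELECT p.person_city_id, p.person_id, p.person_name
    FROM person AS p
    WHERE p.person_id = (SELECT pm.person_id
                         FROM person AS pm
                         WHERE pm.person_city_id = p.person_city_id
                         ORDER BY pm.person_name DESC, pm.person_id ASC
                         LIMIT 1)
    ORDER BY p.person_city_id
    """
).fetchall()

print(option_2)  # both 'curtis' rows appear for city 1
print(option_4)  # a single 'curtis' row for city 1
```

Changing `pm.person_id ASC` to `DESC` would keep the row with person_id 10 instead.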


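Which option is more efficient also depends on the distribution of your data, so the best approach is to try them all, check their execution plans, and find the indexes that work best for each one. An index on (person_city_id, person_name) will most likely be useful for any of these queries.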

By distribution I mean:

  • Do you have few cities with many persons per city? (I think options 2 and 4 would perform better in that case.)

  • Or many cities with few persons per city? (Option 3 may be better for such data.)
