2012-03-20 181 views
121

PostgreSQL DISTINCT ON with different ORDER BY

I want to run this query:

SELECT DISTINCT ON (address_id) purchases.address_id, purchases.* 
FROM purchases 
WHERE purchases.product_id = 1 
ORDER BY purchases.purchased_at DESC 

But I get this error:

PG::Error: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions

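Adding address_id as the first ORDER BY expression silences the error, but I really don't want to sort by address_id. Is it possible to do this without ordering by address_id?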

+0

Your ORDER BY clause has purchased_at, not address_id. Can you make your question clearer? – Teja 2012-03-20 22:01:46

+0

My ORDER BY has purchased_at because that is what I want, but Postgres also asks for address_id (see the error message). – 2012-03-20 22:03:50

+0

The complete answer is here: http://stackoverflow.com/questions/9796078/selecting-rows-ordered-by-some-column-and-disctincton-another Thanks to http://stackoverflow.com/users/268273/mosty-mostacho – 2012-12-21 23:40:39

Answers

114

The documentation says:

DISTINCT ON (expression [, ...]) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).

Official documentation

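So you have to add address_id to the ORDER BY.

Alternatively, if you are looking for the full row that contains the most recent purchase for each address_id, with the result sorted by purchased_at, then you are trying to solve a greatest-n-per-group problem, which can be approached as follows.

The general solution, which should work in most DBMSs: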

SELECT t1.* FROM purchases t1 
JOIN (
    SELECT address_id, max(purchased_at) max_purchased_at 
    FROM purchases 
    WHERE product_id = 1 
    GROUP BY address_id 
) t2 
ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at 
ORDER BY t1.purchased_at DESC 
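
To try the join approach without a PostgreSQL server, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. The table layout (an `id` column, ISO date strings for `purchased_at`), the sample rows and the helper name `latest_purchase_per_address` are assumptions made for illustration. The extra `WHERE t1.product_id = ?` on the outer query is also an addition: it guards against a different product bought at the same address at the exact same time slipping through the join.

```python
import sqlite3

def latest_purchase_per_address(conn, product_id):
    """Newest purchase of one product per address, newest first."""
    sql = """
        SELECT t1.id, t1.address_id, t1.purchased_at
        FROM purchases t1
        JOIN (
            -- one row per address: the latest purchase time for this product
            SELECT address_id, max(purchased_at) AS max_purchased_at
            FROM purchases
            WHERE product_id = ?
            GROUP BY address_id
        ) t2
          ON t1.address_id = t2.address_id
         AND t1.purchased_at = t2.max_purchased_at
        WHERE t1.product_id = ?
        ORDER BY t1.purchased_at DESC
    """
    return conn.execute(sql, (product_id, product_id)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "id INTEGER PRIMARY KEY, product_id INTEGER, "
    "address_id INTEGER, purchased_at TEXT)"
)
conn.executemany(
    "INSERT INTO purchases (product_id, address_id, purchased_at) VALUES (?, ?, ?)",
    [
        (1, 10, "2012-03-01"),
        (1, 10, "2012-03-15"),
        (1, 20, "2012-03-20"),
        (1, 30, "2012-03-05"),
        (2, 30, "2012-03-25"),  # other product, must not show up
    ],
)

rows = latest_purchase_per_address(conn, 1)
print(rows)
# [(3, 20, '2012-03-20'), (2, 10, '2012-03-15'), (4, 30, '2012-03-05')]
```

Note that if two purchases of the same product at the same address share the maximum purchased_at, this approach returns both rows.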

A more PostgreSQL-oriented solution, based on @hkf's answer:

SELECT * FROM (
    SELECT DISTINCT ON (address_id) * 
    FROM purchases 
    WHERE product_id = 1 
    ORDER BY address_id, purchased_at DESC 
) t 
ORDER BY purchased_at DESC 

The question was clarified, extended and solved here: Selecting rows ordered by some column and distinct on another

+36

It works, but it gives the wrong order. That is why I want to get rid of address_id in the ORDER BY clause. – 2012-03-20 22:12:11

+0

The documentation is clear: you can't, because the selected row would be unpredictable. – 2012-03-20 22:12:55

+2

But maybe there is another way to select the latest purchases for distinct addresses? – 2012-03-20 22:19:17

47

You can order by address_id in a subquery, then order by whatever you want in the outer query.

SELECT * FROM 
    (SELECT DISTINCT ON (address_id) purchases.address_id, purchases.* 
    FROM "purchases" 
    WHERE "purchases"."product_id" = 1 ORDER BY address_id DESC) 
ORDER BY purchased_at DESC 

+2

But this will be slower than a single query, won't it? – 2012-03-20 22:05:34

+2

Very marginally, yes. Although you have purchases.* in your original 'select', so I take it this is not production code? – hkf 2012-03-20 22:06:14

+7

I would add that in newer versions of Postgres you need to alias the subquery. For example: SELECT * FROM (SELECT DISTINCT ON (address_id) purchases.address_id, purchases.* FROM "purchases" WHERE "purchases"."product_id" = 1 ORDER BY address_id DESC) AS tmp ORDER BY tmp.purchased_at DESC – aembke 2014-06-17 20:38:36

23

A subquery can solve this:

SELECT * 
FROM (
    SELECT DISTINCT ON (address_id) * 
    FROM purchases 
    WHERE product_id = 1 
    ) p 
ORDER BY purchased_at DESC; 

The leading expressions in ORDER BY must agree with the columns in DISTINCT ON, so you can't order by different columns in the same SELECT.

SELECT * 
FROM (
    SELECT DISTINCT ON (address_id) * 
    FROM purchases 
    WHERE product_id = 1 
    ORDER BY address_id, purchased_at DESC -- get "latest" row per address_id 
    ) p 
ORDER BY purchased_at DESC; 
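
To see why the two levels are needed, it can help to spell out in plain Python what the query above does. This is only an illustration of the semantics under assumed sample data, not how PostgreSQL executes it; the function name `distinct_on_latest` and the dictionary layout are made up for the example.

```python
from itertools import groupby
from operator import itemgetter

def distinct_on_latest(rows, product_id):
    """Emulate: DISTINCT ON (address_id) ... ORDER BY address_id, purchased_at DESC,
    wrapped in an outer ORDER BY purchased_at DESC."""
    # WHERE product_id = ...
    rows = [r for r in rows if r["product_id"] == product_id]
    # Inner ORDER BY address_id, purchased_at DESC:
    # two stable sorts, secondary key first, then primary key.
    inner = sorted(rows, key=itemgetter("purchased_at"), reverse=True)
    inner = sorted(inner, key=itemgetter("address_id"))
    # DISTINCT ON (address_id): keep only the first row of each group.
    firsts = [next(group) for _, group in groupby(inner, key=itemgetter("address_id"))]
    # Outer ORDER BY purchased_at DESC.
    return sorted(firsts, key=itemgetter("purchased_at"), reverse=True)

purchases = [
    {"id": 1, "product_id": 1, "address_id": 10, "purchased_at": "2012-03-01"},
    {"id": 2, "product_id": 1, "address_id": 10, "purchased_at": "2012-03-15"},
    {"id": 3, "product_id": 1, "address_id": 20, "purchased_at": "2012-03-20"},
    {"id": 4, "product_id": 1, "address_id": 30, "purchased_at": "2012-03-05"},
    {"id": 5, "product_id": 2, "address_id": 30, "purchased_at": "2012-03-25"},
]

result = distinct_on_latest(purchases, product_id=1)
print([r["id"] for r in result])  # [3, 2, 4]
```

The inner sort decides which row survives for each address_id; the outer sort only decides the order in which the survivors are returned. That is why the two orderings cannot live in the same SELECT.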

If purchased_at can be NULL, consider DESC NULLS LAST.

Only use the additional ORDER BY in the subquery if you want to pick a particular row from each group.
Related, with more explanation:

+0

You can't use 'DISTINCT ON' without a matching ORDER BY. The first query needs an ORDER BY address_id inside the subquery. – 2017-07-12 18:46:13

+0

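@AristotlePagaltzis: But you *can*. Wherever you got that from, it is incorrect. You can use 'DISTINCT ON' without 'ORDER BY' in the same query. In that case you get an arbitrary row from each set of peers defined by the 'DISTINCT ON' clause. Try it, or follow the links above for details and links to the manual. 'ORDER BY' in the same query (the same 'SELECT') just cannot disagree with 'DISTINCT ON'. I explained that, too. – 2017-07-13 00:08:23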

+0

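Huh, you're right. I was blind to what the "unpredictable unless ORDER BY is used" note implies, because it makes no sense to me that the feature is implemented so that it can handle non-consecutive sets of values... yet won't let you take advantage of that with an explicit ordering. Annoying. – 2017-07-13 06:31:43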

10

Window functions can solve this in one pass:

SELECT DISTINCT ON (address_id) 
    LAST_VALUE(purchases.address_id) OVER wnd AS address_id 
FROM "purchases" 
WHERE "purchases"."product_id" = 1 
WINDOW wnd AS (
    PARTITION BY address_id ORDER BY purchases.purchased_at DESC 
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) 

+3

It would be nice if someone explained this query. – Gajus 2017-04-29 10:18:24

+0

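@Gajus: Short explanation: it doesn't work, it only returns distinct 'address_id' values. The principle *could* work, though. Related examples: https://stackoverflow.com/a/22064571/939860 or https://stackoverflow.com/a/11533808/939860. But for the problem at hand, there are shorter and/or faster queries. – 2017-07-17 15:56:04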
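
One way a window function can return whole rows is to number the rows within each address_id and keep only the first. The sketch below uses ROW_NUMBER(), which is not necessarily what the linked answers do, and runs it on an in-memory SQLite database from Python so it can be tried without a server. It assumes SQLite 3.25 or newer (for window function support), and the table layout and sample rows are made up for illustration.

```python
import sqlite3

SQL = """
    SELECT id, address_id, purchased_at
    FROM (
        SELECT p.*,
               -- rn = 1 marks the newest purchase within each address_id
               ROW_NUMBER() OVER (
                   PARTITION BY address_id
                   ORDER BY purchased_at DESC
               ) AS rn
        FROM purchases p
        WHERE product_id = 1
    ) ranked
    WHERE rn = 1
    ORDER BY purchased_at DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "id INTEGER PRIMARY KEY, product_id INTEGER, "
    "address_id INTEGER, purchased_at TEXT)"
)
conn.executemany(
    "INSERT INTO purchases (product_id, address_id, purchased_at) VALUES (?, ?, ?)",
    [
        (1, 10, "2012-03-01"),
        (1, 10, "2012-03-15"),
        (1, 20, "2012-03-20"),
        (1, 30, "2012-03-05"),
        (2, 30, "2012-03-25"),
    ],
)

rows = conn.execute(SQL).fetchall()
print(rows)
# [(3, 20, '2012-03-20'), (2, 10, '2012-03-15'), (4, 30, '2012-03-05')]
```

Unlike the join on max(purchased_at), this returns exactly one row per address_id even when two purchases share the same timestamp; which of the tied rows wins is then arbitrary unless a tie-breaker is added to the window's ORDER BY.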

1

For people using Flask-SQLAlchemy, this worked for me:

from app import db 
from app.models import Purchases 
from sqlalchemy.orm import aliased 
from sqlalchemy import desc 

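# Inner query: DISTINCT ON (address_id), wrapped as a subquery named 'purchases'
stmt = Purchases.query.distinct(Purchases.address_id).subquery('purchases') 
alias = aliased(Purchases, stmt) 
# Outer query: select from the subquery
distinct = db.session.query(alias) 
# order_by() returns a new query instead of modifying 'distinct', so keep the result
distinct = distinct.order_by(desc(alias.purchased_at)) 

+0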

Yes, or even easier, I was able to use: 'query.distinct(foo).from_self().order(bar)' – 2018-01-04 14:46:54

+0

@LaurentMeyer do you mean 'Purchases.query'? – reubano 2018-01-08 13:24:31

+0

Yes, I meant Purchases.query – 2018-01-08 14:14:34

-2

You can also do this using a GROUP BY clause:

SELECT purchases.address_id, purchases.* 
FROM "purchases" 
WHERE "purchases"."product_id" = 1 
GROUP BY address_id, purchases.purchased_at 
ORDER BY purchases.purchased_at DESC 

+0

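This is incorrect (unless 'purchases' only has the two columns 'address_id' and 'purchased_at'). Because of the 'GROUP BY', you need an aggregate function to get the value of each column that is not used for grouping, so their values will all come from different rows of the group unless you go through ugly and inefficient gymnastics. This can only be solved by using window functions rather than 'GROUP BY'. – 2017-07-12 18:10:38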