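Select the row with the highest value from rows grouped by multiple columns (PSQL)

I have a table of transaction data that are forecasts of the future. The same forecast, identified by the same date, type, location, and product, is therefore read in multiple times, because forecasts become more accurate as time passes and are re-sent.

I would like to create a query that groups the transactions by type, location, product, and date, and then selects from each group only the row with the latest updated timestamp.

The table currently has hundreds of thousands of rows and will have millions over time, so a reasonably efficient solution would be appreciated :)

Example table: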

 date       | location_code | product_code | quantity | type     | updated_at
------------+---------------+--------------+----------+----------+------------
 2013-02-04 | ABC           | 123          |  -26.421 | TRANSFER | 2013-01-12
 2013-02-07 | ABC           | 123          |    -48.1 | SALE     | 2013-01-10
 2013-02-06 | BCD           | 234          |  -58.107 | SALE     | 2013-01-11
 2013-02-06 | BCD           | 234          |      -60 | SALE     | 2013-01-10
 2013-02-04 | ABC           | 123          |   -6.727 | TRANSFER | 2013-01-10

Desired result:

 date       | location_code | product_code | quantity | type     | updated_at
------------+---------------+--------------+----------+----------+------------
 2013-02-04 | ABC           | 123          |  -26.421 | TRANSFER | 2013-01-12
 2013-02-07 | ABC           | 123          |    -48.1 | SALE     | 2013-01-10
 2013-02-06 | BCD           | 234          |  -58.107 | SALE     | 2013-01-11
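
For anyone who wants to try queries against the sample rows, here is a minimal setup sketch. The column types are assumptions (the question does not state them), and `date` and `type` are quoted only as a precaution because they are also SQL keywords.

```sql
-- Hypothetical schema: column types are guesses, not taken from the question.
CREATE TABLE transactions (
    "date"        date    NOT NULL,
    location_code text    NOT NULL,
    product_code  text    NOT NULL,
    quantity      numeric NOT NULL,
    "type"        text    NOT NULL,
    updated_at    date    NOT NULL
);

-- The five sample rows from the example table.
INSERT INTO transactions
    ("date", location_code, product_code, quantity, "type", updated_at)
VALUES
    ('2013-02-04', 'ABC', '123', -26.421, 'TRANSFER', '2013-01-12'),
    ('2013-02-07', 'ABC', '123', -48.1,   'SALE',     '2013-01-10'),
    ('2013-02-06', 'BCD', '234', -58.107, 'SALE',     '2013-01-11'),
    ('2013-02-06', 'BCD', '234', -60,     'SALE',     '2013-01-10'),
    ('2013-02-04', 'ABC', '123', -6.727,  'TRANSFER', '2013-01-10');
```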

I tried, for example:

SELECT t.date, t.location_code, t.product_code, t.quantity, t.type, t.updated_at 
FROM transactions t 
INNER JOIN 
(
    SELECT MAX(updated_at) as max_updated_at 
    FROM transactions 
    GROUP BY product_code, location_code, type, date 
) s on t.updated_at=max_updated_at;

But this seems to take a very long time and does not seem to work.

Thanks for your help!

2013-03-16 jesseniem

You are on the right track; the join is the more efficient approach. Just add more fields to your subquery and join on them. – 2013-03-16 22:33:37

select distinct on ("date", location_code, product_code, type) 
    "date", 
    location_code, 
    product_code, 
    quantity, 
    type, 
    updated_at 
from transactions t 
order by t."date", t.location_code, t.product_code, t.type, t.updated_at desc
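
If this is still slow once the table reaches millions of rows, a multicolumn index whose columns follow the same order as the ORDER BY may help, since PostgreSQL can then read the rows pre-sorted instead of sorting them. This is only a sketch: the index name is made up, and whether the planner actually uses the index depends on the data, so it is worth checking with EXPLAIN ANALYZE.

```sql
-- Hypothetical index name; the column order mirrors the ORDER BY of the query.
CREATE INDEX transactions_latest_idx
    ON transactions ("date", location_code, product_code, "type", updated_at DESC);

-- Inspect the plan to see whether the index is used.
EXPLAIN ANALYZE
SELECT DISTINCT ON ("date", location_code, product_code, "type")
       "date", location_code, product_code, quantity, "type", updated_at
FROM transactions
ORDER BY "date", location_code, product_code, "type", updated_at DESC;
```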

2013-03-16 22:45:34

Tried this one, but got the following error: 'ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions LINE 1: select distinct on (date, location_code, product_code, type)' – jesseniem 2013-03-16 22:49:36

@jesuli Corrected. – 2013-03-16 22:51:17

Thanks! Tested this one, and it seems to be the most efficient solution so far: '$ cat time3 Sun Mar 17 01:06:50 EET 2013 Sun Mar 17 01:06:53 EET 2013 | Sun Mar 17 01:06:54 EET 2013 Sun Mar 17 01:06:57 EET 2013 | Sun Mar 17 01:06:58 EET 2013 Sun Mar 17 01:07:02 EET 2013' – jesseniem 2013-03-16 23:13:13

Thanks, Dan Bracuk!

This is the correct query:

SELECT t.date, t.location_code, t.product_code, t.quantity, t.type, t.updated_at 
FROM transactions t 
INNER JOIN 
(
    SELECT MAX(updated_at) AS max_updated_at, product_code AS prod, location_code AS loc, type AS typ, date AS dat 
    FROM transactions 
    GROUP BY product_code, location_code, type, date 
) s ON t.updated_at=max_updated_at AND t.location_code=loc AND t.product_code=prod AND t.type=typ AND t.date=dat;

2013-03-16 22:43:54 jesseniem

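Using a window function would probably be more efficient. In this case, Clodoaldo Neto's `distinct on` solution is probably the most efficient one. – 2013-03-16 22:50:13

This is probably faster than the join with a derived table: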

select * 
from (
    select date, 
      location_code, 
      product_code, 
      quantity, 
      type, 
      updated_at, 
      max(updated_at) over (partition by product_code, location_code, type, date) as max_updated 
    from transactions 
) t 
where updated_at = max_updated;
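
One caveat with filtering on `max(updated_at)`: if two rows in the same group share the latest `updated_at`, both are returned. A variant using `row_number()` keeps exactly one row per group. This is a sketch, not part of the original answer; which of the tied rows survives is arbitrary unless a tiebreaker column is added to the window's ORDER BY.

```sql
SELECT "date", location_code, product_code, quantity, "type", updated_at
FROM (
    SELECT t.*,
           -- Number the rows in each group, newest first.
           row_number() OVER (
               PARTITION BY product_code, location_code, "type", "date"
               ORDER BY updated_at DESC
           ) AS rn
    FROM transactions t
) ranked
WHERE rn = 1;  -- keep only the newest row of each group
```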

2013-03-16 22:49:30

Ran a quick, very unscientific performance test, which showed a negligible performance difference. The derived-table approach is time2, and this version is time1: '$ cat time1 Sun Mar 17 00:57:09 EET 2013 Sun Mar 17 00:57:13 EET 2013 | Sun Mar 17 00:57:15 EET 2013 Sun Mar 17 00:57:20 EET 2013 | Sun Mar 17 00:57:23 EET 2013 Sun Mar 17 00:57:29 EET 2013 $ cat time2 Sun Mar 17 00:55:45 EET 2013 Sun Mar 17 00:55:49 EET 2013 | Sun Mar 17 00:56:06 EET 2013 Sun Mar 17 00:56:11 EET 2013 | Sun Mar 17 00:56:14 EET 2013 Sun Mar 17 00:56:18 EET 2013' – jesseniem 2013-03-16 23:04:42