2017-06-16 69 views
1

Postgres: select duplicates on one column but different on another

I have a database table with two columns:

author_id, message

and entries such as:

123, "message!" 
123, "message!" 
123, "different message" 
124, "message!" 

I want to write a query that lets me select:

123, "message!"

124, "message!"

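Essentially, entries where message is the same but author_id is different.

Then I want to delete one of those entries. (It doesn't matter which one, as long as I can select just one of them.)

This question got me close, but it deals with duplicates across both columns.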

+1

What if different authors have several messages in common (e.g. both 'author_id' 123 and 124 have "message 2")? What would the desired result be then? –

+0

@OtoShavadze Same thing, just pick one of them. If the same author has two copies and a second author has three, any one of them will do. – goddamnyouryan

+1

Does this table have a primary key? And if the solution happens to pick the 123, "message!" row for deletion, should all such rows be deleted? – pozs

Answers

3

Here is another possible example:

-- Test table 
CREATE TABLE dummy_data (
    author_id int, 
    message  text 
); 

-- Test data 
INSERT INTO dummy_data (author_id, message) 
VALUES 
(123, '"message!"'), 
(123, '"message!"'), 
(123, '"different message"'), 
(124, '"message!"'), 
(124, '"message!"'), 
(125, '"message!"'); 

-- Delete query 
DELETE FROM dummy_data 
WHERE ctid NOT IN (
      SELECT max(ctid) 
      FROM dummy_data 
      GROUP BY message  -- this is important to specify 
     ) 
-- RETURNING is only here to show the deleted rows while testing;
-- drop it if you don't need them
RETURNING *; 

-- Confirming result: 
SELECT * FROM dummy_data ; 
author_id |  message 
-----------+--------------------- 
     123 | "different message" 
     125 | "message!" 
(2 rows) 

See more about system columns: https://www.postgresql.org/docs/current/static/ddl-system-columns.html

Edit:
Added an example, as requested, that limits the delete to a set of IDs (author_id).

Plain query:

DELETE FROM dummy_data 
USING (SELECT ARRAY[ 123, 124]) v(id) 
WHERE author_id = ANY (v.id) 
AND  ctid NOT IN (
      SELECT max(ctid) 
      FROM dummy_data 
      WHERE author_id = ANY (v.id) 
      GROUP BY message 
     ); 

The same query with comments:

DELETE FROM dummy_data 
-- Add your 'author_id' values to the array here.
-- The list goes in a USING clause because it is
-- compared in two places, and a long list would be
-- annoying to write twice :)
USING (SELECT ARRAY[ 123, 124]) v(id) 
-- First, restrict the delete to the authors in the batch
WHERE author_id = ANY (v.id) 
-- Second, keep the row with the max ctid per message,
-- computed over the same batch of authors
AND  ctid NOT IN (
      SELECT max(ctid) 
      FROM dummy_data 
      WHERE author_id = ANY (v.id) 
      GROUP BY message 
     ); 

-- This will delete following rows: 
author_id | message 
-----------+------------ 
     123 | "message!" 
     123 | "message!" 
     124 | "message!" 
(3 rows) 

-- Leaving the table in this state:
author_id |  message 
-----------+--------------------- 
     123 | "different message" 
     124 | "message!" 
     125 | "message!" 
(3 rows) 
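
On a large table the NOT IN subquery may plan poorly. One alternative worth trying is a self-join in the USING clause, sketched below against the same dummy_data test table and the same [123, 124] batch. It has not been benchmarked here, so compare plans with EXPLAIN before relying on it. The aliases d and k are illustrative. Note one behavioural difference: rows whose message is NULL never match the join and are left alone, whereas GROUP BY treats them as one group.

```sql
-- Sketch: delete every in-batch row for which another in-batch row
-- with the same message and a higher ctid exists.
-- The row with the highest ctid per message survives.
DELETE FROM dummy_data d          -- d: candidate row to delete
USING dummy_data k                -- k: a later row that will be kept or checked in turn
WHERE d.author_id = ANY (ARRAY[123, 124])
  AND k.author_id = ANY (ARRAY[123, 124])
  AND d.message = k.message
  AND d.ctid < k.ctid
RETURNING d.*;
```

On the test data above this should remove the same three rows as the NOT IN version. Unlike that version, the author list appears twice here; an index on message is likely to matter more than the query shape.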
+0

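This is nice, but also a bit slow. I have about 100 million rows in the database I'm running this on, so it would be good to be able to scope it as well, e.g. remove duplicates but only within a specific subset, say author_ids in the array '[123,124]'. How would you modify this query to handle that? – goddamnyouryan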

+1

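But if you do it by 'author_ids', then taking '[123,124]' as an example, the row for 124 will remain. If you then feed it '[125,126]', that is a fresh pass which knows nothing about the previous batch. So even if "message!" is repeated for 125, the 124 "message!" row will still exist. Is that OK for you? If yes, I can easily edit the example :) –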

+0

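Yes, that's fine. Basically, I have sets of authors grouped together, and I want to make sure there are no duplicate messages within those groups. Does that make sense? – goddamnyouryan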

1

You can use array_agg() for this, e.g.:

select author_id, message 
from (
    select message, array_agg(distinct author_id) ids 
    from my_table 
    group by message 
    ) s 
cross join unnest(ids) author_id 
where cardinality(ids) > 1 
order by author_id; 

author_id | message 
-----------+---------- 
     123 | message! 
     124 | message! 
(2 rows) 

If you want a single row per duplicated message, the query can be simpler:

select min(author_id) as author_id, message 
from my_table 
group by message 
having count(distinct author_id) > 1; 

author_id | message 
-----------+---------- 
     123 | message! 
(1 row) 
+0

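As for the second option, I really like it, it's very simple. Is it possible to select the 'id' column as well? If I add it to the select, I also have to group by it, and then the query no longer works correctly. – goddamnyouryan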
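
One way to also return an 'id' without grouping by it is DISTINCT ON, sketched below. It assumes my_table has a unique 'id' column, which the question's two-column table does not show, so treat that column as hypothetical.

```sql
-- Sketch: one row per message that is shared by more than one author,
-- including that row's id. Assumes a unique id column on my_table.
select distinct on (message) id, author_id, message
from my_table t
where exists (
    -- another row with the same message but a different author
    select 1
    from my_table o
    where o.message = t.message
      and o.author_id <> t.author_id
)
-- DISTINCT ON keeps the first row per message in this order:
-- lowest author_id, then lowest id
order by message, author_id, id;
```

The ORDER BY decides which row is kept for each message, so changing it (for example to 'id desc') selects a different representative.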

1

If I understand correctly, you need something like this:

with the_table (author_id, message) as (
    select 123, '"message!"' union all 
    select 123, '"message!"' union all 
    select 123, '"aaa!"' union all 
    select 123, '"different message"' union all 
    select 124, '"aaa!"' union all 
    select 124, '"message!"' union all 
    select 125, '"aaa!"' union all 
    select 125, '"rrrr!"' 
) 


select the_table.* from the_table 
join ( 
    select message from the_table 
    group by message 
    having count(distinct author_id) = (select count(distinct author_id) from the_table) 
) t 
on the_table.message = t.message 
order by random() limit 1 

This returns one random row whose message is common to all author_id values.
