Fast query to perform normalization on SQL data

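I have some data that I want to normalize. Specifically, I am normalizing it so that I can work with the normalized part without having to worry about duplicates. What I am doing is: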

INSERT INTO new_table (a, b, c) 
    SELECT DISTINCT a,b,c 
    FROM old_table; 

UPDATE old_table 
SET abc_id = new_table.id 
FROM new_table 
WHERE new_table.a = old_table.a 
    AND new_table.b = old_table.b 
    AND new_table.c = old_table.c;

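First, it seems like there should be a better way to do this. The process of finding the distinct data should inherently be able to produce the list of rows that belong to each distinct combination. Second, and more importantly, the INSERT takes a little while, but the UPDATE takes FOREVER (I cannot actually say how long, because it is still running). I am using PostgreSQL. Is there a better way to do this (perhaps all in one query)?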

Source

2016-06-09 CrazyCasta

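If the UPDATE takes _FOREVER_, could that be because of a missing unique index on `new_table (a, b, c)`?

Did you run `ANALYZE old_table; ANALYZE new_table;` first?

@EgorRogov I had not, but I ran it just now, and it only says "ANALYZE". I then read some documentation saying that it "collects statistics", but I am not sure I fully understand what it is supposed to do. Am I supposed to gather some information from it, or is it just a magic way to make queries faster? – CrazyCasta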
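The two suggestions in these comments can be sketched roughly as follows. This is only a sketch, assuming the `old_table` and `new_table` from the question; the index name `new_table_abc_key` is made up for illustration. ANALYZE itself prints nothing useful: it stores statistics in the system catalogs, and the planner reads them when it chooses a plan.

```sql
-- Hypothetical index name; a UNIQUE constraint on (a, b, c)
-- would create an equivalent index.
CREATE UNIQUE INDEX new_table_abc_key ON new_table (a, b, c);

-- Refresh the planner statistics for both tables.
-- The only visible output is the command tag "ANALYZE".
ANALYZE old_table;
ANALYZE new_table;

-- Show the plan the planner would pick, without executing the UPDATE.
EXPLAIN
UPDATE old_table
SET abc_id = new_table.id
FROM new_table
WHERE new_table.a = old_table.a
    AND new_table.b = old_table.b
    AND new_table.c = old_table.c;
```

Comparing the EXPLAIN output before and after the index and the ANALYZE runs shows whether they changed the plan at all.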

This is my other answer, extended to three columns:

 -- Some test data 
CREATE TABLE the_table 
     (id SERIAL NOT NULL PRIMARY KEY 
     , name varchar 
     , a INTEGER 
     , b varchar 
     , c varchar 
     ); 
INSERT INTO the_table(name, a,b,c) VALUES 
('Chimpanzee' , 1, 'mammals', 'apes') 
,('Urang Utang' , 1, 'mammals', 'apes') 
,('Homo Sapiens' , 1, 'mammals', 'apes') 
,('Mouse' , 2, 'mammals', 'rodents') 
,('Rat' , 2, 'mammals', 'rodents') 
,('Cat' , 3, 'mammals', 'felix') 
,('Dog' , 3, 'mammals', 'canae') 
     ; 

     -- [empty] table to contain the "squeezed out" domain {a,b,c} 
CREATE TABLE abc_table 
     (id SERIAL NOT NULL PRIMARY KEY 
     , a INTEGER 
     , b varchar 
     , c varchar 
     , UNIQUE (a,b,c) 
     ); 

     -- The original table needs a "link" to the new table 
ALTER TABLE the_table 
     ADD column abc_id INTEGER -- NOT NULL 
     REFERENCES abc_table(id) 
     ; 
     -- FK constraints are helped a lot by a supportive index. 
CREATE INDEX abc_table_fk ON the_table (abc_id); 

     -- Chained query to: 
     -- * populate the domain table 
     -- * initialize the FK column in the original table 
WITH ins AS (
     INSERT INTO abc_table(a,b,c) 
     SELECT DISTINCT a,b,c 
     FROM the_table a 
     RETURNING * 
     ) 
UPDATE the_table ani 
SET abc_id = ins.id 
FROM ins 
WHERE ins.a = ani.a 
AND ins.b = ani.b 
AND ins.c = ani.c 
     ; 

     -- Now that we have the FK pointing to the new table, 
     -- we can drop the redundant columns. 
ALTER TABLE the_table DROP COLUMN a, DROP COLUMN b, DROP COLUMN c; 

SELECT * FROM the_table; 
SELECT * FROM abc_table; 

     -- show it to the world 
SELECT a.* 
     , c.a, c.b, c.c 
FROM the_table a 
JOIN abc_table c ON c.id = a.abc_id 
     ;

Result:

CREATE TABLE 
INSERT 0 7 
CREATE TABLE 
ALTER TABLE 
CREATE INDEX 
UPDATE 7 
ALTER TABLE 
 id |     name     | abc_id
----+--------------+--------
  1 | Chimpanzee   |      4
  2 | Urang Utang  |      4
  3 | Homo Sapiens |      4
  4 | Mouse        |      3
  5 | Rat          |      3
  6 | Cat          |      1
  7 | Dog          |      2
(7 rows)

 id | a |    b    |    c
----+---+---------+---------
  1 | 3 | mammals | felix
  2 | 3 | mammals | canae
  3 | 2 | mammals | rodents
  4 | 1 | mammals | apes
(4 rows)

 id |     name     | abc_id | a |    b    |    c
----+--------------+--------+---+---------+---------
  1 | Chimpanzee   |      4 | 1 | mammals | apes
  2 | Urang Utang  |      4 | 1 | mammals | apes
  3 | Homo Sapiens |      4 | 1 | mammals | apes
  4 | Mouse        |      3 | 2 | mammals | rodents
  5 | Rat          |      3 | 2 | mammals | rodents
  6 | Cat          |      1 | 3 | mammals | felix
  7 | Dog          |      2 | 3 | mammals | canae
(7 rows)

Edit: This seems to work well enough, and I hate to see the downvote I put there, hence this useless edit (CrazyCasta).

Source

2016-06-09 19:00:12 wildplasser

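As per my comment above: http://pastebin.com/P7wtCxYx. This does not seem to be any better than my original query with a unique constraint on the new table. – CrazyCasta

With the primary key, the foreign key and the supporting index, it is different. And probably better. – wildplasser

OK, after looking at yours a whole lot: it does a hash join if you add the new column before inserting. I am not entirely sure why, but it seems to be very picky about the order in which things are done, even though the result is the same. (I cannot simply create the table with the foreign key already in place.) That might be a problem in my case, but it does look like it could work in some situations. Sadly, SO won't let me undo my downvote :( – CrazyCasta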

I figured out a way to do this myself:

BEGIN; 

CREATE TEMPORARY TABLE new_table_temp (
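    -- LIKE copies NOT NULL but not column defaults unless asked; this assumes
    -- new_table.id gets its values from a default (e.g. SERIAL).
    LIKE new_table INCLUDING DEFAULTS, 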
    old_ids integer[] 
) 
ON COMMIT DROP; 

INSERT INTO new_table_temp (a, b, c, old_ids) 
    SELECT a, b, c, array_agg(id) AS old_ids 
    FROM old_table 
    GROUP BY a, b, c; 

INSERT INTO new_table (id, a, b, c) 
    SELECT id, a, b, c 
    FROM new_table_temp; 

UPDATE old_table 
SET abc_id = new_table_temp.id 
FROM new_table_temp 
WHERE old_table.id = ANY(new_table_temp.old_ids); 

COMMIT;

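This is at least what I was looking for. I will update this with whether it runs quickly. The EXPLAIN output looks like a sensible plan, so I am hopeful.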

Source

2016-06-09 18:24:00 CrazyCasta

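See my answer here: http://stackoverflow.com/a/29879536/905902 (replace {category, subcategory} with your {a, b, c} columns. Do not forget the {a, b, c} and FK/PK constraints!) – wildplasser

OK, I just took a look at that. I put a unique index on the first table, and it does a lot of work: http://pastebin.com/P7wtCxYx. According to this EXPLAIN output, it has to pull up the tables, do sequential scans and sort on the keys. Mine results in a hash lookup on the id. – CrazyCasta

The switch between hash and index is determined by the statistics and by the absence of an index. Without additional information, the planner will usually choose the hash solution, unless the hash table is not expected to fit in memory. BTW: for small test data, my solution also produces a hash table. – wildplasser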
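One way to see which join strategy the planner actually picks, and what the alternative costs, is sketched below. It assumes the question's `old_table` and `new_table`, with `abc_id` already added and statistics up to date. `enable_hashjoin` is a standard PostgreSQL planner setting; turning it off only discourages hash joins and is meant for experiments like this one, not for production use.

```sql
BEGIN;

-- EXPLAIN ANALYZE really executes the UPDATE, so everything runs
-- inside a transaction that is rolled back at the end.
EXPLAIN ANALYZE
UPDATE old_table
SET abc_id = new_table.id
FROM new_table
WHERE new_table.a = old_table.a
    AND new_table.b = old_table.b
    AND new_table.c = old_table.c;

-- Discourage hash joins for the rest of this transaction only,
-- so the planner falls back to its next-best strategy.
SET LOCAL enable_hashjoin = off;

EXPLAIN ANALYZE
UPDATE old_table
SET abc_id = new_table.id
FROM new_table
WHERE new_table.a = old_table.a
    AND new_table.b = old_table.b
    AND new_table.c = old_table.c;

-- Undo both UPDATEs; only the timings and plans were of interest.
ROLLBACK;
```

The two plans and their reported execution times can then be compared directly.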
