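Transferring data between tables violates a primary key constraint
I'm working on a database project using Postgres. I have one large table containing data imported from a CSV file, and I need to move that data into several smaller tables that represent the database I designed.
The large table holding the imported data is called data_minerva, and the table I want to transfer part of the data into is called related_articles. Here is the relevant part of the DDL: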
CREATE SEQUENCE article_id_seq;
CREATE TABLE article (
article_id integer UNIQUE NOT NULL DEFAULT nextval('article_id_seq'),
title varchar,
body varchar,
publish_time timestamp,
creation_time timestamp,
id integer,
PRIMARY KEY (article_id),
FOREIGN KEY (id) REFERENCES team (id)
);
ALTER SEQUENCE article_id_seq OWNED BY article.article_id;
CREATE TABLE related_articles (
article_id1 integer NOT NULL,
article_id2 integer NOT NULL,
kind varchar,
PRIMARY KEY (article_id1, article_id2, kind),
FOREIGN KEY (article_id1) REFERENCES article (article_id),
FOREIGN KEY (article_id2) REFERENCES article (article_id)
);
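As you can see in the snippet above, an article is identified by its ID. The data_minerva table does not contain an ID column. When I try to transfer data from data_minerva into related_articles, I run into duplicates in data_minerva that violate the primary key constraint on related_articles. I tried creating a rule to ignore these duplicates, but it did not work. I think I need to do something more with SELECT DISTINCT, but I can't figure out what. This is the rule and the query I use to transfer the data: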
CREATE RULE "ignore" AS ON INSERT TO related_articles
WHERE EXISTS (SELECT 1 FROM related_articles WHERE article_id1=NEW.article_id1 AND article_id2=NEW.article_id2 AND kind=NEW.kind)
DO INSTEAD NOTHING;
INSERT INTO related_articles (article_id1, article_id2, kind)
SELECT DISTINCT ON (data_minerva.articletitle, data_minerva.articlestarttime, data_minerva.writeremail,article.id, article.id, data_minerva.linkedarticletitle, data_minerva.linkedarticlestarttime)
(SELECT article_id FROM article WHERE data_minerva.linkedarticletitle IS NOT NULL AND article.title=data_minerva.articletitle AND article.creation_time=data_minerva.articlestarttime::timestamp),
(SELECT article_id FROM article WHERE article.title=data_minerva.linkedarticletitle AND article.creation_time=data_minerva.linkedarticlestarttime::timestamp),
linkedtype FROM data_minerva, article WHERE data_minerva.linkedarticletitle IS NOT NULL;
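For comparison, here is a minimal sketch of how the same transfer could be written with explicit joins. It is not a confirmed fix. Two things in the query above look likely to produce duplicate keys: `FROM data_minerva, article` is a cross join, so every source row is repeated once per row in article, and `DISTINCT ON` deduplicates on the listed source columns (including article.id) rather than on the three values actually being inserted. The sketch assumes that title plus creation_time identifies a single article, and it uses only table and column names that appear above; the aliases d, a1, a2 and r are just for illustration.

```sql
-- Sketch only: assumes (title, creation_time) identifies one article.
INSERT INTO related_articles (article_id1, article_id2, kind)
SELECT DISTINCT                      -- deduplicate on the inserted triple itself
       a1.article_id,
       a2.article_id,
       d.linkedtype
FROM data_minerva AS d
JOIN article AS a1                   -- the article the row describes
  ON a1.title = d.articletitle
 AND a1.creation_time = d.articlestarttime::timestamp
JOIN article AS a2                   -- the article it links to
  ON a2.title = d.linkedarticletitle
 AND a2.creation_time = d.linkedarticlestarttime::timestamp
WHERE d.linkedtype IS NOT NULL       -- kind is part of the primary key
  AND NOT EXISTS (                   -- skip triples already in the target table
        SELECT 1
        FROM related_articles AS r
        WHERE r.article_id1 = a1.article_id
          AND r.article_id2 = a2.article_id
          AND r.kind = d.linkedtype
      );
```

The inner joins already drop rows whose linkedarticletitle is NULL or has no matching article, so no separate IS NOT NULL check on that column is needed. With `SELECT DISTINCT` handling duplicates inside the batch and `NOT EXISTS` handling rows that are already present, the "ignore" rule should not be necessary for this statement.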
Search for "insert if not exists". – 2014-12-19 12:29:59