2013-12-10 101 views
3

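Deadlock in a PostgreSQL trigger function

Using Postgres 9.3, I have a table named regression_runs that stores some counters. When a row in this table is updated, inserted, or deleted, a trigger function is called to update a row in the nightly_runs table, so that it keeps a running total of those counters for all regression_runs with a given nightly_run_id. The approach I've taken is fairly widely documented. My problem is that I get deadlocks when multiple processes try to insert new rows into regression_runs with the same nightly_run_id at the same time.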

The regression_runs table looks like this:

regression=> \d regression_runs 
                                  Table "public.regression_runs"
     Column      |         Type          |                          Modifiers
-----------------+-----------------------+--------------------------------------------------------------
 id              | integer               | not null default nextval('regression_runs_id_seq'::regclass)
 username        | character varying(16) | not null
 nightly_run_id  | integer               |
 nightly_run_pid | integer               |
 passes          | integer               | not null default 0
 failures        | integer               | not null default 0
 errors          | integer               | not null default 0
 skips           | integer               | not null default 0
Indexes: 
    "regression_runs_pkey" PRIMARY KEY, btree (id) 
    "regression_runs_nightly_run_id_idx" btree (nightly_run_id) 
Foreign-key constraints: 
    "regression_runs_nightly_run_id_fkey" FOREIGN KEY (nightly_run_id) REFERENCES nightly_runs(id) ON UPDATE CASCADE ON DELETE CASCADE 
Triggers: 
    regression_run_update_trigger AFTER INSERT OR DELETE OR UPDATE ON regression_runs FOR EACH ROW EXECUTE PROCEDURE regression_run_update() 

The nightly_runs table looks like this:

regression=> \d nightly_runs 
                          Table "public.nightly_runs"
  Column  |  Type   |                         Modifiers
----------+---------+-----------------------------------------------------------
 id       | integer | not null default nextval('nightly_runs_id_seq'::regclass)
 passes   | integer | not null default 0
 failures | integer | not null default 0
 errors   | integer | not null default 0
 skips    | integer | not null default 0
Indexes: 
    "nightly_runs_pkey" PRIMARY KEY, btree (id) 
Referenced by: 
    TABLE "regression_runs" CONSTRAINT "regression_runs_nightly_run_id_fkey" FOREIGN KEY (nightly_run_id) REFERENCES nightly_runs(id) ON UPDATE CASCADE ON DELETE CASCADE 

The trigger function regression_run_update looks like this:

CREATE OR REPLACE FUNCTION regression_run_update() RETURNS "trigger" 
    AS $$ 
     BEGIN 
     IF TG_OP = 'UPDATE' THEN 
       IF (NEW.nightly_run_id IS NOT NULL) and (NEW.nightly_run_id = OLD.nightly_run_id) THEN 
         UPDATE nightly_runs SET passes = passes + (NEW.passes - OLD.passes), failures = failures + (NEW.failures - OLD.failures), errors = errors + (NEW.errors - OLD.errors), skips = skips + (NEW.skips - OLD.skips) WHERE id = NEW.nightly_run_id; 
       ELSE 
         IF NEW.nightly_run_id IS NOT NULL THEN 
           UPDATE nightly_runs SET passes = passes + NEW.passes, failures = failures + NEW.failures, errors = errors + NEW.errors, skips = skips + NEW.skips WHERE id = NEW.nightly_run_id; 
         END IF; 
         IF OLD.nightly_run_id IS NOT NULL THEN 
           UPDATE nightly_runs SET passes = passes - OLD.passes, failures = failures - OLD.failures, errors = errors - OLD.errors, skips = skips - OLD.skips WHERE id = OLD.nightly_run_id; 
         END IF; 
       END IF; 
     ELSIF TG_OP = 'INSERT' THEN 
       IF NEW.nightly_run_id IS NOT NULL THEN 
         UPDATE nightly_runs SET passes = passes + NEW.passes, failures = failures + NEW.failures, errors = errors + NEW.errors, skips = skips + NEW.skips WHERE id = NEW.nightly_run_id; 
       END IF; 
     ELSIF TG_OP = 'DELETE' THEN 
       IF OLD.nightly_run_id IS NOT NULL THEN 
         UPDATE nightly_runs SET passes = passes - OLD.passes, failures = failures - OLD.failures, errors = errors - OLD.errors, skips = skips - OLD.skips WHERE id = OLD.nightly_run_id; 
       END IF; 
     END IF; 
     RETURN NEW; 
     END; 
$$ 
    LANGUAGE plpgsql; 

This is what I see in the postgres log file:

ERROR: deadlock detected 
DETAIL: Process 20266 waits for ShareLock on transaction 7520; blocked by process 20263. 
     Process 20263 waits for ExclusiveLock on tuple (1,70) of relation 18469 of database 18354; blocked by process 20266. 
     Process 20266: insert into regression_runs (username, nightly_run_id, nightly_run_pid) values ('tbeadle', 135, 20262); 
     Process 20263: insert into regression_runs (username, nightly_run_id, nightly_run_pid) values ('tbeadle', 135, 20260); 
HINT: See server log for query details. 
CONTEXT: SQL statement "UPDATE nightly_runs SET passes = passes + NEW.passes, failures = failures + NEW.failures, errors = errors + NEW.errors, skips = skips + NEW.skips WHERE id = NEW.nightly_run_id" 
     PL/pgSQL function regression_run_update() line 16 at SQL statement 
STATEMENT: insert into regression_runs (username, nightly_run_id, nightly_run_pid) values ('tbeadle', 135, 20262); 

I can reproduce the problem with this script:

#!/usr/bin/env python 

import os 
import multiprocessing 
import psycopg2 

class Foo(object): 
    def child(self): 
     pid = os.getpid() 
     conn = psycopg2.connect(
      'dbname=regression host=localhost user=regression') 
     cur = conn.cursor() 
     for i in xrange(100): 
      cur.execute(
       "insert into regression_runs " 
       "(username, nightly_run_id, nightly_run_pid) " 
       "values " 
       "('tbeadle', %s, %s);", (self.nid, pid)) 
      conn.commit() 
     return 

    def start(self): 
     conn = psycopg2.connect(
      'dbname=regression host=localhost user=regression') 
     cur = conn.cursor() 
     cur.execute('insert into nightly_runs default values returning id;') 
     row = cur.fetchone() 
     conn.commit() 
     self.nid = row[0] 
     procs = [] 
     for child in xrange(5): 
      procs.append(multiprocessing.Process(target=self.child)) 
     for proc in procs: 
      proc.start() 
     for proc in procs: 
      proc.join() 

Foo().start() 

I can't figure out why the deadlock is happening or what I can do about it. Please help!

+1

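IMHO, updating fields inside a trigger is a bad idea. The trigger handler often ends up trying to write to the same row, and that turns into a deadlock. Maybe an architecture change is needed. For difficult cases, I create a buffer queue table and dispatch it through a stored procedure, with an external tool regulating the queue, of course. – corvinusz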

+0

@corvinusz: Nonsense. A trigger is the ideal tool for what the OP is doing. He's just unaware of a few pitfalls. –

Answer

2

Deadlocks usually occur because the updates related to OLD and NEW are not performed in a consistent order. Case in point:

IF TG_OP = 'UPDATE' THEN
    IF (NEW.nightly_run_id IS NOT NULL) AND (NEW.nightly_run_id = OLD.nightly_run_id) THEN
        -- stuff that seems fine
    ELSE
        IF NEW.nightly_run_id IS NOT NULL THEN
            UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id; -- locks the NEW row first
        END IF;
        IF OLD.nightly_run_id IS NOT NULL THEN
            UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id; -- then locks the OLD row
        END IF;
    END IF;

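Imagine two transactions:

  • T1 acquires the lock on NEW.nightly_run_id = 1 and waits for the lock on OLD.nightly_run_id = 2
  • T2 acquires the lock on NEW.nightly_run_id = 2 and waits for the lock on OLD.nightly_run_id = 1

Deadlock: each transaction holds the row the other one is waiting for.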
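To make the cycle concrete, here is a minimal sketch in Python that models the two nightly_runs rows as plain thread locks. Everything in it (`run_opposite_order`, the barriers, the timeout) is a hypothetical stand-in, not part of the OP's code. PostgreSQL detects the cycle and aborts one transaction; the timeout here only lets the demo terminate and report that neither worker could get its second lock.

```python
import threading


def run_opposite_order(timeout=0.2):
    """Two workers lock two 'rows' in opposite order.

    Returns a dict telling whether each worker obtained its second lock.
    """
    # Hypothetical stand-ins for the row locks on nightly_runs ids 1 and 2.
    row_locks = {1: threading.Lock(), 2: threading.Lock()}
    both_hold_first = threading.Barrier(2)    # forces the bad interleaving
    both_tried_second = threading.Barrier(2)  # keeps first locks held until both have tried
    got_second = {}

    def worker(name, new_id, old_id):
        # Models: UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id
        with row_locks[new_id]:
            both_hold_first.wait()
            # Models: UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id
            ok = row_locks[old_id].acquire(timeout=timeout)
            got_second[name] = ok
            both_tried_second.wait()
            if ok:
                row_locks[old_id].release()

    threads = [
        threading.Thread(target=worker, args=("T1", 1, 2)),
        threading.Thread(target=worker, args=("T2", 2, 1)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


if __name__ == "__main__":
    print(run_opposite_order())
```

Both workers hold their first lock while waiting for the other's, so neither attempt on the second lock can succeed.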

Enforcing an order avoids the situation:

IF OLD.nightly_run_id = NEW.nightly_run_id THEN
    -- stuff that seems fine
ELSIF OLD.nightly_run_id < NEW.nightly_run_id THEN
    -- always touch the lower id first
    UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id;
    UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id;
ELSIF NEW.nightly_run_id < OLD.nightly_run_id THEN
    UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id;
    UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id;
-- if either id is NULL, the comparisons above yield NULL and fall through to here
ELSIF OLD.nightly_run_id IS NOT NULL THEN
    UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id;
ELSIF NEW.nightly_run_id IS NOT NULL THEN
    UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id;
END IF;
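The same idea can be sketched outside the database. The Python below is only an illustration under assumptions: the thread locks stand in for row locks on nightly_runs, and `run_sorted_order` is a made-up helper, not anything from the OP's code. Each worker moves a count from its old id to its new id, but always takes the lock for the lower id first, so two workers going in opposite directions cannot end up waiting on each other.

```python
import threading


def run_sorted_order(pairs, rounds=200):
    """Each worker repeatedly 'moves' one count from old_id to new_id.

    pairs is a list of (new_id, old_id) tuples; the two ids in a pair are
    assumed to differ, matching the branches where OLD and NEW are different rows.
    Returns (totals, stuck), where stuck is True if any worker failed to finish.
    """
    ids = {i for pair in pairs for i in pair}
    row_locks = {i: threading.Lock() for i in ids}  # stand-ins for row locks
    totals = {i: 0 for i in ids}

    def worker(new_id, old_id):
        # Lock in ascending id order, regardless of which one is OLD or NEW.
        first, second = sorted((new_id, old_id))
        for _ in range(rounds):
            with row_locks[first]:
                with row_locks[second]:
                    totals[new_id] += 1  # counters added to the new row
                    totals[old_id] -= 1  # and subtracted from the old row

    threads = [threading.Thread(target=worker, args=p, daemon=True) for p in pairs]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return totals, any(t.is_alive() for t in threads)


if __name__ == "__main__":
    print(run_sorted_order([(1, 2), (2, 1)]))
```

With the pairs (1, 2) and (2, 1), both workers request lock 1 before lock 2, so one simply waits for the other to finish instead of forming a cycle.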

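The same change should be applied to your other triggers where relevant. Barring other pathologies in the code, the deadlocks should go away.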