2015-03-08

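keepalived script makes failover go crazy

What happens? When I start keepalived, everything works fine. When node01 fails and cannot start PostgreSQL, it keeps trying to force an election, even though PostgreSQL cannot start. Elections now happen every second.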

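What do I want to achieve? keepalived should check whether PostgreSQL has come up on node01 while node02 is master, but without forcing an election all the time. Can someone help me understand this and get it right?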

Here is my code.

Stop pgsql:

#!/usr/bin/python 

import sys 
import subprocess 

sys.exit(
    subprocess.call(['/usr/bin/systemctl', 'stop', 'postgresql.service']) 
) 

Notify:

#!/usr/bin/python 

import sys 
import subprocess 

state = sys.argv[3] 

with open('/var/run/keepalived.pgsql.state', 'w+') as f: 
    f.write(state) 

if state == 'MASTER': 
    sys.exit(
     subprocess.call(['/usr/bin/systemctl', 'start', 'postgresql.service']) 
    ) 

if state == 'BACKUP': 
    sys.exit(
     subprocess.call(['/usr/bin/systemctl', 'stop', 'postgresql.service']) 
    ) 

if state == 'FAULT': 
    sys.exit(
     subprocess.call(['/usr/bin/systemctl', 'stop', 'postgresql.service']) 
    ) 

Check pgsql:

#!/usr/bin/python 

import sys 
import subprocess 
from time import sleep 

sleep(1) 

with open('/var/run/keepalived.pgsql.state', 'r') as f: 
    state = f.read().strip().strip("\n") 

# status 0: Postgresql is running 
# status 3: Postgresql has been stopped 
status = subprocess.call(['/usr/bin/systemctl', 'status', 'postgresql.service']) 

if status == 0 and state == 'MASTER': 
    sys.exit(0) 

if status == 0 and state == 'BACKUP': 
    sys.exit(3) 

if status == 3 and state == 'MASTER': 
    sys.exit(3) 

if status == 3 and state == 'BACKUP': 
    sys.exit(0) 

keepalived configuration:

vrrp_script chk_pgsql { 
    script  "/etc/keepalived/check-pgsql" 
    interval 1 
    fall 3 
    rise 3 
    weight -4 
} 

vrrp_instance pgsql_vip { 
    state EQUAL 
    interface eth0 
    virtual_router_id 4 
    priority 100(node01)|99{node02} 
    advert_int 1 
    authentication { 
     auth_type PASS 
     auth_pass 1111 
    } 
    track_script { 
     chk_pgsql 
    } 
    virtual_ipaddress { 
     192.168.1.20 
    } 
    notify "/etc/keepalived/notify" 
    notify_stop "/etc/keepalived/stop" 
} 

Answer

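After node01 dies, node02 is elected master. Your check script then keeps running on node01. It sees that node01 is now in the BACKUP state with PostgreSQL stopped, and returns 0. Once the check script has returned 0 three times (the `rise 3` in your VRRP configuration), node01 considers itself healthy. Because node01 has a higher priority than node02, it then takes over through the election process. At that point the check script starts failing, because node01 is in the MASTER state and PostgreSQL is stopped. This is what makes keepalived oscillate between the nodes.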
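The loop described above can be traced with a small model. This is only a sketch: `check_exit` mirrors the decision table of the check script in the question, while `simulate` and its flip rule are hypothetical simplifications (real keepalived adjusts the priority by `weight` and runs an election rather than flipping state directly). It assumes PostgreSQL on node01 can never start.

```python
PG_RUNNING = 0   # systemctl status exit code when PostgreSQL is running
PG_STOPPED = 3   # systemctl status exit code when PostgreSQL is stopped
RISE = 3         # "rise 3" from the vrrp_script block
FALL = 3         # "fall 3" from the vrrp_script block


def check_exit(pg_status, state):
    """Exit code the question's check script produces for a given input."""
    if pg_status == PG_RUNNING and state == "MASTER":
        return 0
    if pg_status == PG_RUNNING and state == "BACKUP":
        return 3
    if pg_status == PG_STOPPED and state == "MASTER":
        return 3
    if pg_status == PG_STOPPED and state == "BACKUP":
        return 0
    return 0  # the original script falls off the end, which also exits 0


def simulate(ticks, state="BACKUP"):
    """Trace node01's state, one entry per check interval (simplified model)."""
    history = []
    streak = 0  # checks run in the current state; the result is constant per state
    for _ in range(ticks):
        history.append(state)
        healthy = check_exit(PG_STOPPED, state) == 0
        streak += 1
        if state == "BACKUP" and healthy and streak >= RISE:
            # node01 looks healthy and has the higher priority: it takes over
            state, streak = "MASTER", 0
        elif state == "MASTER" and not healthy and streak >= FALL:
            # the check now fails, the priority drops, node02 takes over again
            state, streak = "BACKUP", 0
    return history


print(simulate(12))
```

In this model node01 never settles: a stopped PostgreSQL counts as healthy in BACKUP and unhealthy in MASTER, so each state undoes the other.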

I think you can solve this in one of two ways:

  1. Give node01 and node02 the same priority
  2. Change your check script so that it returns only the status of PostgreSQL
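A minimal sketch of the second option, assuming the same `systemctl status` call as in the question. The helper names `exit_code_for` and `main` are hypothetical. The script no longer reads the keepalived state file, so its result depends only on whether PostgreSQL is running.

```python
#!/usr/bin/python

import subprocess


def exit_code_for(systemctl_status):
    """Map the systemctl status exit code to a keepalived check result.

    keepalived treats exit code 0 as success and anything else as failure.
    """
    return 0 if systemctl_status == 0 else 1


def main():
    # status 0: PostgreSQL is running; any other value: not running
    status = subprocess.call(
        ['/usr/bin/systemctl', 'status', 'postgresql.service']
    )
    return exit_code_for(status)
```

To use it as `/etc/keepalived/check-pgsql`, add `import sys` and end the file with `sys.exit(main())`. Note that the notify script in the question stops PostgreSQL on the BACKUP node, so this check would report failure there as well; how that interacts with `weight -4` is worth testing before relying on it.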