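pgpool-II in master/slave mode: what is the easiest way to trigger a failover?
I am testing a toy PostgreSQL infrastructure on a few local virtual machines to see how pgpool behaves on failover. I have a basic setup with two database machines (192.168.0.2 and 192.168.0.3) and one pgpool machine (192.168.0.4). 192.168.0.3 is set up as a slave of 192.168.0.2 using streaming replication. pgpool-II is configured as follows: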
listen_addresses = '*'
backend_hostname0 = '192.168.0.2'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/9.4/main/'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '192.168.0.3'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.4/main/'
backend_flag1 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = on
replication_mode = false
master_slave_mode = on
master_slave_sub_mode = 'stream'
fail_over_on_backend_error = true
failover_command = '/root/pgpool_failover_stream.sh %d %H /tmp/postgresql.trigger.5432'
load_balance_mode = false
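I have verified that this setup works. That is, when I change the master database, replication works correctly, and I can connect to the master, the slave and pgpool-II from a sample application and get the results I expect.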
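A check of this kind could be sketched as follows. This is only an illustration: the `postgres` user, the pgpool port 9999 (pgpool's default, not shown in the configuration above) and the helper names `backend_role` and `pool_nodes` are assumptions, not part of the original setup.

```shell
#!/bin/sh
# Sketch only: user name and pgpool port are assumptions, not taken from the question.
PSQL="${PSQL:-psql}"                 # overridable, e.g. PSQL=echo for a dry run
DB_USER="${DB_USER:-postgres}"       # assumed database user
PGPOOL_PORT="${PGPOOL_PORT:-9999}"   # assumed pgpool listen port

# Prints "t" if the backend is a standby in recovery, "f" if it is a primary.
backend_role() {
    $PSQL -h "$1" -p 5432 -U "$DB_USER" -At -c "SELECT pg_is_in_recovery();"
}

# Asks pgpool which backends it knows about and which status it assigns to each.
pool_nodes() {
    $PSQL -h "$1" -p "$PGPOOL_PORT" -U "$DB_USER" -c "SHOW pool_nodes;"
}
```

With the addresses above, `backend_role 192.168.0.2` and `backend_role 192.168.0.3` query each database directly, and `pool_nodes 192.168.0.4` shows the backends as pgpool sees them.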
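Now I have started a long-running application connected to pgpool, and then tried to force a failover by SSHing into the master database server and stopping the postgres service (service postgresql stop, logged in as root). My application keeps executing queries correctly, but no failover occurs (the script has not been run). I have even tested connecting directly to the master database, and when I stop the postgres service the application ends up crashing.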
What am I doing wrong? Have I misconfigured pgpool? Or is there a better way to trigger a failover?
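For context, a script of the kind that failover_command points at could look roughly like the sketch below. The contents of the real /root/pgpool_failover_stream.sh are not shown in this question, so everything here is an assumption: that node 0 is the primary, that the pgpool host can SSH to the standby without a password, and that the standby's recovery.conf names the same trigger file. The function name `promote_standby` and the `SSH_CMD` override are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a streaming-replication failover script.
# Arguments, in the order given by failover_command:
#   $1 = %d  id of the backend node that failed
#   $2 = %H  hostname of the new master candidate
#   $3 = path of the trigger file to create on that host
# Assumption: node 0 is the primary and node 1 is the standby.

# Assumption: key-based SSH to the standby; overridable for dry runs.
SSH_CMD="${SSH_CMD:-/usr/bin/ssh -T}"

promote_standby() {
    failed_node="$1"
    new_master="$2"
    trigger_file="$3"

    # Leave a trace so it is easy to tell whether pgpool ever invoked the script.
    echo "failover called: failed_node=$failed_node new_master=$new_master" >&2

    # Nothing to promote when it is the standby that went down.
    if [ "$failed_node" != "0" ]; then
        return 0
    fi

    # Creating the trigger file asks the standby to leave recovery.
    $SSH_CMD "$new_master" /bin/touch "$trigger_file"
}

# Run only when called with the three arguments pgpool passes.
if [ "$#" -ge 3 ]; then
    promote_standby "$@"
fi
```

Running it by hand with `SSH_CMD=echo` prints the command it would send instead of executing it, which separates "the script is broken" from "pgpool never calls the script".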
Edit: As requested, here is the part of the log where the first errors appear:
...
2016-03-15 18:47:15: pid 1232: DEBUG: initializing backend status
2016-03-15 18:47:15: pid 1231: DEBUG: initializing backend status
2016-03-15 18:47:15: pid 1230: DEBUG: initializing backend status
2016-03-15 18:47:15: pid 1209: ERROR: failed to authenticate
2016-03-15 18:47:15: pid 1209: DETAIL: invalid authentication message response type, Expecting 'R' and received 'E'
2016-03-15 18:47:15: pid 1209: LOG: find_primary_node: checking backend no 1
2016-03-15 18:47:15: pid 1209: ERROR: failed to authenticate
2016-03-15 18:47:15: pid 1209: DETAIL: invalid authentication message response type, Expecting 'R' and received 'E'
2016-03-15 18:47:15: pid 1209: DEBUG: find_primary_node: no primary node found
...
Strangely, I can still connect to pgpool and execute queries, so there is clearly something I do not understand here.
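Edit 2: These are the errors I get after running service postgresql shutdown on the master. I am showing everything up to the point where pgpool starts shutting down.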
...
2016-03-16 17:24:57: pid 1012: DEBUG: session context: clearing doing extended query messaging. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: session context: setting doing extended query messaging. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: session context: setting query in progress. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: reading backend data packet kind
2016-03-16 17:24:57: pid 1012: DETAIL: backend:0 of 2 kind = 'E'
2016-03-16 17:24:57: pid 1012: DEBUG: processing backend response
2016-03-16 17:24:57: pid 1012: DETAIL: received kind 'E'(45) from backend
2016-03-16 17:24:57: pid 1012: ERROR: unable to forward message to frontend
2016-03-16 17:24:57: pid 1012: DETAIL: FATAL error occured on backend
2016-03-16 17:24:57: pid 1012: DEBUG: session context: setting query in progress. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: decide where to send the queries
2016-03-16 17:24:57: pid 1012: DETAIL: destination = 3 for query= "DISCARD ALL"
2016-03-16 17:24:57: pid 1012: DEBUG: waiting for query response
2016-03-16 17:24:57: pid 1012: DETAIL: waiting for backend:0 to complete the query
2016-03-16 17:24:57: pid 1012: FATAL: unable to read data from DB node 0
2016-03-16 17:24:57: pid 1012: DETAIL: EOF encountered with backend
2016-03-16 17:24:57: pid 998: DEBUG: reaper handler
2016-03-16 17:24:57: pid 998: LOG: child process with pid: 1012 exits with status 256
2016-03-16 17:24:57: pid 998: LOG: fork a new child process with pid: 1033
2016-03-16 17:24:57: pid 998: DEBUG: reaper handler: exiting normally
2016-03-16 17:24:57: pid 1033: DEBUG: initializing backend status
2016-03-16 17:25:02: pid 1031: DEBUG: PCP child receives shutdown request signal 2
2016-03-16 17:25:02: pid 1029: LOG: child process received shutdown request signal 2
...
Note that my sample application did in fact die when the master was shut down.
Edit 3: These are the errors in the new log after setting sr_check_period, sr_check_user and sr_check_password to sensible values; all of the earlier errors are now gone:
2016-03-31 17:45:00: pid 18363: DEBUG: detect error: kind: 1
2016-03-31 17:45:00: pid 18363: DEBUG: reading backend data packet kind
2016-03-31 17:45:00: pid 18363: DETAIL: backend:0 of 2 kind = '1'
...
2016-03-31 17:45:00: pid 18363: DEBUG: detect error: kind: S
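For reference, the three settings named in Edit 3 sit in pgpool.conf roughly as shown below. The values are placeholders chosen for illustration, not the ones actually used here, and the user name `pgpool_check` is hypothetical.

```
# Streaming replication check (placeholder values, for illustration only)
sr_check_period = 10                 # seconds between checks; 0 disables the check
sr_check_user = 'pgpool_check'       # hypothetical user that must exist on both backends
sr_check_password = 'change_me'      # placeholder password
```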
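Hi Raveesh, thanks for your reply! I have enabled logging, and even at startup I noticed some errors that look like they could be related. I have edited my question to include the relevant information. – gdoug
Can you post the logs produced after the master is shut down? I don't think these logs point to the real reason why failover does not execute the script. –
Updated again with the requested log information. – gdoug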