2016-03-02

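Incorrect resource manager data checksum in record at 2/XYZ + terminating walreceiver process due to administrator command

I am running a streaming replication environment with PostgreSQL 9.1 (1 master, 3 slaves). Everything worked fine for approx. 2 months. Yesterday, replication to one of the slaves failed, and the log on that slave showed: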

LOG: incorrect resource manager data checksum in record at 61/DA2710A7 
FATAL: terminating walreceiver process due to administrator command 
LOG: incorrect resource manager data checksum in record at 61/DA2710A7 
LOG: incorrect resource manager data checksum in record at 61/DA2710A7 
LOG: incorrect resource manager data checksum in record at 61/DA2710A7 
LOG: incorrect resource manager data checksum in record at 61/DA2710A7 
LOG: incorrect resource manager data checksum in record at 61/DA2710A7 
LOG: incorrect resource manager data checksum in record at 61/DA2710A7 
LOG: incorrect resource manager data checksum in record at 61/DA2710A7 

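The slave was no longer in sync with the master. Two hours later, during which the log gained a new line like the ones above every 5 seconds, I restarted the slave database server: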

LOG: incorrect resource manager data checksum in record at 61/DA2710A7 
LOG: received fast shutdown request 
LOG: aborting any active transactions 
LOG: incorrect resource manager data checksum in record at 61/DA2710A7 
FATAL: terminating connection due to administrator command 
FATAL: terminating connection due to administrator command 
LOG: shutting down 
LOG: database system is shut down 
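To check whether the standby keeps failing on one WAL record or on several, the checksum messages can be grouped by WAL location. This is a minimal sketch: the function name `count_checksum_errors` and the `sample_log` list are illustrative assumptions, and the pattern matches only the message text shown in the logs above.

```python
import re
from collections import Counter

# Matches the checksum message as it appears in the log excerpts above
# and captures the WAL location (two hexadecimal numbers separated by "/").
CHECKSUM_RE = re.compile(
    r"incorrect resource manager data checksum in record at "
    r"([0-9A-Fa-f]+/[0-9A-Fa-f]+)"
)


def count_checksum_errors(lines):
    """Return a Counter mapping WAL location -> number of checksum messages."""
    counts = Counter()
    for line in lines:
        match = CHECKSUM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical input: a few lines copied from the excerpt above.
sample_log = [
    "LOG: incorrect resource manager data checksum in record at 61/DA2710A7",
    "LOG: received fast shutdown request",
    "LOG: aborting any active transactions",
    "LOG: incorrect resource manager data checksum in record at 61/DA2710A7",
    "FATAL: terminating connection due to administrator command",
]

for location, count in count_checksum_errors(sample_log).items():
    print(location, count)
```

If every message points at the same location, as it does here, the standby is most likely retrying the same record repeatedly and not encountering new bad records.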

The new log file on the slave contains:

LOG: database system was shut down in recovery at 2016-02-29 05:12:11 CET 
LOG: entering standby mode 
LOG: redo starts at 61/D92C10C9 
LOG: consistent recovery state reached at 61/DA2710A7 
LOG: database system is ready to accept read only connections 
LOG: incorrect resource manager data checksum in record at 61/DA2710A7 
LOG: streaming replication successfully connected to primary 

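Now the slave is in sync with the master again, but the checksum entry is still there. The other thing I checked was the network logs: the network was available.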
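The two WAL locations in the restart log can be compared to see how far replay got before reaching the record that fails the checksum. The following is only a sketch: the helper names are illustrative, and it deliberately handles only locations that share the same first component (the part before the slash), so it makes no assumption about how PostgreSQL 9.1 lays out WAL across log ids.

```python
def parse_lsn(text):
    """Split a WAL location 'hi/lo' (both hexadecimal) into two integers."""
    hi, lo = text.split("/")
    return int(hi, 16), int(lo, 16)


def bytes_between(start, end):
    """Byte distance from start to end when both share the same log id."""
    start_hi, start_lo = parse_lsn(start)
    end_hi, end_lo = parse_lsn(end)
    if start_hi != end_hi:
        # Crossing log ids is intentionally not handled in this sketch.
        raise ValueError("locations are in different log ids")
    return end_lo - start_lo


redo_start = "61/D92C10C9"   # from "redo starts at ..."
bad_record = "61/DA2710A7"   # from the checksum message

distance = bytes_between(redo_start, bad_record)
print(hex(distance), distance)
```

For the values in the log this is 0xFAFFDE bytes, a little under 16 MiB. Note that the location where the consistent recovery state is reached is the same as the location of the record reported with the bad checksum.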

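My questions are:

  1. Does anyone know why the walreceiver was terminated?
  2. Why didn't PostgreSQL retry the replication?
  3. What can I do to prevent this from happening?

Thanks.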

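EDIT:

The database servers run on SLES 11 with EXT3. I found an article about low performance on SLES 11, but I am not sure whether it applies, since my machines have only 8 GB of RAM (https://www.novell.com/support/kb/doc.php?id=7010287).

Any help would be appreciated.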

EDIT (2):

The PostgreSQL version is 9.1.5. It seems that PostgreSQL 9.1.6 provides a fix for a similar problem?

Fix persistence marking of shared buffers during WAL replay (Jeff Davis) 

This mistake can result in buffers not being written out during checkpoints, resulting in data corruption if the server later crashes without ever having written those buffers. Corruption can occur on any server following crash recovery, but it is significantly more likely to occur on standby slave servers since those perform much more WAL replay. 

Source: http://www.postgresql.org/docs/9.1/static/release-9-1-6.html

Maybe this is the fix? Should I upgrade to PostgreSQL 9.1.6, and will everything then run smoothly?

Answers

0

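In case anyone stumbles upon this question: I ended up reinstalling the database from backup data and setting up replication again. I never really figured out what went wrong.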

0

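Never really figured out what went wrong.

I ran into the same error, except that in my case the slave never fully synced in the first place.

Then the master server had some kernel errors (possibly a heat problem in the server case?). Since it would not shut down completely, the server had to be shut down. While the master was already down, the slave showed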

LOG: incorrect resource manager data checksum in record at 1/63663CB0 

After restarting both the master and the slave, the situation did not change: the same log entry every 5 seconds.
