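I'm running a streaming replication environment with PostgreSQL 9.1 (1 master, 3 slaves). Everything worked fine for approx. 2 months. Yesterday, replication to one of the slaves failed, and the slave's log showed "incorrect resource manager data checksum in record" followed by "terminating walreceiver process due to administrator command":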
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
FATAL: terminating walreceiver process due to administrator command
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
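The slave was no longer in sync with the master. After two hours, during which the log grew by a new line roughly every 5 seconds, I restarted the slave database server: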
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
FATAL: terminating connection due to administrator command
FATAL: terminating connection due to administrator command
LOG: shutting down
LOG: database system is shut down
The new log file on the slave contains:
LOG: database system was shut down in recovery at 2016-02-29 05:12:11 CET
LOG: entering standby mode
LOG: redo starts at 61/D92C10C9
LOG: consistent recovery state reached at 61/DA2710A7
LOG: database system is ready to accept read only connections
LOG: incorrect resource manager data checksum in record at 61/DA2710A7
LOG: streaming replication successfully connected to primary
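Now the slave is in sync with the master again, but the checksum entry is still there. I also checked the network logs: the network was available.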
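Since the standby keeps reporting the same record, it can help to know which WAL segment file holds 61/DA2710A7, for instance to compare that file on the master (or in the WAL archive) with the copy on the standby. Below is a minimal Python sketch, assuming the default 16 MB WAL segment size and timeline 1 (neither is confirmed in the post) and the pre-9.3 file naming that 9.1 uses. The helper names are hypothetical and not part of PostgreSQL.

```python
from typing import Tuple

# Assumed default XLogSegSize; a build with a custom segment size differs.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> Tuple[int, int]:
    """Split an 'XLOGID/XRECOFF' string (both hex) into two integers."""
    xlogid, xrecoff = lsn.split("/")
    return int(xlogid, 16), int(xrecoff, 16)


def wal_file_name(lsn: str, timeline: int = 1) -> str:
    """Return the 24-hex-digit WAL file name that contains this LSN.

    Pre-9.3 layout: the segment number is xrecoff // segment size, and the
    name is timeline, xlogid and segment, each as 8 uppercase hex digits.
    """
    xlogid, xrecoff = parse_lsn(lsn)
    segment = xrecoff // WAL_SEGMENT_SIZE
    return f"{timeline:08X}{xlogid:08X}{segment:08X}"


def bytes_between(start: str, end: str) -> int:
    """Byte distance between two LSNs that share the same xlogid."""
    start_id, start_off = parse_lsn(start)
    end_id, end_off = parse_lsn(end)
    if start_id != end_id:
        # Pre-9.3 leaves the last segment of each xlogid unused, so crossing
        # an xlogid boundary is not a plain subtraction; not handled here.
        raise ValueError("LSNs must share the same xlogid in this sketch")
    return end_off - start_off


if __name__ == "__main__":
    bad_record = "61/DA2710A7"  # LSN from the checksum messages
    redo_start = "61/D92C10C9"  # "redo starts at" LSN after the restart
    print(wal_file_name(bad_record))              # 0000000100000061000000DA
    print(bytes_between(redo_start, bad_record))  # 16449502
```

With these assumptions, the restart replayed about 15.7 MiB of WAL and reached a consistent state exactly at the record the standby had been rejecting.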
My questions are:
- Does anyone know why the walreceiver was terminated?
- Why doesn't PostgreSQL retry the replication?
- What can I do to prevent this?
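One way to notice this sooner is to watch the standby log for the same bad-record LSN repeating, which is what the failed slave did for two hours. A single occurrence followed by a successful streaming connection, as in the log after the restart, does not by itself indicate a stall. Below is a rough Python sketch: the regex is based only on the log lines quoted above, and the repeat threshold of 5 is an arbitrary assumption. It is a detection aid, not a fix for the underlying cause.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches the standby message quoted in the logs above.
CHECKSUM_RE = re.compile(
    r"incorrect resource manager data checksum in record at "
    r"([0-9A-F]+/[0-9A-F]+)"
)


def stuck_lsn(log_lines: Iterable[str], threshold: int = 5) -> Optional[str]:
    """Return the most frequently reported bad-record LSN if it appears at
    least `threshold` times, otherwise None.

    Repeated reports of one LSN suggest replay is not moving forward;
    `threshold` is an assumed alerting limit, not a PostgreSQL setting.
    """
    counts: Counter = Counter()
    for line in log_lines:
        match = CHECKSUM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    if not counts:
        return None
    lsn, seen = counts.most_common(1)[0]
    return lsn if seen >= threshold else None


if __name__ == "__main__":
    failing = ["LOG: incorrect resource manager data checksum in record at 61/DA2710A7"] * 8
    print(stuck_lsn(failing))  # 61/DA2710A7
```

Fed a rolling window of recent log lines (for example from a cron job), a non-None result could trigger an alert long before two hours pass.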
Thanks.
EDIT:
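The database servers run SLES 11 with EXT3. I found an article about poor performance on SLES 11, but I'm not sure whether it applies, since my machines only have 8 GB of RAM (https://www.novell.com/support/kb/doc.php?id=7010287).
Any help would be appreciated.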
EDIT (2):
The PostgreSQL version is 9.1.5. It seems that PostgreSQL 9.1.6 ships a fix for a similar problem:
Fix persistence marking of shared buffers during WAL replay (Jeff Davis)
This mistake can result in buffers not being written out during checkpoints, resulting in data corruption if the server later crashes without ever having written those buffers. Corruption can occur on any server following crash recovery, but it is significantly more likely to occur on standby slave servers since those perform much more WAL replay.
Source: http://www.postgresql.org/docs/9.1/static/release-9-1-6.html
Maybe this is the fix? Should I upgrade to PostgreSQL 9.1.6, and will everything then run smoothly?