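We have a Percona XtraDB Cluster with 5 nodes and one arbitrator. One of our PHP developers ran a bad query on the cluster, and it crashed every node. After the crash we could not collect any error logs telling us what went wrong, because the whole cluster went down without logging anything.
I had always assumed that when a single query is executed on the cluster, it is processed by only one node. So if a query is bad (bad enough to kill the database server), it should only take down the one node handling it, leaving the cluster running on the remaining 4 nodes.
This behavior puzzles us, and we would like to understand what really happened, especially since this is the second time it has occurred. If a query is processed by a single node, why would a problem during its processing crash the other nodes in the cluster as well?
Here is our my.cnf configuration: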
#
# Default values.
[mysqld_safe]
flush_caches
numa_interleave
#
#
[mysqld]
back_log = 65535
binlog_format = ROW
character_set_server = utf8
collation_server = utf8_general_ci
datadir = /var/lib/mysql
default_storage_engine = InnoDB
expand_fast_index_creation = 1
expire_logs_days = 7
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 16
innodb_buffer_pool_populate = 1
innodb_buffer_pool_size = 32G # XXX 64GB RAM, 80%
innodb_data_file_path = ibdata1:64M;ibdata2:64M:autoextend
innodb_file_format = Barracuda
innodb_file_per_table
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_io_capacity = 1600
innodb_large_prefix
innodb_locks_unsafe_for_binlog = 1
innodb_log_file_size = 64M
innodb_print_all_deadlocks = 1
innodb_read_io_threads = 64
innodb_stats_on_metadata = FALSE
innodb_support_xa = FALSE
innodb_write_io_threads = 64
log-bin = mysqld-bin
log-queries-not-using-indexes
log-slave-updates
long_query_time = 1
max_allowed_packet = 64M
max_connect_errors = 4294967295
max_connections = 4096
min_examined_row_limit = 1000
port = 3306
relay-log-recovery = TRUE
skip-name-resolve
slow_query_log = 1
slow_query_log_timestamp_always = 1
table_open_cache = 4096
thread_cache = 1024
tmpdir = /db/tmp
transaction_isolation = REPEATABLE-READ
updatable_views_with_limit = 0
user = mysql
wait_timeout = 60
#
# Galera Variable config
wsrep_cluster_address = gcomm://ip_1, ip_2, ip_3,ip_4,ip_4,ip_5
wsrep_cluster_name = cluster_db
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_provider_options = "gcache.size=4G"
wsrep_slave_threads = 32
wsrep_sst_auth = "user:password"
wsrep_sst_donor = "db1"
#wsrep_sst_method = xtrabackup_throttle
wsrep_sst_method = xtrabackup-v2
#
# XXX You *MUST* change!
server-id = 1
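Because the crash left no usable error log, it may help to make logging explicit before the next incident. The fragment below is only a sketch of settings that could be added to the configuration above. The log path is a hypothetical placeholder, and whether a core dump is actually written also depends on OS-level limits that are not shown here.

```ini
[mysqld_safe]
# Ask mysqld_safe to raise the core file size limit for mysqld.
core-file-size = unlimited

[mysqld]
# Hypothetical path; any location writable by the mysql user would do.
log_error = /var/log/mysql/mysqld-error.log
# Write a core file if mysqld dies on a fatal signal.
core-file
# Log details about conflicting write-sets during replication.
wsrep_log_conflicts = ON
```

With these in place, a repeat of the crash should at least leave an error log entry or a core file on each node to compare.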
Is there anything in the binlog? In the gcache?
Yes, both the binlog and the gcache contain data.
Can you see the bad query in them? Or do you know what the bad query was?