2013-12-09 76 views
0

我看到在3節點Katta集羣上部署失敗。當我通過命令行觸發katta部署時,它會嘗試部署索引3次,並失敗,並顯示「未能在374635 ms內部署索引'katta_index_1'」消息。Katta索引部署問題

部署相關日誌如下:

13/12/09 03:48:19 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master.domain.com,node1.domain.com,node2.domain.com sessionTimeout=30000 [email protected] 
13/12/09 03:48:19 INFO zookeeper.ClientCnxn: Opening socket connection to server node1.domain.com/<node1-ip>:2181 
13/12/09 03:48:19 INFO zookeeper.ClientCnxn: Socket connection established to node1.domain.com/<node1-ip>:2181, initiating session 
13/12/09 03:48:19 INFO zookeeper.ClientCnxn: Session establishment complete on server node1.domain.com/<node1-ip>:2181, sessionid = 0x242d67304660002, negotiated timeout = 30000 
13/12/09 03:48:19 INFO zkclient.ZkClient: zookeeper state changed (SyncConnected) 
...................................13/12/09 03:49:30 INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 20000ms for sessionid 0x242d67304660002, closing socket connection and attempting reconnect 
13/12/09 03:49:30 INFO zkclient.ZkClient: zookeeper state changed (Disconnected) 
13/12/09 03:49:31 INFO zookeeper.ClientCnxn: Opening socket connection to server node2.domain.com/<node2-ip>:2181 
13/12/09 03:49:31 INFO zookeeper.ClientCnxn: Socket connection established to node2.domain.com/<node2-ip>:2181, initiating session 
13/12/09 03:49:31 INFO zookeeper.ClientCnxn: Session establishment complete on server node2.domain.com/<node2-ip>:2181, sessionid = 0x242d67304660002, negotiated timeout = 30000 
13/12/09 03:49:31 INFO zkclient.ZkClient: zookeeper state changed (SyncConnected) 
13/12/09 03:49:31 WARN client.IndexDeployFuture: Reconnecting IndexDeployFuture 
....................13/12/09 03:50:10 INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 20000ms for sessionid 0x242d67304660002, closing socket connection and attempting reconnect 
13/12/09 03:50:10 INFO zkclient.ZkClient: zookeeper state changed (Disconnected) 
13/12/09 03:50:10 INFO zookeeper.ClientCnxn: Opening socket connection to server master.domain.com/<master-ip>:2181 
13/12/09 03:50:10 INFO zookeeper.ClientCnxn: Socket connection established to master.domain.com/<master-ip>:2181, initiating session 
13/12/09 03:50:10 INFO zookeeper.ClientCnxn: Session establishment complete on server master.domain.com/<master-ip>:2181, sessionid = 0x242d67304660002, negotiated timeout = 30000 
13/12/09 03:50:10 INFO zkclient.ZkClient: zookeeper state changed (SyncConnected) 
13/12/09 03:50:10 WARN client.IndexDeployFuture: Reconnecting IndexDeployFuture 
13/12/09 03:50:30 INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 20000ms for sessionid 0x242d67304660002, closing socket connection and attempting reconnect 
13/12/09 03:50:30 INFO zkclient.ZkClient: zookeeper state changed (Disconnected) 
13/12/09 03:50:31 INFO zookeeper.ClientCnxn: Opening socket connection to server node1.domain.com/<node1-ip>:2181 
13/12/09 03:50:31 INFO zookeeper.ClientCnxn: Socket connection established to node1.domain.com/<node1-ip>:2181, initiating session 
13/12/09 03:50:31 INFO zookeeper.ClientCnxn: Session establishment complete on server node1.domain.com/<node1-ip>:2181, sessionid = 0x242d67304660002, negotiated timeout = 30000 
13/12/09 03:50:31 INFO zkclient.ZkClient: zookeeper state changed (SyncConnected) 
13/12/09 03:50:51 INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 20000ms for sessionid 0x242d67304660002, closing socket connection and attempting reconnect 
13/12/09 03:50:51 INFO zkclient.ZkClient: zookeeper state changed (Disconnected) 
13/12/09 03:50:51 INFO zookeeper.ClientCnxn: Opening socket connection to server node2.domain.com/<node2-ip>:2181 
13/12/09 03:50:51 INFO zookeeper.ClientCnxn: Socket connection established to node2.domain.com/<node2-ip>:2181, initiating session 
13/12/09 03:50:51 INFO zookeeper.ClientCnxn: Session establishment complete on server node2.domain.com/<node2-ip>:2181, sessionid = 0x242d67304660002, negotiated timeout = 30000 
13/12/09 03:50:51 INFO zkclient.ZkClient: zookeeper state changed (SyncConnected) 
.13/12/09 03:51:09 WARN client.IndexDeployFuture: Reconnecting IndexDeployFuture 
13/12/09 03:51:09 WARN client.IndexDeployFuture: Reconnecting IndexDeployFuture 
........................................................................................................................................................................................................... 
failed to deploy index 'katta_index_1' in 374635 ms 
13/12/09 03:54:34 WARN protocol.InteractionProtocol: following components still connected:[[email protected]] 
13/12/09 03:54:34 INFO zkclient.ZkEventThread: Terminate ZkClient event thread. 
13/12/09 03:54:34 INFO zookeeper.ClientCnxn: EventThread shut down 
13/12/09 03:54:34 INFO zookeeper.ZooKeeper: Session: 0x242d67304660002 closed 

沒有在startMaster日誌多少有用的信息與此相關的問題,這是如下:

13/12/09 03:51:13 INFO master.Master: master.domain.com_cb1ea8a3-6b5e-463e-8ef5-ae9a01e1fd61 became master with 0 waiting master operations 
13/12/09 03:51:13 INFO master.Master: start managing nodes... 
13/12/09 03:51:13 INFO master.Master: found following nodes connected: [master.domain.com:20000, node2.domain.com:20000, node1.domain.com:20000] 
13/12/09 03:51:13 INFO master.OperatorThread: starting... 
13/12/09 03:51:13 INFO master.OperatorThread: SAFE MODE: No nodes available or state unstable within the last 200000 ms. 
13/12/09 03:52:03 INFO master.OperatorThread: SAFE MODE: No nodes available or state unstable within the last 200000 ms. 
13/12/09 03:52:53 INFO master.OperatorThread: SAFE MODE: No nodes available or state unstable within the last 200000 ms. 
13/12/09 03:53:43 INFO master.OperatorThread: SAFE MODE: No nodes available or state unstable within the last 200000 ms. 
13/12/09 03:54:33 INFO master.OperatorThread: SAFE MODE: leaving safe mode with 3 connected nodes 
13/12/09 03:54:33 INFO master.OperationRegistry: watch operation 'IndexDeployOperation:65d9e449:katta_index_1' for node operations [master.domain.com:20000-operation-0000000003, node1.domain.com:20000-operation-0000000003, node2.domain.com:20000-operation-0000000003] 
13/12/09 03:54:33 INFO protocol.InteractionProtocol: unregistering component [email protected]: {[email protected]=[AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live]} 
13/12/09 03:54:33 INFO master.AbstractIndexOperation: deployment of index katta_index_1 complete 
13/12/09 03:54:34 INFO master.OperationWatchdog: watch for IndexDeployOperation:65d9e449:katta_index_1 finished 
13/12/09 03:54:34 INFO master.OperatorThread: executing operation 'CheckIndicesOperation:58ca40d2' 
13/12/09 03:54:34 INFO master.OperatorThread: executing operation 'RemoveObsoleteShardsOperation:783614ab:master.domain.com:20000' 
13/12/09 03:54:34 INFO master.OperatorThread: executing operation 'RemoveObsoleteShardsOperation:48274f3c:node2.domain.com:20000' 
13/12/09 03:54:34 INFO master.OperatorThread: executing operation 'RemoveObsoleteShardsOperation:1550a7d:node1.domain.com:20000' 
13/12/09 03:54:34 INFO master.OperatorThread: executing operation 'RemoveObsoleteShardsOperation:4419391c:master.domain.com:20000' 
13/12/09 03:54:34 INFO master.OperatorThread: executing operation 'RemoveObsoleteShardsOperation:519a9fe8:master.domain.com:20000' 
13/12/09 03:54:35 INFO master.OperatorThread: executing operation 'BalanceIndexOperation:9f4152a:katta_index_1' 
13/12/09 03:54:35 INFO master.AbstractIndexOperation: balancing shards for index 'katta_index_1' 
13/12/09 03:54:35 INFO master.OperationRegistry: watch operation 'BalanceIndexOperation:9f4152a:katta_index_1' for node operations [master.domain.com:20000-operation-0000000000, node1.domain.com:20000-operation-0000000000, node2.domain.com:20000-operation-0000000000] 
13/12/09 03:54:50 INFO protocol.InteractionProtocol: unregistering component [email protected]: {[email protected]=[AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live]} 
13/12/09 03:54:50 INFO master.AbstractIndexOperation: balancing of index katta_index_1 complete 
13/12/09 03:54:50 INFO master.OperationWatchdog: watch for BalanceIndexOperation:9f4152a:katta_index_1 finished 
13/12/09 03:55:24 INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 6666ms for sessionid 0x342d68098830007, closing socket connection and attempting reconnect 
13/12/09 03:56:13 INFO zkclient.ZkClient: zookeeper state changed (Disconnected) 
13/12/09 03:56:13 INFO zookeeper.ClientCnxn: Opening socket connection to server node1.domain.com/<node1-ip>:2181 
13/12/09 03:56:15 INFO protocol.InteractionProtocol: unregistering component [email protected]: {[email protected]=[AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live, AddRemoveListenerAdapter:/katta/nodes/live]} 
13/12/09 03:56:15 INFO master.OperatorThread: operator thread stopped 
13/12/09 03:56:15 INFO zookeeper.ClientCnxn: Socket connection established to node1.domain.com/<node1-ip>:2181, initiating session 
13/12/09 03:56:15 INFO zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x342d68098830007 has expired, closing socket connection 
13/12/09 03:56:15 INFO zkclient.ZkClient: zookeeper state changed (Expired) 
13/12/09 03:56:15 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master.domain.com,node1.domain.com,node2.domain.com sessionTimeout=10000 [email protected] 
13/12/09 03:56:15 INFO zookeeper.ClientCnxn: EventThread shut down 
13/12/09 03:56:15 INFO zookeeper.ClientCnxn: Opening socket connection to server node1.domain.com/<node1-ip>:2181 
13/12/09 03:56:15 INFO zookeeper.ClientCnxn: Socket connection established to node1.domain.com/<node1-ip>:2181, initiating session 
13/12/09 03:56:15 INFO zookeeper.ClientCnxn: Session establishment complete on server node1.domain.com/<node1-ip>:2181, sessionid = 0x242d67304660006, negotiated timeout = 10000 
13/12/09 03:56:15 INFO zkclient.ZkClient: zookeeper state changed (SyncConnected) 
13/12/09 03:56:15 INFO protocol.InteractionProtocol: master.domain.com_cb1ea8a3-6b5e-463e-8ef5-ae9a01e1fd61 started as master 
13/12/09 03:56:15 INFO protocol.InteractionProtocol: master 'master.domain.com_cb1ea8a3-6b5e-463e-8ef5-ae9a01e1fd61' published 
13/12/09 03:56:15 INFO upgrade.UpgradeRegistry: version of distribution 0.7.dev 
13/12/09 03:56:15 INFO upgrade.UpgradeRegistry: version of cluster 0.7.dev 
13/12/09 03:56:15 INFO master.Master: master.domain.com_cb1ea8a3-6b5e-463e-8ef5-ae9a01e1fd61 became master with 0 waiting master operations 
13/12/09 03:56:15 INFO master.Master: start managing nodes... 
13/12/09 03:56:15 INFO master.Master: found following nodes connected: [master.domain.com:20000, node1.domain.com:20000] 
13/12/09 03:56:15 INFO master.OperatorThread: starting... 
13/12/09 03:56:15 INFO master.OperatorThread: SAFE MODE: No nodes available or state unstable within the last 200000 ms. 
13/12/09 03:57:05 INFO master.OperatorThread: SAFE MODE: No nodes available or state unstable within the last 200000 ms. 
13/12/09 03:57:55 INFO master.OperatorThread: SAFE MODE: No nodes available or state unstable within the last 200000 ms. 
13/12/09 03:58:45 INFO master.OperatorThread: SAFE MODE: No nodes available or state unstable within the last 200000 ms. 
13/12/09 03:59:35 INFO master.OperatorThread: SAFE MODE: No nodes available or state unstable within the last 200000 ms. 
13/12/09 04:00:25 INFO master.OperatorThread: SAFE MODE: leaving safe mode with 3 connected nodes 
13/12/09 04:00:25 INFO master.OperatorThread: executing operation 'CheckIndicesOperation:6b114461' 
13/12/09 04:00:25 INFO master.OperatorThread: executing operation 'RemoveObsoleteShardsOperation:7a8b0be7:master.domain.com:20000' 

我猜測這是一些超時相關問題。有人能幫我解決這個問題嗎? 在此先感謝。

回答

0

此問題僅限於我們的Katta羣集。我發現這些節點上的磁盤I/O速度非常低。在等待時間過後沒有收到答覆時,Katta master正在等待節點回復「協商時間」中的部署狀態,Katta失敗。

由於磁盤寫入速度較低,Katta節點很忙且響應Zookeeper集羣的查詢,從而導致主節點重新啓動/進入安全模式「安全模式:在最近200000毫秒內沒有節點可用或狀態不穩定。 「

我們通過用不同的掛載替換磁盤捲來解決磁盤寫入問題,並解決了Katta部署問題。