
We have an Elasticsearch cluster with 5 data nodes and 2 master nodes. The Elasticsearch service on one of the master nodes is always kept disabled, so only one master is active at any time. Today, for some reason, the current master node went down, so we started the service on the second master node. All the data nodes connected to the new master, and all primary shards were allocated successfully, but none of the replicas were, leaving almost 384 unassigned shards.

What should I do now to get them allocated?

What are the best practices and steps to take in a situation like this?

Here is what my http://es-master-node:9200/_settings looks like: http://pastebin.com/mK1QBfP6
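Cluster-level allocation settings (as opposed to the per-index ones above) live in a separate endpoint; a quick way to inspect them and the overall shard state, assuming the master is reachable at the same address:

# Persistent and transient cluster-wide settings, e.g. allocation rules.
curl -s 'http://es-master-node:9200/_cluster/settings?pretty'

# Cluster health, including the number of unassigned shards.
curl -s 'http://es-master-node:9200/_cluster/health?pretty'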

When I try to allocate the shards manually, I get the following error:

curl -XPOST http://localhost:9200/_cluster/reroute\?pretty -d '{ 
    "commands": [ 
    { 
     "allocate": { 
     "index": "logstash-1970.01.18", 
     "shard": 1, 
     "node": "node-name", 
     "allow_primary": true 
     } 
    } 
    ] 
}' 
{ 
    "error" : { 
    "root_cause" : [ { 
     "type" : "illegal_argument_exception", 
     "reason" : "[allocate] allocation of [logstash-1970.01.18][1] on node {node-name}{vrVG4CBbSvubWHOzn2qfQA}{10.100.0.146}{10.100.0.146:9300}{master=false} is not allowed, reason: [YES(allocation disabling is ignored)][NO(more than allowed [85.0%] used disk on node, free: [13.671127301258165%])][YES(shard not primary or relocation disabled)][YES(target node version [2.2.0] is same or newer than source node version [2.2.0])][YES(no allocation awareness enabled)][YES(shard is not allocated to same node or host)][YES(allocation disabling is ignored)][YES(below shard recovery limit of [2])][YES(total shard limit disabled: [index: -1, cluster: -1] <= 0)][YES(node passes include/exclude/require filters)][YES(primary is already active)]" 
    } ], 
    "type" : "illegal_argument_exception", 
    "reason" : "[allocate] allocation of [logstash-1970.01.18][1] on node {node-name}{vrVG4CBbSvubWHOzn2qfQA}{10.100.0.146}{10.100.0.146:9300}{master=false} is not allowed, reason: [YES(allocation disabling is ignored)][NO(more than allowed [85.0%] used disk on node, free: [13.671127301258165%])][YES(shard not primary or relocation disabled)][YES(target node version [2.2.0] is same or newer than source node version [2.2.0])][YES(no allocation awareness enabled)][YES(shard is not allocated to same node or host)][YES(allocation disabling is ignored)][YES(below shard recovery limit of [2])][YES(total shard limit disabled: [index: -1, cluster: -1] <= 0)][YES(node passes include/exclude/require filters)][YES(primary is already active)]" 
    }, 
    "status" : 400 
} 
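The decider output above pinpoints the problem: every decision is YES except the disk-threshold decider, which rejects the allocation because the target node is over the default 85% disk high watermark (only about 13.7% free). One workaround, assuming you are willing to let the node fill up further, is to temporarily raise the watermarks with a transient cluster setting and revert once the shards are assigned; the cleaner fix is to free disk space or add nodes, which is what the answer below ends up doing:

# Per-node disk usage as Elasticsearch sees it.
curl -s 'http://localhost:9200/_cat/allocation?v'

# Temporarily raise the disk watermarks (transient: lost on full cluster restart).
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "transient": {
        "cluster.routing.allocation.disk.watermark.low": "90%",
        "cluster.routing.allocation.disk.watermark.high": "95%"
    }
}'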

Any help would be greatly appreciated.

Answer


So, here is what I did to allocate the unassigned shards:

I spun up 5 new es-data servers and waited for them to join the cluster. Once they were in the cluster, I used the following script:

#!/bin/bash 
# Assign each UNASSIGNED shard to one of the five new data nodes, round-robin.
array=(node1 node2 node3 node4 node5) 
node_counter=0 
length=${#array[@]} 
IFS=$'\n' 
for line in $(curl -s 'http://ip-address:9200/_cat/shards' | fgrep UNASSIGNED); do 
    INDEX=$(echo "$line" | awk '{print $1}') 
    SHARD=$(echo "$line" | awk '{print $2}') 
    NODE=${array[$node_counter]} 
    echo "$NODE" 
    # Force-allocate the shard on the chosen node. allow_primary=true lets
    # Elasticsearch create an empty primary if no copy of the data exists.
    curl -XPOST 'http://ip-address:9200/_cluster/reroute' -d '{ 
     "commands": [ 
      { 
       "allocate": { 
        "index": "'$INDEX'", 
        "shard": '$SHARD', 
        "node": "'$NODE'", 
        "allow_primary": true 
       } 
      } 
     ] 
    }' 
    # Advance to the next node, wrapping around at the end of the array.
    node_counter=$(( (node_counter + 1) % length )) 
done 

This assigned the unassigned shards to the new data nodes. It took around 5 to 6 days for everything to recover. While this worked, it is a hack, and a proper answer would still be very welcome.
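In hindsight, since only replica shards were unassigned here, a less invasive alternative (a sketch, assuming the affected indices all match logstash-*) would have been to let the master re-create the replicas itself by dropping the replica count to 0 and raising it back, rather than force-allocating with allow_primary, which can create empty shards when no copy of the data exists:

# Drop the replicas so the unassigned copies are removed...
curl -XPUT 'http://localhost:9200/logstash-*/_settings' -d '{
    "index": { "number_of_replicas": 0 }
}'

# ...then restore them and let the master allocate fresh copies.
curl -XPUT 'http://localhost:9200/logstash-*/_settings' -d '{
    "index": { "number_of_replicas": 1 }
}'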

Here are the questions that remain open:

  • The shard data was already there on the old nodes, so why didn't es-master recognize it?
  • How can we explicitly ask es-master to scan the existing data nodes and pull information from them (their current state, the replicas and shards they hold, and so on)?
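On the first question, recovery progress, and whether Elasticsearch is reusing the shard files already on disk rather than copying everything over the network, can be watched with the _cat/recovery API; newer releases (5.0+) also added a dedicated _cluster/allocation/explain API that answers exactly "why is this shard unassigned". A sketch for a 2.x cluster:

# Per-shard recovery status; the files/bytes percentage columns show how much
# data is being copied versus reused from local files.
curl -s 'http://localhost:9200/_cat/recovery?v'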