
MongoDB - Sharding migrateThread randomly running for over 12 hours

Some 12 hours ago I got a notification from my LibreNMS monitoring tool that the mongod process on one of my twelve MongoDB servers (version 3.2.11) was having issues (10-second connects). I decided to ignore it and wait it out; I figured it was just a bit busy.

A few hours later, when I ran db.currentOp(), I got a bit worried. I saw a running migrateThread operation with the message "step 2 of 5", as well as several inserts with the message "query not recording (too large)".
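For anyone who wants to reproduce this check, here is a minimal sketch in the mongo shell (run on the shard that is receiving the chunk; the namespace data.logs is taken from my output below):

// Show only the migration thread, if one is active.
db.currentOp({ desc: "migrateThread" })

// Show all active operations on the sharded collection, which is where
// the "query not recording (too large)" inserts show up.
db.currentOp({ active: true, ns: "data.logs" })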

After some internet searching, I saw that this can take a while, since it is migrating chunks of data to other servers. So I decided to wait it out, because I didn't want to interrupt it and end up corrupting 2 TB of data on a production instance.

Now that 12 hours have passed, I'm starting to worry about what is going on. It is still at "step 2 of 5", the processor load is very high, but it does still seem to be moving chunks, spawning new migrateThread operations along with lots of "query not recording (too large)" inserts.

Here is part of my currentOp() output:

 { 
     "desc" : "migrateThread", 
     "threadId" : "139962853246720", 
     "active" : true, 
     "opid" : -2003494368, 
     "secs_running" : 408, 
     "microsecs_running" : NumberLong(408914923), 
     "op" : "none", 
     "ns" : "data.logs", 
     "query" : { 

     }, 
     "msg" : "step 2 of 5", 
     "numYields" : 0, 
     "locks" : { 
      "Global" : "w", 
      "Database" : "w", 
      "Collection" : "w" 
     }, 
     "waitingForLock" : false, 
     "lockStats" : { 
      "Global" : { 
       "acquireCount" : { 
        "r" : NumberLong(37984), 
        "w" : NumberLong(37982) 
       } 
      }, 
      "Database" : { 
       "acquireCount" : { 
        "r" : NumberLong(1), 
        "w" : NumberLong(37981), 
        "W" : NumberLong(1) 
       }, 
       "acquireWaitCount" : { 
        "W" : NumberLong(1) 
       }, 
       "timeAcquiringMicros" : { 
        "W" : NumberLong(1446) 
       } 
      }, 
      "Collection" : { 
       "acquireCount" : { 
        "r" : NumberLong(1), 
        "w" : NumberLong(37980), 
        "W" : NumberLong(1) 
       }, 
       "acquireWaitCount" : { 
        "W" : NumberLong(1) 
       }, 
       "timeAcquiringMicros" : { 
        "W" : NumberLong(3224) 
       } 
      } 
     } 
    }, 
    { 
     "desc" : "conn451221", 
     "threadId" : "139962959451904", 
     "connectionId" : 451221, 
     "client" : "10.0.0.111:57408", 
     "active" : true, 
     "opid" : -2003439364, 
     "secs_running" : 0, 
     "microsecs_running" : NumberLong(37333), 
     "op" : "insert", 
     "ns" : "data.logs", 
     "query" : { 
      "$msg" : "query not recording (too large)" 
     }, 
     "numYields" : 0, 
     "locks" : { 
      "Global" : "w", 
      "Database" : "w", 
      "Collection" : "w" 
     }, 
     "waitingForLock" : false, 
     "lockStats" : { 
      "Global" : { 
       "acquireCount" : { 
        "r" : NumberLong(1), 
        "w" : NumberLong(1) 
       } 
      }, 
      "Database" : { 
       "acquireCount" : { 
        "w" : NumberLong(1) 
       } 
      }, 
      "Collection" : { 
       "acquireCount" : { 
        "w" : NumberLong(1) 
       } 
      } 
     } 
    }, 

When I checked mongod.log, I saw the following:

2017-05-04T19:08:14.203Z I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: -8858253000066304220 } -> { _id: -8857450400323294366 } for collection data.logs from mongo03:27017 at epoch 56f5410efed7ec477fb62e31 
2017-05-04T19:08:14.350Z I SHARDING [migrateThread] Deleter starting delete for: data.logs from { _id: -8858253000066304220 } -> { _id: -8857450400323294366 }, with opId: 2291391315 
2017-05-04T19:08:14.350Z I SHARDING [migrateThread] rangeDeleter deleted 0 documents for data.logs from { _id: -8858253000066304220 } -> { _id: -8857450400323294366 } 
2017-05-04T19:18:26.625Z I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 
2017-05-04T19:18:26.625Z I SHARDING [migrateThread] migrate commit succeeded flushing to secondaries for 'data.logs' { _id: -8858253000066304220 } -> { _id: -8857450400323294366 } 
2017-05-04T19:18:36.499Z I SHARDING [migrateThread] migrate commit succeeded flushing to secondaries for 'data.logs' { _id: -8858253000066304220 } -> { _id: -8857450400323294366 } 
2017-05-04T19:18:36.788Z I SHARDING [migrateThread] about to log metadata event into changelog: { _id: "mongo01-2017-05-04T21:18:36.788+0200-590b7e8c1bc38fe0dd61db45", server: "mongo01", clientAddr: "", time: new Date(1493925516788), what: "moveChunk.to", ns: "data.logs", details: { min: { _id: -8858253000066304220 }, max: { _id: -8857450400323294366 }, step 1 of 5: 146, step 2 of 5: 279, step 3 of 5: 611994, step 4 of 5: 0, step 5 of 5: 10162, note: "success" } } 
2017-05-04T19:19:04.059Z I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: -9090190725188397877 } -> { _id: -9088854275798899737 } for collection data.logs from mongo04:27017 at epoch 56f5410efed7ec477fb62e31 
2017-05-04T19:19:04.063Z I SHARDING [migrateThread] Deleter starting delete for: data.logs from { _id: -9090190725188397877 } -> { _id: -9088854275798899737 }, with opId: 2291472928 
2017-05-04T19:19:04.064Z I SHARDING [migrateThread] rangeDeleter deleted 0 documents for data.logs from { _id: -9090190725188397877 } -> { _id: -9088854275798899737 } 
2017-05-04T19:28:16.709Z I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 
2017-05-04T19:28:16.709Z I SHARDING [migrateThread] migrate commit succeeded flushing to secondaries for 'data.logs' { _id: -9090190725188397877 } -> { _id: -9088854275798899737 } 
2017-05-04T19:28:17.778Z I SHARDING [migrateThread] migrate commit succeeded flushing to secondaries for 'data.logs' { _id: -9090190725188397877 } -> { _id: -9088854275798899737 } 
2017-05-04T19:28:17.778Z I SHARDING [migrateThread] about to log metadata event into changelog: { _id: "mongo01-2017-05-04T21:28:17.778+0200-590b80d11bc38fe0dd61db46", server: "mongo01", clientAddr: "", time: new Date(1493926097778), what: "moveChunk.to", ns: "data.logs", details: { min: { _id: -9090190725188397877 }, max: { _id: -9088854275798899737 }, step 1 of 5: 3, step 2 of 5: 4, step 3 of 5: 552641, step 4 of 5: 0, step 5 of 5: 1068, note: "success" } } 
2017-05-04T19:28:34.889Z I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: -8696921045434215002 } -> { _id: -8696381531400161154 } for collection data.logs from mongo06:27017 at epoch 56f5410efed7ec477fb62e31 
2017-05-04T19:28:35.134Z I SHARDING [migrateThread] Deleter starting delete for: data.logs from { _id: -8696921045434215002 } -> { _id: -8696381531400161154 }, with opId: 2291544986 
2017-05-04T19:28:35.134Z I SHARDING [migrateThread] rangeDeleter deleted 0 documents for data.logs from { _id: -8696921045434215002 } -> { _id: -8696381531400161154 } 
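Since every migration apparently ends with a "moveChunk.to" entry in the changelog (see the lines above, including the per-step timings), a sketch like the following, run against a mongos, should list the most recent chunks this shard has received:

// List the five most recent "moveChunk.to" changelog entries for data.logs.
db.getSiblingDB("config").changelog.find(
    { what: "moveChunk.to", ns: "data.logs" }
).sort({ time: -1 }).limit(5).pretty()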

So it is taking a very long time to migrate the data. Is this something I should be worried about? Should I take any action, or just let it run?

To be clear, I did not start any migration myself. It happened on its own, which is why I'm a bit confused.
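From what I have read, chunk migrations that nobody starts by hand are normally triggered by the cluster balancer. A quick sketch (run via a mongos) to confirm whether the balancer is enabled and currently busy:

// true if the balancer is allowed to run at all
sh.getBalancerState()

// true if a balancing round is in progress right now
sh.isBalancerRunning()

// overall chunk distribution per shard
sh.status()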

Please help!

Answer


It resolved itself; I just had to wait a long time. Afterwards the other servers started running "RangeDeleter" operations, and now they all seem to be fine.
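If you want to verify that the cleanup has finished, a sketch along these lines (mongo shell, run on each shard) should return an empty array once the deleters are done. I am assuming the range deletions show up in currentOp with "RangeDeleter" in their description, as they did in my case:

// Include system operations and keep only range-deleter threads.
db.currentOp(true).inprog.filter(function (op) {
    return /RangeDeleter/.test(op.desc || "");
})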