Unable to re-register Mesos agent
After I updated its attributes (isolation), mesos-slave is unable to re-register:
6868 status_update_manager.cpp:177] Pausing sending status updates
6877 slave.cpp:915] New master detected at [email protected]:5050
6867 status_update_manager.cpp:177] Pausing sending status updates
6877 slave.cpp:936] No credentials provided. Attempting to register without authentication
6877 slave.cpp:947] Detecting new master
6869 slave.cpp:1217] Re-registered with master [email protected]:5050
6866 status_update_manager.cpp:184] Resuming sending status updates
6869 slave.cpp:1253] Forwarding total oversubscribed resources {}
6874 slave.cpp:4141] Master marked the agent as disconnected but the agent considers itself registered! Forcing re-registration.
6874 slave.cpp:904] Re-detecting master
6874 slave.cpp:947] Detecting new master
6874 status_update_manager.cpp:177] Pausing sending status updates
6869 status_update_manager.cpp:177] Pausing sending status updates
6871 slave.cpp:915] New master detected at [email protected]:5050
6871 slave.cpp:936] No credentials provided. Attempting to register without authentication
6871 slave.cpp:947] Detecting new master
6872 slave.cpp:1217] Re-registered with master [email protected]:5050
6872 slave.cpp:1253] Forwarding total oversubscribed resources {}
6871 status_update_manager.cpp:184] Resuming sending status updates
6871 slave.cpp:4141] Master marked the agent as disconnected but the agent considers itself registered! Forcing re-registration.
This seems to be stuck in an infinite loop. Any ideas how to start a fresh slave? I tried removing work_dir and restarting the mesos-slave process, but without success.
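To confirm that the agent really is looping rather than failing once, it can help to count how often the forced re-registration message shows up in the agent log. The following is only a minimal sketch: the function name is hypothetical, and the log file path in the usage line is an assumption that depends on how logging is configured on the host.

```shell
# Hypothetical helper: count how many times the agent logged a forced
# re-registration. $1 is the path to a mesos-slave log file.
count_forced_reregistrations() {
  # grep -c prints the number of matching lines (0 if there are none)
  grep -c 'Forcing re-registration' "$1"
}

# Usage (the log path is an assumption, adjust to your setup):
# count_forced_reregistrations /var/log/mesos/mesos-slave.INFO
```

A count that keeps growing between two runs a few seconds apart suggests the re-registration cycle is still repeating.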
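The situation was caused by accidentally renaming work_dir. After restarting mesos-slave, it could not reconnect and could not stop the running tasks either. I tried using cleanup on the slave: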
echo 'cleanup' > /etc/mesos-slave/recover
service mesos-slave restart
# after recovery finishes
rm /etc/mesos-slave/recover
service mesos-slave restart
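This partly helped, but there are still many zombie tasks in Marathon, because the Mesos master cannot retrieve any information about those tasks. When I looked at the metrics, I found that some slaves were marked as "inactive".
UPDATE: the following appears in the master log: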
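For reference, the agent counts mentioned above can be read from the master's /metrics/snapshot endpoint, which reports gauges such as master/slaves_active and master/slaves_inactive. The helper below is a rough sketch, not a robust JSON parser: the function name is hypothetical, MASTER_HOST is a placeholder, and it assumes the flat key/value JSON object that the endpoint returns.

```shell
# Hypothetical helper: read a metrics snapshot (flat JSON object) on stdin
# and print only the agent-related gauges, one "key:value" per line.
agent_gauges() {
  tr ',' '\n' |          # one key/value pair per line
    tr -d '{}" ' |       # drop braces, quotes and spaces
    grep '^master/slaves_'
}

# Usage (MASTER_HOST is a placeholder for the leading master):
# curl -s http://MASTER_HOST:5050/metrics/snapshot | agent_gauges
```

A non-zero master/slaves_inactive or master/slaves_disconnected value would match the state described here.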
Cannot kill task service_mesos-kafka_kafka.e0e3e128-ef0e-11e6-af93-fead7f32c37c
of framework ecd3a4be-d34c-46f3-b358-c4e26ac0d131-0000 (marathon) at
[email protected]:52192
because the agent cac09818-0d75-46a9-acb1-4e17fdb9e328-S10 at
slave(1)@192.168.1.1:5051 (w10.example.net) is disconnected.
Kill will be retried if the agent re-registers
After restarting the current mesos-master:
Cannot kill task service_mesos-kafka_kafka.e0e3e128-ef0e-11e6-af93-fead7f32c37c
of framework ecd3a4be-d34c-46f3-b358-c4e26ac0d131-0000 (marathon)
at [email protected]:39972
because it is unknown; performing reconciliation
Performing explicit task state reconciliation for 1 tasks
of framework ecd3a4be-d34c-46f3-b358-c4e26ac0d131-0000 (marathon)
at [email protected]:39972
Dropping reconciliation of task service_mesos-kafka_kafka.e0e3e128-ef0e-11e6-af93-fead7f32c37c
for framework ecd3a4be-d34c-46f3-b358-c4e26ac0d131-0000 (marathon)
at [email protected]:39972
because there are transitional agents
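Can you attach the master logs? – janisz
I couldn't find anything relevant in the master log. It looks like Mesos marked the old slaves as inactive and is still waiting for them to recover. – Tombart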