Monit - How to detect a program crash rather than a restart

I'm using monit to monitor my program. The monitored program can crash in 2 situations:

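The program may crash randomly. It just needs to be restarted.
It gets into a bad state and then crashes every time it is started afterwards.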

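To fix the latter case, I have a script that stops the program, resets it to a good state by cleaning up its data files, and restarts it. I tried the following configuration: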

check process program with pidfile program.pid 
start program = "programStart" as uid username and gid groupname 
stop program = "programStop" as uid username and gid groupname 
if 3 restarts within 20 cycles then exec "cleanProgramAndRestart" as uid username and gid groupname 
if 6 restarts within 20 cycles then timeout

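Suppose monit restarts the program 3 times within 3 cycles. After the third restart, the cleanProgramAndRestart script runs. However, because cleanProgramAndRestart restarts the program again, the 3-restarts condition is met again on the next cycle, and the whole thing turns into an infinite loop.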

Can anyone suggest a way to solve this problem?

If any of the following were possible, there might be a way around it:

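If there were a "crash" keyword instead of "restarts", I could run the clean script after the program crashes 3 times rather than after it is restarted 3 times.
If there were a way to somehow reset the "restarts" counter after running the exec script.
If there were a way to exec something only when the outcome of the "3 restarts" condition changes.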


2013-08-27 Sparrow

Monit polls your "tests" once per cycle. The cycle length is usually defined in /etc/monitrc with set daemon cycle_length.

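So if your cleanProgramAndRestart takes less than one cycle to execute, this should not happen. Since it does happen, I suspect your cleanProgramAndRestart takes more than one cycle to complete.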
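For reference, the cycle length is the number of seconds passed to set daemon in the control file. A minimal fragment, assuming a 120-second cycle (the value is purely illustrative; pick one longer than your cleanup script's run time):

    set daemon 120    # run one monitoring cycle every 120 seconds

Under the reasoning above, a cleanProgramAndRestart that reliably finishes in under 120 seconds would complete within a single cycle.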

You can:

Increase the cycle length in monit's configuration
Check the program only every x cycles (make sure that cycle_length * x > cleanProgramAndRestart_length)
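To illustrate the second option, here is a sketch assuming (hypothetically) a 30-second cycle and a cleanProgramAndRestart that takes about 70 seconds. Since 30 * 2 = 60 is not greater than 70 but 30 * 3 = 90 is, x has to be at least 3. The rest of the service entry is the configuration from the question, unchanged:

    set daemon 30    # hypothetical cycle length, in seconds

    check process program with pidfile program.pid
        every 3 cycles    # 30 s * 3 = 90 s > ~70 s assumed for cleanProgramAndRestart
        start program = "programStart" as uid username and gid groupname
        stop program = "programStop" as uid username and gid groupname
        if 3 restarts within 20 cycles then exec "cleanProgramAndRestart" as uid username and gid groupname
        if 6 restarts within 20 cycles then timeout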

If you cannot change these variables, there may be a workaround using a temporary file:

check process program 
    with pidfile program.pid 
    start program = "programStart" 
    as uid username and gid groupname 
    stop program = "programStop" 
    as uid username and gid groupname 
    if 3 restarts within 20 cycles 
    then exec "touch /tmp/program_crash"
    if 6 restarts within 20 cycles then timeout 

check file program_crash with path /tmp/program_crash every x cycles #(make sure that cycle_length*x > cleanProgramAndRestart_length) 
    if changed timestamp then exec "cleanProgramAndRestart" 
    as uid username and gid groupname


2013-09-26 17:56:55
