Celery: WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL)

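I'm using Celery with RabbitMQ in a Django app (on Elastic Beanstalk) to manage background tasks, and I daemonize it with Supervisor. The problem is that one of the periodic tasks I defined is now failing (after a week in which everything worked properly), and the error I get is: Celery: WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL)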

[01/Apr/2014 23:04:03] [ERROR] [celery.worker.job:272] Task clean-dead-sessions[1bfb5a0a-7914-4623-8b5b-35fc68443d2e] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',) 
Traceback (most recent call last): 
    File "/opt/python/run/venv/lib/python2.7/site-packages/billiard/pool.py", line 1168, in mark_as_worker_lost 
    human_status(exitcode)), 
WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).

All the processes managed by Supervisor are running normally (supervisorctl status says RUNNING).

I tried reading several logs on my EC2 instance, but none of them helped me find the cause of the SIGKILL. What should I do? How can I investigate?

These are my Celery settings:

import os

CELERY_TIMEZONE = 'UTC'
CELERY_TASK_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']
BROKER_URL = os.environ['RABBITMQ_URL']
CELERY_IGNORE_RESULT = True
CELERY_DISABLE_RATE_LIMITS = False
CELERYD_HIJACK_ROOT_LOGGER = False

And this is my supervisord.conf:

[program:celery_worker] 
environment=$env_variables 
directory=/opt/python/current/app 
command=/opt/python/run/venv/bin/celery worker -A com.cygora -l info --pidfile=/opt/python/run/celery_worker.pid 
startsecs=10 
stopwaitsecs=60 
stopasgroup=true 
killasgroup=true 
autostart=true 
autorestart=true 
stdout_logfile=/opt/python/log/celery_worker.stdout.log 
stdout_logfile_maxbytes=5MB 
stdout_logfile_backups=10 
stderr_logfile=/opt/python/log/celery_worker.stderr.log 
stderr_logfile_maxbytes=5MB 
stderr_logfile_backups=10 
numprocs=1 

[program:celery_beat] 
environment=$env_variables 
directory=/opt/python/current/app 
command=/opt/python/run/venv/bin/celery beat -A com.cygora -l info --pidfile=/opt/python/run/celery_beat.pid --schedule=/opt/python/run/celery_beat_schedule 
startsecs=10 
stopwaitsecs=300 
stopasgroup=true 
killasgroup=true 
autostart=false 
autorestart=true 
stdout_logfile=/opt/python/log/celery_beat.stdout.log 
stdout_logfile_maxbytes=5MB 
stdout_logfile_backups=10 
stderr_logfile=/opt/python/log/celery_beat.stderr.log 
stderr_logfile_maxbytes=5MB 
stderr_logfile_backups=10 
numprocs=1

Edit: after restarting celery beat, the problem persists :(

Edit 2: changed killasgroup=true to killasgroup=false; the problem persists.


2014-04-02 daveoncode

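The SIGKILL your worker received was initiated by another process. Your supervisord config looks fine, and killasgroup only affects a kill initiated by Supervisor itself (e.g. via supervisorctl or a plugin); without that setting, Supervisor would send the signal to the dispatcher anyway, not to the child.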
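SIGKILL cannot be caught or ignored, so the killed process gets no chance to log anything; only its parent learns about the death, from the exit status. That is presumably why the worker's own logs are silent and only the pool reports WorkerLostError. The minimal POSIX-only sketch below (Python 3, standard library only; the function name is hypothetical) has one process SIGKILL a child Python process and read back the status. Python's subprocess reports death by signal N as returncode -N, and multiprocessing (of which billiard is a fork) uses the same negative-exitcode convention.

```python
import signal
import subprocess
import sys


def exit_status_of_sigkilled_child() -> int:
    """Start a child Python process, SIGKILL it, and return its exit status."""
    # The child just sleeps, so it is still alive when the signal arrives.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    # A different process (here, the parent) sends SIGKILL; the child cannot
    # install a handler for it, so it dies without running any cleanup code.
    child.send_signal(signal.SIGKILL)
    # On POSIX, wait() returns -N when the child was terminated by signal N.
    return child.wait()


if __name__ == "__main__":
    print(exit_status_of_sigkilled_child())  # -9 on Linux/macOS
```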

Most likely you have a memory leak, and the OS's oom-killer is assassinating your process for bad behavior.

Run grep oom /var/log/messages. If you see any matches, that's your problem.
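If you would rather script the check (for example, to run it on several instances), here is a rough Python 3 equivalent of that grep. It is a sketch: the function names and the exact patterns are assumptions, and it matches a few common OOM-related phrases ("oom-killer", "Out of memory", "Killed process") rather than every kernel variant. It tries both /var/log/messages and /var/log/kern.log, since the log location differs between distributions, and skips files that are missing or unreadable (these logs are often root-only).

```python
import re
from typing import Iterable, List, Sequence

# Heuristic patterns, not exhaustive; e.g. matches "celery invoked oom-killer: ...".
OOM_PATTERN = re.compile(r"oom-killer|out of memory|killed process", re.IGNORECASE)


def find_oom_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like OOM-killer activity, without newlines."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_logs(paths: Sequence[str] = ("/var/log/messages", "/var/log/kern.log")) -> List[str]:
    """Scan whichever of the given log files exist and are readable."""
    hits: List[str] = []
    for path in paths:
        try:
            with open(path, errors="replace") as fh:
                hits.extend(f"{path}: {line}" for line in find_oom_lines(fh))
        except OSError:
            # Missing file or insufficient permissions: try the next one.
            continue
    return hits


if __name__ == "__main__":
    for hit in scan_logs():
        print(hit)
```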

If you don't find anything, try running the periodic task manually in a shell:

MyPeriodicTask().run()

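and see what happens. If you don't have good instrumentation such as Cacti or Ganglia on this host, I'd monitor system and process metrics in another terminal.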
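A crude complement to watching the process from another terminal is to ask the kernel for the shell process's peak memory before and after the task runs. The sketch below is an assumption-laden illustration (Python 3, Unix-only because it uses the resource module; the helper names are hypothetical). ru_maxrss is the peak resident set size of the whole process so far, reported in kilobytes on Linux but in bytes on macOS, so the helper normalizes to KiB. A large jump while the task runs points at the task itself.

```python
import resource
import sys
from typing import Any, Callable, Tuple


def _peak_rss_kib() -> int:
    """Peak resident set size of this process so far, in KiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes; macOS reports bytes.
    return peak // 1024 if sys.platform == "darwin" else peak


def run_and_report_peak_rss(fn: Callable[[], Any]) -> Tuple[Any, int, int]:
    """Run fn() and return (result, peak RSS afterwards, growth in peak), in KiB."""
    before = _peak_rss_kib()
    result = fn()
    after = _peak_rss_kib()
    # The peak never decreases, so the growth is always >= 0.
    return result, after, after - before


if __name__ == "__main__":
    # Stand-in workload; in the Django shell you might pass something like
    # `lambda: MyPeriodicTask().run()` instead.
    _, peak, growth = run_and_report_peak_rss(lambda: len([0] * 1_000_000))
    print(f"peak RSS: {peak} KiB (grew by {growth} KiB)")
```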


2014-04-03 16:52:49

You're right: "celery invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0"... Now I have to find out why this is happening, because it used to work as expected :P Thanks a lot! – daveoncode
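The oom_score_adj field in that log line refers to a per-process tunable. Linux also exposes the resulting badness score in /proc/&lt;pid&gt;/oom_score, where a higher score means the process is more likely to be chosen by the oom-killer. As a hedged, Linux-only sketch (Python 3; the function names are hypothetical), listing the highest scores shows whether the Celery workers are at the top of the list when memory runs short. On systems without /proc it returns nothing.

```python
import os
from typing import Dict, List, Tuple


def oom_scores() -> Dict[int, int]:
    """Map pid -> /proc/<pid>/oom_score for every process we can read."""
    scores: Dict[int, int] = {}
    if not os.path.isdir("/proc"):
        return scores  # Not Linux (or /proc not mounted).
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # Skip non-process entries such as /proc/meminfo.
        try:
            with open(f"/proc/{entry}/oom_score") as fh:
                scores[int(entry)] = int(fh.read().strip())
        except (OSError, ValueError):
            # The process exited after listdir(), or the file was unreadable.
            continue
    return scores


def top_oom_candidates(n: int = 5) -> List[Tuple[int, int]]:
    """Return the n (pid, score) pairs with the highest oom_score."""
    return sorted(oom_scores().items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    for pid, score in top_oom_candidates():
        print(pid, score)
```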

@daveoncode I think Lewis Carroll once wrote: "Beware the oom-killer, my son! The jaws that bite, the claws that catch!" –

On my Ubuntu box, the log to check is /var/log/kern.log, not /var/log/messages –
