2017-07-28

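Chef server crashes routinely

We have a standalone Chef server installation (v12.1.2). It ran fine for several months, but recently it has started crashing several times a day. According to the logs, it is the "opscode-erchef" service that crashes. This is from the opscode-erchef crash log: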

2017-07-28 08:44:26 =ERROR REPORT==== 
["Could not connect, scheduling reconnect.",{error,{{error,{badmatch,{error,{auth_failure_likely,{econnrefused,{gen_server,call,[<0.2016.0>,connect,infinity]}}}}},[{bunny_util,connect,1,[{file,"src/bunny_util.erl"},{line,191}]},{gen_bunny_mon,do_connect,3,[{file,"src/gen_bunny_mon.erl"},{line,192}]},{gen_bunny_mon,handle_info,2,[{file,"src/gen_bunny_mon.erl"},{line,134}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,593}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,659}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]},{connection_info,{network,{127,0,0,1},5672,{<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>},<<"/analytics">>}}}}] 
2017-07-28 08:44:26 =ERROR REPORT==== 
Could not start the network driver: econnrefused 
2017-07-28 08:44:26 =ERROR REPORT==== 
** Generic server <0.2019.0> terminating 
** Last message in was connect 
** When Server state == {state,<0.2017.0>,{amqp_params,<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>,<<"/chef">>,{127,0,0,1},5672,0,0,0,none,[]},undefined,undefined,undefined,undefined,undefined,undefined,<0.2018.0>,false,undefined,{{0,nil},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},undefined,#Fun<amqp_connection_sup.0.94524864>} 
** Reason for termination == 
** {econnrefused,[{amqp_network_connection,do_connect,1,[{file,"src/amqp_network_connection.erl"},{line,337}]},{amqp_network_connection,handle_call,3,[{file,"src/amqp_network_connection.erl"},{line,93}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,607}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,639}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]} 
2017-07-28 08:44:26 =CRASH REPORT==== 
    crasher: 
    initial call: amqp_network_connection:init/1 
    pid: <0.2019.0> 
    registered_name: [] 
    exception exit: {econnrefused,[{gen_server,terminate,7,[{file,"gen_server.erl"},{line,804}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]} 
    ancestors: [<0.2017.0>,gen_bunny_mon,gen_bunny_sup,<0.1531.0>] 
    messages: [] 
    links: [<0.2017.0>] 
    dictionary: [] 
    trap_exit: false 
    status: running 
    heap_size: 610 
    stack_size: 27 
    reductions: 531 
    neighbours: 
2017-07-28 08:44:26 =SUPERVISOR REPORT==== 
    Supervisor: {<0.2017.0>,amqp_connection_sup} 
    Context: child_terminated 
    Reason:  econnrefused 
    Offender: [{pid,<0.2019.0>},{name,connection},{mfa,{amqp_network_connection,start_link,[{amqp_params,<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>,<<"/chef">>,{127,0,0,1},5672,0,0,0,none,[]},<0.2018.0>,#Fun<amqp_connection_sup.0.94524864>]}},{restart_type,intrinsic},{shutdown,brutal_kill},{child_type,worker}] 

2017-07-28 08:44:26 =SUPERVISOR REPORT==== 
    Supervisor: {<0.2017.0>,amqp_connection_sup} 
    Context: shutdown 
    Reason:  reached_max_restart_intensity 
    Offender: [{pid,<0.2019.0>},{name,connection},{mfa,{amqp_network_connection,start_link,[{amqp_params,<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>,<<"/chef">>,{127,0,0,1},5672,0,0,0,none,[]},<0.2018.0>,#Fun<amqp_connection_sup.0.94524864>]}},{restart_type,intrinsic},{shutdown,brutal_kill},{child_type,worker}] 

Restarting opscode-erchef, followed by the opscode-expander service, brings it back up again.

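Can anyone tell me under what circumstances the opscode-erchef service would crash? I don't see any strain on the CPU when it fails, so server resources don't seem to be the problem.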

Thanks!

Answer


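This error is related to RabbitMQ and its workers; try increasing the number of connections available to RabbitMQ or adjusting the timeout. The relevant settings are: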
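Before tuning anything, it may be worth confirming that RabbitMQ is actually listening, since the `econnrefused` entries in the question's log are for 127.0.0.1:5672. The following is a minimal Ruby sketch of such a check; the helper name `port_open?` and the 2-second timeout are assumptions made for illustration and are not part of Chef server.

```ruby
require 'socket'

# Hypothetical helper (not part of Chef server): returns true if a TCP
# connection to host:port can be opened within `timeout` seconds.
def port_open?(host, port, timeout = 2)
  # Socket.tcp yields the connected socket and closes it after the block.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT, Errno::EHOSTUNREACH, SocketError
  # Refused, timed out, or unreachable: treat all of these as "not open".
  false
end
```

Calling `port_open?('127.0.0.1', 5672)` on the Chef server host while erchef is failing would show whether the broker port is refusing connections at that moment. It only tests TCP reachability, not AMQP authentication.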

rabbitmq['rabbit_mgmt_http_max_count']

The maximum number of workers for the HTTP connection pool used by the RabbitMQ management plugin. Default value: 100.

rabbitmq['rabbit_mgmt_timeout']

The timeout for the HTTP connection pool used by the RabbitMQ management plugin. Default value: 30000.

To learn how to change these tunable settings and others, see here
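As a rough illustration of how the two settings above would be changed, here is a fragment of the kind that goes in /etc/opscode/chef-server.rb. The values are assumptions chosen for illustration: 200 is the value tried in the comments below, and 30000 is the default quoted above.

```ruby
# /etc/opscode/chef-server.rb (fragment, illustrative values only)

# Double the management-plugin HTTP pool from its default of 100.
rabbitmq['rabbit_mgmt_http_max_count'] = 200

# Keep the pool timeout at its documented default.
rabbitmq['rabbit_mgmt_timeout'] = 30000
```

After editing the file, run `chef-server-ctl reconfigure` to apply the change. Note that in this thread raising the pool size did not resolve the crashes.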


Thanks for your answer. I will try bumping the rabbitmq parameters and will let you know if it helps.


I increased the rabbitmq['rabbit_mgmt_http_max_count'] parameter to 200 and ran chef-server-ctl reconfigure, but it did not help. Now my Chef server does not start, and I see the following error in the opscode-erchef current log: 2017-07-31_19:11:14.73681 =ERROR REPORT==== 31-Jul-2017::14:11:14 === 2017-07-31_19:11:14.73682 pool 'sqerl': timeout exceeded waiting for 20 members 2017-07-31_19:11:14.78063 =ERROR REPORT==== 31-Jul-2017::14:11:14 === 2017-07-31_19:11:14.78064 ** Generic server <0.1020.0> terminating


The sqerl pooler_timeout parameter was set to 0. I increased it to 1000, which resolved the timeout error. The server is back up and running.
