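Chef server crashes routinely

We have a standalone Chef server installation (v12.1.2). It had been running fine for several months, but recently it has started crashing a few times a day. Judging by the logs, it is the "opscode-erchef" service that crashes. This is from the opscode-erchef crash log: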
2017-07-28 08:44:26 =ERROR REPORT====
["Could not connect, scheduling reconnect.",{error,{{error,{badmatch,{error,{auth_failure_likely,{econnrefused,{gen_server,call,[<0.2016.0>,connect,infinity]}}}}},[{bunny_util,connect,1,[{file,"src/bunny_util.erl"},{line,191}]},{gen_bunny_mon,do_connect,3,[{file,"src/gen_bunny_mon.erl"},{line,192}]},{gen_bunny_mon,handle_info,2,[{file,"src/gen_bunny_mon.erl"},{line,134}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,593}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,659}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]},{connection_info,{network,{127,0,0,1},5672,{<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>},<<"/analytics">>}}}}]
2017-07-28 08:44:26 =ERROR REPORT====
Could not start the network driver: econnrefused
2017-07-28 08:44:26 =ERROR REPORT====
** Generic server <0.2019.0> terminating
** Last message in was connect
** When Server state == {state,<0.2017.0>,{amqp_params,<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>,<<"/chef">>,{127,0,0,1},5672,0,0,0,none,[]},undefined,undefined,undefined,undefined,undefined,undefined,<0.2018.0>,false,undefined,{{0,nil},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},undefined,#Fun<amqp_connection_sup.0.94524864>}
** Reason for termination ==
** {econnrefused,[{amqp_network_connection,do_connect,1,[{file,"src/amqp_network_connection.erl"},{line,337}]},{amqp_network_connection,handle_call,3,[{file,"src/amqp_network_connection.erl"},{line,93}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,607}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,639}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
2017-07-28 08:44:26 =CRASH REPORT====
crasher:
initial call: amqp_network_connection:init/1
pid: <0.2019.0>
registered_name: []
exception exit: {econnrefused,[{gen_server,terminate,7,[{file,"gen_server.erl"},{line,804}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
ancestors: [<0.2017.0>,gen_bunny_mon,gen_bunny_sup,<0.1531.0>]
messages: []
links: [<0.2017.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 610
stack_size: 27
reductions: 531
neighbours:
2017-07-28 08:44:26 =SUPERVISOR REPORT====
Supervisor: {<0.2017.0>,amqp_connection_sup}
Context: child_terminated
Reason: econnrefused
Offender: [{pid,<0.2019.0>},{name,connection},{mfa,{amqp_network_connection,start_link,[{amqp_params,<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>,<<"/chef">>,{127,0,0,1},5672,0,0,0,none,[]},<0.2018.0>,#Fun<amqp_connection_sup.0.94524864>]}},{restart_type,intrinsic},{shutdown,brutal_kill},{child_type,worker}]
2017-07-28 08:44:26 =SUPERVISOR REPORT====
Supervisor: {<0.2017.0>,amqp_connection_sup}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,<0.2019.0>},{name,connection},{mfa,{amqp_network_connection,start_link,[{amqp_params,<<"chef">>,<<"de073b36fa2831124fc06a121702610c1ddaf367cc9aad74e9a9ba7381fa355a9fd8aaf4be57745fecfc0f0a1c275aab3190">>,<<"/chef">>,{127,0,0,1},5672,0,0,0,none,[]},<0.2018.0>,#Fun<amqp_connection_sup.0.94524864>]}},{restart_type,intrinsic},{shutdown,brutal_kill},{child_type,worker}]
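Restarting opscode-erchef followed by the opscode-expander service brings it back up again.
Can anyone tell me under what circumstances the opscode-erchef service would crash? I don't see any CPU pressure when the crash happens, so server resources don't appear to be the problem.
Thanks!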
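The crash log above reports `econnrefused` for the AMQP endpoint at 127.0.0.1:5672, so one quick check when the service goes down is whether anything is accepting TCP connections on that port. The following is a minimal sketch using only the Python standard library; the function name `amqp_port_open` is hypothetical, and the host and port are taken from the log. A successful TCP connect only shows that something is listening, not that the AMQP login itself would succeed.

```python
import socket


def amqp_port_open(host="127.0.0.1", port=5672, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    Host and port default to the values seen in the crash log; this is a
    reachability check only, not an AMQP handshake.
    """
    try:
        # create_connection raises an OSError subclass (for example
        # ConnectionRefusedError or a timeout) when the connect fails.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "accepting connections" if amqp_port_open() else "refusing or unreachable"
    print(f"127.0.0.1:5672 is {state}")
```

Running this at the moment opscode-erchef crashes may help tell apart a broker that is down from one that is up but rejecting the connection for another reason.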
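Thanks for your answer. I will try bumping the rabbitmq parameters and will let you know whether it helps.
I increased the rabbitmq['rabbit_mgmt_http_max_count'] parameter to 200 and ran chef-server-ctl reconfigure, but it didn't help. Now my Chef server won't start, and I see the following errors in the opscode-erchef current log: 2017-07-31_19:11:14.73681 =ERROR REPORT==== 31-Jul-2017::14:11:14 === 2017-07-31_19:11:14.73682 pool 'sqerl': timeout waiting for 20 members 2017-07-31_19:11:14.78063 =ERROR REPORT==== 31-Jul-2017::14:11:14 === 2017-07-31_19:11:14.78064 ** Generic server <0.1020.0> terminating
The sqerl pooler_timeout parameter was set to 0. Increasing it to 1000 resolved the timeout error. The server is back up and running.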