2012-06-14 181 views
1

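Amazon ELB fails to serve responses

I have a website running on Amazon Web Services, deployed with Elastic Beanstalk on a single EC2 micro instance. This is a staging environment and I am the only person with access to it. Using Apache JMeter, I simulate six users browsing the website, making on average one request every 3 seconds in total (images, CSS, JS and other static resources are served by CloudFront and create no traffic on the EC2 instance).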

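The problem is that after a while (usually 30-60 minutes after the environment is created), the website stops responding. I am sure Tomcat is still running properly, because I can see in the log (catalina.out) that cron jobs are still being executed. It seems that only the ELB fails to serve responses.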

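Analysing the logs, there are no errors at all on the Tomcat side (none in /opt/tomcat7/logs/tail_catalina.log or in /opt/tomcat7/logs/catalina.out). The following errors start appearing in /etc/httpd/logs/elasticbeanstalk-error_log as soon as the website becomes unavailable: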

[Thu Jun 14 20:26:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:26:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:26:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:26:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:27:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:27:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:27:43 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:27:43 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:27:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:27:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:28:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:28:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:28:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:28:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:28:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:28:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:29:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:29:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:29:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:29:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:29:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:29:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:30:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:30:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:30:43 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:30:43 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:30:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:30:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:31:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:31:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:31:43 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:31:43 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:31:50 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:31:50 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 
[Thu Jun 14 20:32:20 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:32:20 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 

...until the EC2 instance is finally terminated (and a new one is started automatically).

If I make no requests (or if I reduce the request rate), the problem does not occur.

Any help is much appreciated.

Thanks!


Unrelated to the question, but for googlability: you can also see "connection refused" if you try to reach port 80 on an ELB that only has port 443 configured. – Fuser97381

Answers

7

Let me start with an assumption:

  • Your Tomcat application is supposed to be listening on 127.0.0.1:8999

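If that is true, the log events:

[Thu Jun 14 20:26:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed
[Thu Jun 14 20:26:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost)

...indicate that the application listener died. You can confirm this with:

curl -v http://127.0.0.1:8999/

While the site is working normally, the curl command should return a valid HTTP response; during an outage it will likely report "Connection refused" or "couldn't connect to host". You can also check whether anything is listening on the application port with the following command: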

netstat -an | grep LISTEN | grep 8999 
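As a rough sketch, the same listener check can be wrapped in a small helper for repeated use while waiting for the outage to recur. The function names are made up for illustration, and the helper assumes bash, since it relies on bash's /dev/tcp redirection rather than curl or netstat.

```shell
#!/usr/bin/env bash
# Sketch: check whether anything accepts TCP connections on a local port.
# Relies on bash's /dev/tcp redirection, so run it with bash, not plain sh.

port_is_open() {   # $1 = host, $2 = port
  local host="$1" port="$2"
  # Opening the pseudo-file attempts a TCP connect; a refused
  # connection makes the subshell exit non-zero.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

check_backend() {  # $1 = port on 127.0.0.1
  local port="$1"
  if port_is_open 127.0.0.1 "$port"; then
    echo "port ${port}: listening"
  else
    echo "port ${port}: NOT listening"
  fi
}
```

Calling `check_backend 8999` once a minute (for example from a `while sleep 60` loop) and timestamping the output would show when the listener disappears relative to the first Apache error.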

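There are many reasons why the application listener might die, including but not limited to:

  • A hard crash of the JVM (use ps to see whether the JVM process is still running)
  • A soft crash of the application (look at the Tomcat application logs)
  • Running out of file descriptors (use lsof | wc -l and compare the result against ulimit -n for the application's user)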
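For the file descriptor case, a per-process comparison may be easier to read than a system-wide lsof count. The sketch below is an alternative to the lsof approach, not the same thing: it reads /proc, so it assumes Linux, and the function names and the 80% warning threshold are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare a process's open file descriptors with its limit.
# Linux only: reads /proc instead of running lsof.

fd_count() {   # $1 = PID; prints the number of open descriptors
  ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

fd_limit() {   # $1 = PID; prints the soft "Max open files" limit
  awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

fd_report() {  # $1 = used, $2 = limit, $3 = warning threshold in percent
  local used="$1" limit="$2" threshold="${3:-80}"
  # Refuse non-numeric limits such as "unlimited".
  case "$limit" in
    ''|*[!0-9]*) echo "limit is not numeric: ${limit}"; return 1 ;;
  esac
  local pct=$(( used * 100 / limit ))
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARNING: ${used}/${limit} file descriptors in use (${pct}%)"
  else
    echo "OK: ${used}/${limit} file descriptors in use (${pct}%)"
  fi
}
```

With the PID of the Tomcat JVM in `$pid` (found with ps, as in the first bullet), `fd_report "$(fd_count "$pid")" "$(fd_limit "$pid")"` prints a one-line summary. Reading another user's /proc entries usually requires root.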

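However, most failures of this kind should write an error message to the stderr of the JVM process, which is usually logged. That is the best place to look. If all else fails, you may want to try running the Tomcat application in the foreground with debug logging enabled.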
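A small helper can make that log search repeatable. This is only a sketch: the function name is invented, and the search pattern (SEVERE, Exception, Error) is a guess at what is worth looking at, so expect some noise.

```shell
#!/usr/bin/env bash
# Sketch: show the most recent error-like lines from one or more log files.
# The pattern is a guess at what is interesting; adjust as needed.

recent_errors() {  # $1 = max lines per file, remaining args = log files
  local max="$1"; shift
  local f
  for f in "$@"; do
    [ -r "$f" ] || { echo "== $f (not readable)"; continue; }
    echo "== $f"
    # -n prefixes each match with its line number in the file.
    grep -nE 'SEVERE|Exception|Error' "$f" | tail -n "$max"
  done
}
```

For example, `recent_errors 20 /opt/tomcat7/logs/catalina.out /var/log/messages` prints up to 20 matching lines per file, using the log locations mentioned elsewhere in this thread.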


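Thank you very much for the thorough answer, @gabrtv. I am just waiting for an instance to stop serving again, and then I will use your suggestions to figure out what is wrong. Do you know where stderr is usually logged on Amazon EC2? Thanks. – satoshi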


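'stderr' is logged on a per-process basis. In this case you care about the stderr of the Tomcat/JVM process. It is typically written to a log file, either catalina.out or a separate "error" log file. You should also scour '/var/log/syslog' and '/var/log/messages' for any related errors. – gabrtv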


Any update on this? The bounty ends soon ;) – gabrtv

1

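I have just spent a day wrestling with a similar problem. I have a WAR file deployed to an Amazon Elastic Beanstalk environment. Unlike in your case, the instances launched by my AEBS environment lasted only 5 minutes before AEBS replaced them with new ones.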

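After quite a bit of digging (in 5-minute chunks, while my instance was alive) and some light reading, I found that on an AEBS Tomcat instance, Apache receives requests on port 80. Requests to /_hostmanager are rerouted to port 8999, and everything else goes to port 8080 (Tomcat). A Ruby application called "hostmanager", deployed to the instance, listens on port 8999. This application presumably reports statistics back to the AWS Elastic Beanstalk Host Manager, which lets the Elastic Beanstalk environment get a picture of the load on the environment and scale the number of instances up or down as appropriate.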

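If the AWS Elastic Beanstalk Host Manager does not get a response from the instance's hostmanager application, it terminates the instance and starts a new one. This is probably why your website lasts 30 minutes and then dies.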
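The routing described above can be modelled in a few lines. This is a model of the behaviour as I understand it, not the real Apache configuration on the instance, and the example request paths (including /_hostmanager/status) are hypothetical.

```shell
#!/usr/bin/env bash
# Model of the routing described above, not the real Apache configuration:
# /_hostmanager goes to the hostmanager app, everything else to Tomcat.

backend_port_for() {  # $1 = request path
  case "$1" in
    /_hostmanager|/_hostmanager/*) echo 8999 ;;  # Ruby hostmanager
    *)                             echo 8080 ;;  # Tomcat
  esac
}

# Example paths are illustrative only.
for path in / /index.jsp /_hostmanager /_hostmanager/status; do
  echo "${path} -> 127.0.0.1:$(backend_port_for "$path")"
done
```

Seen this way, a "Connection refused" for 127.0.0.1:8999 in the Apache error log points at the hostmanager backend rather than at Tomcat on 8080.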

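So I guess the problem here is not with your Java application being served on port 8080, but with the hostmanager application not listening on port 8999. That is probably what is causing: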

[Thu Jun 14 20:26:42 2012] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8999 (localhost) failed 
[Thu Jun 14 20:26:42 2012] [error] ap_proxy_connect_backend disabling worker for (localhost) 

Check out /opt/elasticbeanstalk/var/log/hostmanager.log, as it may give you more clues about what is going on and why the hostmanager application is unhappy.
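To confirm from the Apache side which backend is refusing connections, the error log can be summarised per address. A minimal sketch, assuming the log line format quoted in this thread; the function name is made up.

```shell
#!/usr/bin/env bash
# Sketch: count "Connection refused" proxy errors per backend address
# in an Apache error log that uses the format quoted in this thread.

refused_by_backend() {  # $1 = path to the Apache error log
  # Prints one "<count> <host:port>" line per backend.
  grep 'Connection refused: proxy' "$1" \
    | sed -E 's/.*attempt to connect to ([0-9.]+:[0-9]+).*/\1/' \
    | sort | uniq -c | awk '{print $1, $2}'
}
```

Running `refused_by_backend /etc/httpd/logs/elasticbeanstalk-error_log` on the log from the question should list only 127.0.0.1:8999, which is consistent with Tomcat on 8080 staying healthy.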

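In my case, it turned out that my hostmanager application was running a wget against an Amazon S3 bucket and getting a 404 response (I found this in the hostmanager.log mentioned above). This caused the hostmanager to fail to start. So when incoming requests were rerouted to port 8999, nobody was listening. Fail. Instance terminated.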

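Rather than trying to work out why the hostmanager application was failing, I decided to treat the AMI that the Elastic Beanstalk environment was using as a lost cause. I ended up discarding it and followed these steps to get a new Elastic Beanstalk environment running off a custom AMI: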

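  1. Create a new Elastic Beanstalk environment from my WAR file
  2. Create an AMI from the instance that it created
  3. Create a regular EC2 instance from the AMI created in step 2
  4. Add the extra bits I needed (Tomcat manager, for example)
  5. Create an AMI from the regular instance created in step 3
  6. Apply that AMI to the Elastic Beanstalk environment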
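The AMI-related steps in the list above could be scripted roughly as follows. This is a sketch under assumptions: it uses the present-day AWS CLI, which is not necessarily how these steps were originally carried out, and every ID and name passed in is a placeholder. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch of the AMI steps with the AWS CLI. Dry-run by default:
# commands are printed, not executed, unless DRY_RUN=0 is set.
# All IDs and names passed in are placeholders.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

create_ami() {        # $1 = instance ID, $2 = name for the new AMI
  run aws ec2 create-image --instance-id "$1" --name "$2"
}

launch_instance() {   # $1 = AMI ID, $2 = instance type
  run aws ec2 run-instances --image-id "$1" --instance-type "$2"
}

apply_ami_to_env() {  # $1 = Elastic Beanstalk environment name, $2 = AMI ID
  run aws elasticbeanstalk update-environment \
    --environment-name "$1" \
    --option-settings "Namespace=aws:autoscaling:launchconfiguration,OptionName=ImageId,Value=$2"
}
```

Customising the regular instance (adding Tomcat manager and so on) still happens by hand over SSH between `launch_instance` and the second `create_ami` call.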

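Without knowing your setup, it is a bit hard to help precisely. Hopefully, though, knowing that the hostmanager listens on port 8999, knowing where hostmanager.log lives, and a bit of luck will get you where you want to be!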