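Tornado app halting regularly for a few seconds with 100% CPU
I am trying to troubleshoot an app running on Tornado 2.4 on Ubuntu 11.04 on EC2. It appears to hit 100% CPU regularly and halts at that request for a few seconds.
Any help on this is greatly appreciated.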
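Symptoms:
- top shows 100% CPU right at the time it halts. Normally the server's CPU utilisation is around 30-60%.
- It stops serving requests every 2-5 minutes. I have checked that no cronjob is interfering.
- It halts for about 2 to 9 seconds. The problem goes away when Tornado is restarted and gets worse with Tornado uptime: the longer the server has been up, the longer the halts last.
- There seems to be no pattern in the HTTP requests on which the problem appears.
- Interestingly, the next request in the log sometimes matches the duration of the halt and sometimes does not. For example: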
    00:00:00 GET /some/request()
    00:00:09 GET /next/request (9000ms)

    00:00:00 GET /some/request()
    00:00:09 GET /next/request (1ms)  # 9 seconds gap in requests is certainly not possible as clients are constantly polling.
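- The database (MongoDB) shows no expensive or large queries. No page faults. The database is on the same machine, on local disk.
- vmstat shows no change in read/write sizes compared to the previous few minutes.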
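The log observation above (a gap that is sometimes accounted for by the next request's duration and sometimes not) can be checked mechanically. The following is a minimal sketch, assuming each log entry has already been parsed into a `(timestamp, path, duration_ms)` tuple with `HH:MM:SS` timestamps; the function name, the tuple layout and the 90% tolerance are illustrative assumptions, not part of the application's real log format.

```python
from datetime import datetime


def find_gaps(entries, threshold_s=2.0):
    """Return gaps between consecutive log entries of at least threshold_s.

    entries: list of (timestamp 'HH:MM:SS', path, duration_ms) tuples
             (a hypothetical, already-parsed form of the access log).
    Each result is (timestamp, path, gap_seconds, explained), where
    explained is True when the later request's own logged duration covers
    roughly the whole gap (90% tolerance, an arbitrary choice).
    """
    fmt = "%H:%M:%S"
    gaps = []
    for (t0, _, _), (t1, path, duration_ms) in zip(entries, entries[1:]):
        gap = (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()
        if gap >= threshold_s:
            explained = duration_ms >= gap * 1000 * 0.9
            gaps.append((t1, path, gap, explained))
    return gaps
```

A gap flagged with `explained=False` suggests the process was stalled outside any request handler, whereas `explained=True` suggests the stall happened while that request was being timed.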
Tornado runs behind nginx.
Sending SIGINT at a moment when it is most likely halted gives a different stack trace every time. Some of them are below:
Traceback (most recent call last):
  File "chat/main.py", line 3396, in <module>
    main()
  File "chat/main.py", line 3392, in main
    tornado.ioloop.IOLoop.instance().start()
  File "/home/ubuntu/tornado/tornado/ioloop.py", line 515, in start
    self._run_callback(callback)
  File "/home/ubuntu/tornado/tornado/ioloop.py", line 370, in _run_callback
    callback()
  File "/home/ubuntu/tornado/tornado/stack_context.py", line 216, in wrapped
    callback(*args, **kwargs)
  File "/home/ubuntu/tornado/tornado/iostream.py", line 303, in wrapper
    callback(*args)
  File "/home/ubuntu/tornado/tornado/stack_context.py", line 216, in wrapped
    callback(*args, **kwargs)
  File "/home/ubuntu/tornado/tornado/httpserver.py", line 298, in _on_request_body
    self.request_callback(self._request)
  File "/home/ubuntu/tornado/tornado/web.py", line 1421, in __call__
    handler = spec.handler_class(self, request, **spec.kwargs)
  File "/home/ubuntu/tornado/tornado/web.py", line 126, in __init__
    application.ui_modules.iteritems())
  File "/home/ubuntu/tornado/tornado/web.py", line 125, in <genexpr>
    self.ui["_modules"] = ObjectDict((n, self._ui_module(n, m)) for n, m in
  File "/home/ubuntu/tornado/tornado/web.py", line 1114, in _ui_module
    def _ui_module(self, name, module):
KeyboardInterrupt

Traceback (most recent call last):
  File "chat/main.py", line 3398, in <module>
    main()
  File "chat/main.py", line 3394, in main
    tornado.ioloop.IOLoop.instance().start()
  File "/home/ubuntu/tornado/tornado/ioloop.py", line 515, in start
    self._run_callback(callback)
  File "/home/ubuntu/tornado/tornado/ioloop.py", line 370, in _run_callback
    callback()
  File "/home/ubuntu/tornado/tornado/stack_context.py", line 216, in wrapped
    callback(*args, **kwargs)
  File "/home/ubuntu/tornado/tornado/iostream.py", line 303, in wrapper
    callback(*args)
  File "/home/ubuntu/tornado/tornado/stack_context.py", line 216, in wrapped
    callback(*args, **kwargs)
  File "/home/ubuntu/tornado/tornado/httpserver.py", line 285, in _on_headers
    self.request_callback(self._request)
  File "/home/ubuntu/tornado/tornado/web.py", line 1408, in __call__
    transforms = [t(request) for t in self.transforms]
  File "/home/ubuntu/tornado/tornado/web.py", line 1811, in __init__
    def __init__(self, request):
KeyboardInterrupt

Traceback (most recent call last):
  File "chat/main.py", line 3351, in <module>
    main()
  File "chat/main.py", line 3347, in main
    tornado.ioloop.IOLoop.instance().start()
  File "/home/ubuntu/tornado/tornado/ioloop.py", line 571, in start
    self._handlers[fd](fd, events)
  File "/home/ubuntu/tornado/tornado/stack_context.py", line 216, in wrapped
    callback(*args, **kwargs)
  File "/home/ubuntu/tornado/tornado/netutil.py", line 342, in accept_handler
    callback(connection, address)
  File "/home/ubuntu/tornado/tornado/netutil.py", line 237, in _handle_connection
    self.handle_stream(stream, address)
  File "/home/ubuntu/tornado/tornado/httpserver.py", line 156, in handle_stream
    self.no_keep_alive, self.xheaders, self.protocol)
  File "/home/ubuntu/tornado/tornado/httpserver.py", line 183, in __init__
    self.stream.read_until(b("\r\n\r\n"), self._header_callback)
  File "/home/ubuntu/tornado/tornado/iostream.py", line 139, in read_until
    self._try_inline_read()
  File "/home/ubuntu/tornado/tornado/iostream.py", line 385, in _try_inline_read
    if self._read_to_buffer() == 0:
  File "/home/ubuntu/tornado/tornado/iostream.py", line 401, in _read_to_buffer
    chunk = self.read_from_fd()
  File "/home/ubuntu/tornado/tornado/iostream.py", line 632, in read_from_fd
    chunk = self.socket.recv(self.read_chunk_size)
KeyboardInterrupt
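Sampling the stack with SIGINT kills the process each time. A non-destructive variant is sketched below, assuming a POSIX system and a spare signal such as `SIGUSR1`; the helper names `describe_stack` and `install_stack_dumper` are made up for this illustration. Note that CPython runs signal handlers in the main thread between bytecode instructions, so if the process is stalled inside C code the dump only shows where Python resumed afterwards, which may be one reason the traces above differ every time.

```python
import signal
import sys
import traceback


def describe_stack(frame):
    """Return the call stack leading to `frame` as one formatted string."""
    return "".join(traceback.format_stack(frame))


def dump_stack(signum, frame):
    """Signal handler: write the interrupted stack to stderr and keep running."""
    sys.stderr.write("signal %d received at:\n%s" % (signum, describe_stack(frame)))


def install_stack_dumper(sig=None):
    """Register dump_stack for `sig` (defaults to SIGUSR1, POSIX only)."""
    if sig is None:
        sig = signal.SIGUSR1
    signal.signal(sig, dump_stack)
```

After calling `install_stack_dumper()` once at startup, `kill -USR1 <pid>` prints a stack sample without stopping the server, so several samples can be taken during a single halt.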
Any hints on how to troubleshoot this are greatly appreciated.
Further observations:
strace -p during the time it hangs shows empty output.
ltrace -p during the hang shows only free() calls, in large numbers:
free(0x6fa70080) = <void>
free(0x1175f8060) = <void>
free(0x117a5c370) = <void>
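A burst of free() calls with no system calls, together with pauses that grow with uptime, is what a full pass of CPython's cyclic garbage collector over a large heap could look like, although other causes are possible. This is an assumption, not a confirmed diagnosis. A minimal sketch for testing it, using only the standard `gc` module (the function name is illustrative), is to time a forced full collection from inside the running process and compare it with the observed halts:

```python
import gc
import time


def time_full_collection():
    """Force a full garbage collection and report how long it took.

    Returns a dict with the number of objects tracked by the collector
    before the run, the number of unreachable objects it found, and the
    elapsed wall-clock time in seconds.
    """
    tracked = len(gc.get_objects())
    start = time.time()
    unreachable = gc.collect()  # full collection of all generations
    elapsed = time.time() - start
    return {
        "tracked_objects": tracked,
        "unreachable": unreachable,
        "seconds": elapsed,
    }
```

If the reported time is in the same range as the halts and `tracked_objects` keeps growing with uptime, the collector is a plausible suspect; if a forced collection finishes in milliseconds, the cause likely lies elsewhere.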
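Is there any chance you registered a PeriodicCallback? If some operation in it blocks, it could block the ioloop. – oDDsKooL 2013-02-28 08:53:08
Are you using a synchronous connection to MongoDB? Over a TCP socket? If the database or the network hangs for any reason, it will block the ioloop. – oDDsKooL 2013-02-28 08:54:56
Does this hang really occur every 5mn? In that case I would lean towards a blocking db connection (check the MongoDB logs for service times greater than 1s). Otherwise, my experience with EC2 and a similar stack (Tornado + redis) is that similar hangs appear once or twice a day for a few seconds and seem to be related to SYN flood attacks. Is port 80 open on your box? Please check 'dmesg' for hints about such attacks. – oDDsKooL 2013-02-28 08:58:29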
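The comments above point at callbacks or synchronous database calls that block the ioloop. One way to test that idea is to time the suspect callables directly. The sketch below is framework-agnostic and uses only the standard library; the decorator name `warn_if_slow`, its `on_slow` hook and the 0.5 s default threshold are assumptions made for illustration, and it targets Python 3 (`time.monotonic`) rather than the Python 2 environment of the original application.

```python
import functools
import logging
import time


def warn_if_slow(threshold_s=0.5, on_slow=None):
    """Decorator that reports calls taking at least threshold_s seconds.

    on_slow: optional callable (name, elapsed_seconds) invoked for each
             slow call; by default a warning is logged instead.
    """
    log = logging.getLogger("slow_callback")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.monotonic() - start
                if elapsed >= threshold_s:
                    if on_slow is not None:
                        on_slow(func.__name__, elapsed)
                    else:
                        log.warning("%s blocked for %.3fs", func.__name__, elapsed)
        return wrapper

    return decorator
```

Wrapping the function passed to a periodic callback, or the helper that issues a synchronous database query, shows whether any single call accounts for a multi-second stall. If none does, the pause is probably happening outside application callbacks.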