Bad Gateway errors under load from nginx + Unicorn (Rails 3 app)

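I have a Rails (3.2) app that runs on nginx and Unicorn on a cloud platform. The "box" is running Ubuntu 12.04.

When the system load is at about 70% or above, nginx abruptly (and seemingly at random) starts throwing 502 Bad Gateway errors; when the load is lower, nothing of the kind happens. I have experimented with various numbers of cores (4, 6, 10; I can "change hardware" since it is on a cloud platform), and the situation is always the same. (The CPU load is similar to the system load: userland is around 55%, the rest being system and stolen, with plenty of free memory and no swapping.)

The 502s usually come in batches, but not always.

(I run one Unicorn worker per core, and one or two nginx workers. See the relevant parts of the configs below, as used when running on 10 cores.)

I do not really know how to track down the cause of these errors. I suspect it may have something to do with the Unicorn workers not being able to serve (in time?), but that looks odd because they do not seem to saturate the CPU, and I see no reason why they would be waiting for IO (though I do not know how to make sure of that either).

Can you please help me find the cause?

Unicorn config (unicorn.rb):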

worker_processes 10 
working_directory "/var/www/app/current" 
listen "/var/www/app/current/tmp/sockets/unicorn.sock", :backlog => 64 
listen 2007, :tcp_nopush => true 
timeout 90 
pid "/var/www/app/current/tmp/pids/unicorn.pid" 
stderr_path "/var/www/app/shared/log/unicorn.stderr.log" 
stdout_path "/var/www/app/shared/log/unicorn.stdout.log" 
preload_app true 
GC.respond_to?(:copy_on_write_friendly=) and 
    GC.copy_on_write_friendly = true 
check_client_connection false 

before_fork do |server, worker|
    # ... I believe the stuff here is irrelevant ...
end
after_fork do |server, worker|
    # ... I believe the stuff here is irrelevant ...
end

And the nginx configs:

/etc/nginx/nginx.conf:

worker_processes 2; 
worker_rlimit_nofile 2048; 
user www-data www-admin; 
pid /var/run/nginx.pid; 
error_log /var/log/nginx/nginx.error.log info; 

events { 
    worker_connections 2048; 
    accept_mutex on; # "on" if nginx worker_processes > 1 
    use epoll; 
} 

http { 
    include  /etc/nginx/mime.types; 
    default_type application/octet-stream; 
    log_format main '$remote_addr - $remote_user [$time_local] "$request" ' 
         '$status $body_bytes_sent "$http_referer" ' 
         '"$http_user_agent" "$http_x_forwarded_for"'; 
    access_log /var/log/nginx/access.log main; 
    # optimization efforts
    client_max_body_size  2m; 
    client_body_buffer_size  128k; 
    client_header_buffer_size 4k; 
    large_client_header_buffers 10 4k; # one for each core or one for each unicorn worker? 
    client_body_temp_path  /tmp/nginx/client_body_temp; 

    include /etc/nginx/conf.d/*.conf; 
}

/etc/nginx/conf.d/app.conf:

sendfile on; 
tcp_nopush on; 
tcp_nodelay off; 
gzip on; 
gzip_http_version 1.0; 
gzip_proxied any; 
gzip_min_length 500; 
gzip_disable "MSIE [1-6]\."; 
gzip_types text/plain text/css text/javascript application/x-javascript; 

upstream app_server { 
    # fail_timeout=0 means we always retry an upstream even if it failed 
    # to return a good HTTP response (in case the Unicorn master nukes a 
    # single worker for timing out). 
    server unix:/var/www/app/current/tmp/sockets/unicorn.sock fail_timeout=0; 
} 

server { 
    listen 80 default deferred; 
    server_name _; 
    client_max_body_size 1G; 
    keepalive_timeout 5; 
    root /var/www/app/current/public; 

    location ~ "^/assets/.*" { 
     ... 
    } 

    # Prefer to serve static files directly from nginx to avoid unnecessary 
    # data copies from the application server. 
    try_files $uri/index.html $uri.html $uri @app; 

    location @app {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;

        proxy_pass http://app_server;

        proxy_connect_timeout 90;
        proxy_send_timeout 90;
        proxy_read_timeout 90;

        proxy_buffer_size 128k;
        proxy_buffers 10 256k; # one per core or one per unicorn worker?
        proxy_busy_buffers_size 256k;
        proxy_temp_file_write_size 256k;
        proxy_max_temp_file_size 512k;
        proxy_temp_path /mnt/data/tmp/nginx/proxy_temp;

        open_file_cache max=1000 inactive=20s;
        open_file_cache_valid 30s;
        open_file_cache_min_uses 2;
        open_file_cache_errors on;
    }
}

2013-03-18 fastcatch

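After googling expressions found in the nginx error log, it turned out to be a known issue that has nothing to do with nginx, little to do with Unicorn, and is rooted in OS (Linux) settings.

The core of the problem is that the socket backlog is too short. There are various considerations as to how long it should be (whether you want to detect cluster member failure as soon as possible, or keep the application pushing the load limits). Either way, the listen :backlog needs tweaking.

I found that in my case listen ... :backlog => 2048 was sufficient. (I did not experiment much, although there is a good hack for that if you like: have nginx and Unicorn communicate over two sockets with different backlogs, the one with the longer backlog acting as backup, then see in the nginx log how often the shorter queue fails.) Please note that this is not the result of a scientific calculation, and YMMV.

Note, however, that many OSes (most Linux distros, Ubuntu 12.04 included) have much lower OS-level default limits on socket backlog sizes (as low as 128).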
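
To see how this plays out on a given box, the following sketch reads the kernel limit and reports the backlog a listener would effectively get. It assumes the Linux behaviour documented for listen(2), where a requested backlog above net.core.somaxconn is silently truncated to that value. The names effective_backlog and requested_backlog are hypothetical, and the fallback of 128 is only a placeholder for systems where /proc is not readable.

```shell
#!/bin/sh
# Sketch only: effective_backlog and requested_backlog are illustrative names.

# Print the backlog the kernel would actually use: min(requested, limit).
effective_backlog() {
    requested=$1
    limit=$2
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}

# Backlog asked for in unicorn.rb (value taken from the answer; adjust as needed).
requested_backlog=2048

# Current kernel limit; 128 is an assumed fallback if /proc is unavailable.
somaxconn=$(cat /proc/sys/net/core/somaxconn 2>/dev/null || echo 128)

echo "kernel limit: $somaxconn"
echo "effective backlog: $(effective_backlog "$requested_backlog" "$somaxconn")"
```

If the effective backlog comes out lower than the value set in unicorn.rb, the larger :backlog has no effect until the kernel limit is raised.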

You can change the OS limits as follows (as root):

sysctl -w net.core.somaxconn=2048 
sysctl -w net.core.netdev_max_backlog=2048

Add these to /etc/sysctl.conf to make the changes permanent. (/etc/sysctl.conf can be reloaded without rebooting using sysctl -p.)
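
One way to make the change permanent without piling up duplicate entries is sketched below. The helper set_sysctl_line is a hypothetical name, not part of any tool, and the demonstration writes to a scratch file so it can be tried without root. Pointing it at /etc/sysctl.conf and then running sysctl -p (both as root) would be the real-world equivalent.

```shell
#!/bin/sh
# Sketch only: set_sysctl_line is an illustrative helper, not a standard command.

# Add or update "key = value" in a sysctl-style file, keeping one entry per key.
set_sysctl_line() {
    key=$1
    value=$2
    file=$3
    touch "$file"
    # Copy every line except an existing setting for this exact key.
    awk -F= -v k="$key" '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
        "$file" > "$file.tmp"
    # Append the new setting and swap the file into place.
    echo "$key = $value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}

# Demonstration against a scratch file rather than /etc/sysctl.conf.
conf=$(mktemp)
set_sysctl_line net.core.somaxconn 2048 "$conf"
set_sysctl_line net.core.netdev_max_backlog 2048 "$conf"
cat "$conf"
```

Running the helper again with a different value replaces the earlier line instead of adding a second one.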

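There are mentions that you may also have to increase the maximum number of files a process can open (use ulimit -n, and /etc/security/limits.conf for permanence). I had already done that for other reasons, so I cannot tell whether it makes a difference.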
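
A quick way to check where the open-file limit currently stands is sketched below. The helper limit_at_least and the target of 2048 are assumptions for illustration (2048 simply mirrors worker_rlimit_nofile in the question's nginx.conf). Note that the limits printed are those of the shell running the script, which may differ from those of the nginx or Unicorn processes.

```shell
#!/bin/sh
# Sketch only: limit_at_least and the target value are illustrative assumptions.

# Succeed if the limit ($1, a number or "unlimited") is at least $2.
limit_at_least() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

wanted=2048            # assumed target; mirrors worker_rlimit_nofile above
soft=$(ulimit -Sn)     # soft limit on open files for this shell
hard=$(ulimit -Hn)     # hard limit on open files for this shell

echo "open files: soft=$soft hard=$hard"
if limit_at_least "$soft" "$wanted"; then
    echo "soft limit already covers $wanted"
else
    echo "soft limit is below $wanted; raise it with ulimit -n or in /etc/security/limits.conf"
fi
```

For a permanent change, entries in /etc/security/limits.conf take the form "domain type item value", for example "www-data soft nofile 2048", where the user and value here are placeholders.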

2013-03-21 09:56:02 fastcatch

Very nice! Thanks. – 2013-05-04 06:23:22

Thanks, fastcatch! Applied this fix. Haven't yet verified whether it improves the transient Bad Gateway errors. – amolk 2013-12-18 05:49:26
