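Postgres gets out of memory errors despite having plenty of free memory

I have a server running Postgres 9.1.15. The server has 2GB of RAM and no swap. Intermittently, Postgres starts getting "out of memory" errors on some SELECTs, and keeps doing so until I restart Postgres or some of the clients connected to it. What's weird is that when this happens, free still reports over 500MB of free memory.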
select version(); returns:
PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
uname -a:
Linux db 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
postgresql.conf (everything else is commented out/default):
max_connections = 100
shared_buffers = 500MB
work_mem = 2000kB
maintenance_work_mem = 128MB
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 1000MB
default_statistics_target = 100
log_temp_files = 0
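I got these values from pgtune (I chose "mixed type of applications") and have been fiddling with them based on what I've read, without making much progress. At the moment there are 68 connections, which is a typical number (I'm not using pgbouncer or any other connection pooler).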
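For reference, here is a rough back-of-the-envelope budget for these settings, written as a small Python sketch. It is only an estimate under stated assumptions: autovacuum_max_workers is taken to be the default of 3 (it is not shown in the config above), each autovacuum worker is assumed to be able to use up to maintenance_work_mem, and nodes_per_query is a hypothetical knob for how many sort/hash nodes in one query each take a full work_mem. Per-backend overhead is ignored entirely.

```python
# Rough worst-case memory budget for the postgresql.conf values above.
# All sizes are in kB. This is a sketch, not an exact accounting.

SHARED_BUFFERS_KB = 500 * 1024          # shared_buffers = 500MB
WORK_MEM_KB = 2000                      # work_mem = 2000kB
MAINTENANCE_WORK_MEM_KB = 128 * 1024    # maintenance_work_mem = 128MB
AUTOVACUUM_MAX_WORKERS = 3              # ASSUMPTION: default, not shown in the config


def worst_case_kb(connections, nodes_per_query=1):
    """Estimate peak memory if every backend and autovacuum worker maxes out.

    nodes_per_query is a hypothetical multiplier: each sort or hash node
    in a query plan may use up to work_mem on its own.
    """
    backends = connections * nodes_per_query * WORK_MEM_KB
    autovacuum = AUTOVACUUM_MAX_WORKERS * MAINTENANCE_WORK_MEM_KB
    return SHARED_BUFFERS_KB + backends + autovacuum


if __name__ == "__main__":
    for conns in (68, 100):
        kb = worst_case_kb(conns)
        print(f"{conns} connections: about {kb / 1024:.0f} MB")
```

Even with one work_mem per connection, the estimate lands above 1GB on a 2GB machine, and the autovacuum term alone accounts for 384MB of that under these assumptions.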
/etc/sysctl.conf:
kernel.shmmax=1050451968
kernel.shmall=256458
vm.overcommit_ratio=100
vm.overcommit_memory=2
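I first changed overcommit_memory to 2 about two weeks ago, after the OOM killer killed the Postgres server. Before that, the server had been running fine for a long time. The errors I get now are less catastrophic but more annoying, because they are much more frequent.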
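As I understand the kernel documentation, with vm.overcommit_memory=2 the kernel refuses any allocation that would push total committed address space past a fixed limit, regardless of how much memory is actually free. A minimal sketch of that limit, assuming the documented formula (swap plus overcommit_ratio percent of RAM, excluding huge pages) and a nominal 2GB of RAM; the function name is made up for illustration:

```python
def commit_limit_kb(mem_total_kb, swap_total_kb=0, overcommit_ratio=50, hugetlb_kb=0):
    """Approximate CommitLimit under vm.overcommit_memory=2.

    Follows the formula in the kernel's overcommit accounting docs:
    swap + (RAM - huge pages) * overcommit_ratio / 100.
    """
    return swap_total_kb + (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100


if __name__ == "__main__":
    ram_kb = 2 * 1024 * 1024  # ASSUMPTION: nominal 2GB; real MemTotal is a bit lower
    limit = commit_limit_kb(ram_kb, swap_total_kb=0, overcommit_ratio=100)
    print(f"CommitLimit is roughly {limit} kB")
```

With no swap and overcommit_ratio=100 the limit is about the size of physical RAM, so allocations can start failing once the committed total reaches that figure even while free still shows unused memory.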
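I haven't had much luck pinpointing the first event that causes Postgres to run "out of memory"; it seems to be different each time. The most recent time it crashed, the first three errors logged were: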
2015-04-07 05:32:39 UTC ERROR: out of memory
2015-04-07 05:32:39 UTC DETAIL: Failed on request of size 125.
2015-04-07 05:32:39 UTC CONTEXT: automatic analyze of table "xxx.public.delayed_jobs"
TopMemoryContext: 68688 total in 10 blocks; 4560 free (4 chunks); 64128 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:58 UTC ERROR: out of memory
2015-04-07 05:33:58 UTC DETAIL: Failed on request of size 16.
2015-04-07 05:33:58 UTC STATEMENT: SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
TopMemoryContext: 396368 total in 50 blocks; 10160 free (28 chunks); 386208 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:59 UTC ERROR: out of memory
2015-04-07 05:33:59 UTC DETAIL: Failed on request of size 1840.
2015-04-07 05:33:59 UTC STATEMENT: SELECT... [nested select with 4 joins, 19 ands, and 2 order bys]
TopMemoryContext: 388176 total in 49 blocks; 17264 free (55 chunks); 370912 used
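The crash before this one, a few hours earlier, just had three instances of that last query as its first three logged errors. That query is run very often, so I'm not sure whether the problem is caused by this query, or whether it just shows up in the error log because it's a reasonably complex SELECT that is running all the time. That said, here is an EXPLAIN ANALYZE of it: http://explain.depesz.com/s/r00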
This is what ulimit -a looks like for the postgres user:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15956
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15956
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I'll try to get the exact numbers from free the next time there's a crash; in the meantime, this is all the information I have on the problem.
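Since strict overcommit works on committed rather than free memory, the numbers from free alone may not tell the whole story. A small sketch of something that could be run alongside free during the next incident: it reads CommitLimit and Committed_AS from /proc/meminfo and reports the remaining headroom. The helper names are my own, and it assumes a Linux /proc/meminfo in the usual "Key: value kB" layout.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Values are in kB where the file says so; unit suffixes are dropped.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def commit_headroom_kb(meminfo):
    """kB that can still be committed before allocations start failing."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        info = {}
    if "CommitLimit" in info and "Committed_AS" in info:
        print("MemFree:      ", info.get("MemFree"), "kB")
        print("CommitLimit:  ", info["CommitLimit"], "kB")
        print("Committed_AS: ", info["Committed_AS"], "kB")
        print("Headroom:     ", commit_headroom_kb(info), "kB")
    else:
        print("CommitLimit/Committed_AS not available on this system")
```

If the headroom is close to zero while MemFree is still in the hundreds of MB, that would be consistent with the errors coming from the commit limit rather than from real memory exhaustion.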
Any ideas on where to go from here?
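AFAIK the kernel counts memory as committed once it has been promised to a process, even if the process never uses it. This includes copy-on-write memory regions created when the postmaster calls 'fork()'. So the memory can be free, in the sense that it is not currently mapped into any process's address space, and still already be committed. I think. I might be completely hand-waving here. It's also possible that there is some weirdness in how the kernel accounts for shared memory.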
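'maintenance_work_mem = 128MB' seems rather large given your memory constraints. The fact that the problem occurs when an automatic analyze kicks in also suggests that this parameter is too large.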
@a_horse_with_no_name I've tried setting it to '16MB'. Will report back with results. Anything else to look out for in the meantime?