2013-09-23 93 views

Debugging Python/NumPy memory leaks

I'm trying to find the origin of a nasty memory leak in a Python/NumPy program that uses C/Cython extensions and multiprocessing.

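Each subprocess processes a list of images, and for each image it sends the output array (usually about 200-300MB) through a Queue to the main process. A fairly standard map/reduce setup.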

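As you can imagine, a memory leak can take on enormous proportions with arrays this large, and having multiple processes happily go over 20GB of RAM when they should only need 5-6GB is... annoying.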

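  • I've tried running a debug build of Python through Valgrind and quadruple-checked my extensions for memory leaks, but found nothing.

  • I've checked my Python code for dangling references to my arrays, and also used NumPy's allocation tracker to check whether my arrays were indeed released. They were.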
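One way to check for dangling references like this is with a weak reference: if the weakref is dead after the last name is dropped, nothing else was holding the object. The sketch below is only an illustration, not the code from the question. `ImageBuffer`, `process` and `cache` are hypothetical names, and a small class wrapping a `bytearray` stands in for a large array so the example needs nothing outside the standard library. NumPy arrays can be weakly referenced in the same way.

```python
import gc
import weakref


class ImageBuffer:
    """Hypothetical stand-in for a large output array."""

    def __init__(self, size):
        self.data = bytearray(size)


def process(buf):
    """Placeholder for real processing; keeps no reference to buf."""
    return len(buf.data)


# Case 1: no dangling reference, so the object dies when the name is dropped.
buf = ImageBuffer(1024)
probe = weakref.ref(buf)
process(buf)
del buf
gc.collect()  # also breaks reference cycles, if any
released = probe() is None

# Case 2: a forgotten container still holds the object.
cache = []
buf2 = ImageBuffer(1024)
probe2 = weakref.ref(buf2)
cache.append(buf2)  # the dangling reference
del buf2
gc.collect()
still_referenced = probe2() is not None
```

A check like this only sees references that Python knows about. It says nothing about memory that the allocator keeps after the object is freed.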

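The last thing I did was attach GDB to one of my processes (this bad boy is now running at 27GB of RAM and counting) and dump a large portion of the heap to disk. To my surprise, the dumped file was full of zeros! About 7G worth of zeros.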
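Measuring how much of a dump is zeros can be done by streaming the file in chunks rather than loading it whole. This is a minimal sketch, assuming the dump is an ordinary binary file on disk. `zero_fraction` is a hypothetical helper, and the temporary file below only stands in for a real dump.

```python
import os
import tempfile


def zero_fraction(path, chunk_size=1 << 20):
    """Return the fraction of bytes in the file that are zero."""
    zeros = 0
    total = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            zeros += chunk.count(b"\x00")
            total += len(chunk)
    return zeros / total if total else 0.0


# Throwaway file standing in for a real dump: 900 zero bytes, 100 non-zero.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 900 + b"\x01" * 100)
fraction = zero_fraction(tmp.name)
os.remove(tmp.name)
```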

Is this standard memory allocation behavior in Python/NumPy? Did I miss something obvious that would explain this? How can I manage the memory properly?


EDIT: For the record, I'm running NumPy 1.7.1 and Python 2.7.3.

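EDIT 2: I've been monitoring the process with strace, and it appears to keep raising the program break (using the brk() syscall) with each image set it processes.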

Does CPython actually free memory properly? What about C extensions and NumPy arrays? Who decides when to call brk(): Python itself, or an underlying library (libc, ...)?

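Below is a sample strace log, with comments, from one iteration (i.e. one input image set). Note that the program break keeps increasing, but I made sure (with objgraph) that no meaningful NumPy arrays are kept inside the Python interpreter.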

# Reading .inf files with metadata 
# Pretty small, no brk() 
open("1_tif_all/AIR00642_1.inf", O_RDONLY) = 6 
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9387fff000 
munmap(0x7f9387fff000, 4096)   = 0 
open("1_tif_all/AIR00642_2.inf", O_RDONLY) = 6 
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9387fff000 
munmap(0x7f9387fff000, 4096)   = 0 
open("1_tif_all/AIR00642_3.inf", O_RDONLY) = 6 
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9387fff000 
munmap(0x7f9387fff000, 4096)   = 0 
open("1_tif_all/AIR00642_4.inf", O_RDONLY) = 6 
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9387fff000 
munmap(0x7f9387fff000, 4096)   = 0 

# This is where I'm starting the heavy processing 
write(2, "[INFO/MapProcess-1] Shot 642: Da"..., 68) = 68 
write(2, "[INFO/MapProcess-1] Shot 642: Vi"..., 103) = 103 
write(2, "[INFO/MapProcess-1] Shot 642: Re"..., 66) = 66 

# I'm opening a .tif image (752 x 480, 8-bit, 1 channel) 
open("1_tif_all/AIR00642_3.tif", O_RDONLY) = 6 
read(6, "II*\0JC\4\0", 8)    = 8 
mmap(NULL, 279600, PROT_READ, MAP_SHARED, 6, 0) = 0x7f9387fbb000 
munmap(0x7f9387fbb000, 279600)   = 0 
write(2, "[INFO/MapProcess-1] Shot 642: Pr"..., 53) = 53 

# Another .tif 
open("1_tif_all/AIR00642_4.tif", O_RDONLY) = 6 
read(6, "II*\0\266\374\3\0", 8)   = 8 
mmap(NULL, 261532, PROT_READ, MAP_SHARED, 6, 0) = 0x7f9387fc0000 
munmap(0x7f9387fc0000, 261532)   = 0 
write(2, "[INFO/MapProcess-1] Shot 642: Pr"..., 51) = 51 
brk(0x1aea97000)      = 0x1aea97000 

# Another .tif 
open("1_tif_all/AIR00642_1.tif", O_RDONLY) = 6 
read(6, "II*\0\220\253\4\0", 8)   = 8 
mmap(NULL, 306294, PROT_READ, MAP_SHARED, 6, 0) = 0x7f9387fb5000 
munmap(0x7f9387fb5000, 306294)   = 0 
brk(0x1af309000)      = 0x1af309000 
write(2, "[INFO/MapProcess-1] Shot 642: Pr"..., 53) = 53 
brk(0x1b03da000)      = 0x1b03da000 

# Another .tif 
open("1_tif_all/AIR00642_2.tif", O_RDONLY) = 6 
mmap(NULL, 345726, PROT_READ, MAP_SHARED, 6, 0) = 0x7f9387fab000 
munmap(0x7f9387fab000, 345726)   = 0 
brk(0x1b0c42000)      = 0x1b0c42000 
write(2, "[INFO/MapProcess-1] Shot 642: Pr"..., 51) = 51 

# I'm done reading my images 
write(2, "[INFO/MapProcess-1] Shot 642: Fi"..., 72) = 72 

# Allocating some more arrays for additional variables 
# Increases by about 8M at a time 
brk(0x1b1453000)      = 0x1b1453000 
brk(0x1b1c63000)      = 0x1b1c63000 
brk(0x1b2473000)      = 0x1b2473000 
brk(0x1b2c84000)      = 0x1b2c84000 
brk(0x1b3494000)      = 0x1b3494000 
brk(0x1b3ca5000)      = 0x1b3ca5000 

# What are these mmap calls doing here? 
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9377df1000 
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9367be2000 
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f93579d3000 
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f93477c4000 
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f93375b5000 
munmap(0x7f93579d3000, 270594048)  = 0 
munmap(0x7f93477c4000, 270594048)  = 0 
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f93579d3000 
munmap(0x7f93375b5000, 270594048)  = 0 
mmap(NULL, 50737152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9354970000 
munmap(0x7f9354970000, 50737152)  = 0 
brk(0x1b4cc6000)      = 0x1b4cc6000 
brk(0x1b5ce7000)      = 0x1b5ce7000 
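The brk() and mmap() lines in a log like the one above can be tallied with a short script, which makes it easier to see how far the program break moved and which anonymous mappings were never unmapped. The sketch below is an illustration, not a tool from the question. `summarize` and the regular expressions are hypothetical and only handle the line shapes that appear in this log. As background, and assuming glibc's malloc is the allocator in use, large requests are typically served by anonymous mmap() and handed back with munmap() on free, while smaller ones come from the brk() heap.

```python
import re

BRK_RE = re.compile(r"^brk\((0x[0-9a-f]+)\)\s*=\s*(0x[0-9a-f]+)")
MMAP_RE = re.compile(
    r"^mmap\(NULL, (\d+), .*MAP_ANONYMOUS.*\)\s*=\s*(0x[0-9a-f]+)"
)
MUNMAP_RE = re.compile(r"^munmap\((0x[0-9a-f]+), (\d+)\)\s*=\s*0")


def summarize(lines):
    """Return (program break growth, bytes of anonymous mmaps still mapped)."""
    first_brk = last_brk = None
    live = {}  # address -> size of anonymous mappings not yet unmapped
    for line in lines:
        line = line.strip()
        m = BRK_RE.match(line)
        if m:
            addr = int(m.group(2), 16)  # the break actually set by the kernel
            if first_brk is None:
                first_brk = addr
            last_brk = addr
            continue
        m = MMAP_RE.match(line)
        if m:
            live[int(m.group(2), 16)] = int(m.group(1))
            continue
        m = MUNMAP_RE.match(line)
        if m:
            live.pop(int(m.group(1), 16), None)
    growth = 0 if first_brk is None else last_brk - first_brk
    return growth, sum(live.values())


# A few lines copied from the log above.
sample = """\
brk(0x1b1453000)      = 0x1b1453000
brk(0x1b1c63000)      = 0x1b1c63000
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9377df1000
mmap(NULL, 50737152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9354970000
munmap(0x7f9354970000, 50737152)  = 0
""".splitlines()

growth, live_bytes = summarize(sample)
```

For this sample the break grows by 8454144 bytes, which matches the "about 8M at a time" note in the log, and one 270594048-byte mapping is still live. The growth is measured from the first brk() line seen, so that line serves as the baseline.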

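EDIT 3: "Is freeing handled differently for small/large numpy arrays?" may be relevant. I'm getting increasingly convinced that I'm simply allocating too many arrays that are not released to the system because, well, that is indeed standard behavior. I will try allocating my arrays beforehand and reusing them as needed.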
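The "allocate beforehand and reuse" idea can be sketched as follows. This is not the code from the question: it uses a standard-library `bytearray` as a stand-in so it runs without NumPy, and `checksum_frames` is a hypothetical name. With NumPy, the analogous move would be to create the array once (for example with `np.empty`) and have each operation write into it, for instance through the `out=` argument that ufuncs accept.

```python
import io


def checksum_frames(streams, frame_size):
    """Process every stream through one buffer allocated before the loop."""
    buf = bytearray(frame_size)  # allocated once
    view = memoryview(buf)       # lets readinto() fill buf without a copy
    results = []
    for stream in streams:
        n = stream.readinto(view)      # overwrites the same memory each time
        results.append(sum(view[:n]))  # placeholder for the real processing
    return results


# Two in-memory "frames" standing in for image files.
frames = [io.BytesIO(bytes([1]) * 8), io.BytesIO(bytes([2]) * 8)]
sums = checksum_frames(frames, frame_size=8)
```

The allocator then sees one long-lived block instead of a stream of large allocations and frees, which is the pattern this edit suspects of growing the heap.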

What are you using to read the image files? I've had memory leak problems with PIL `Image` objects in the past.

I'm using the PyLibTiff bindings. I've fixed the problem, see my answer!

Answers

Doh. I really should have checked those C extensions for the FIFTH time.

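I forgot to decrement the reference count of one of the temporary NumPy arrays I allocate from C. The array never left the C code, so I didn't see that I needed to release it.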
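The effect of a missing decrement can be reproduced from pure Python, which is handy for testing an extension function against objects you pass into it. The sketch below is an illustration under assumptions, not the extension code from the question. `leaky_extension_call` and `refcount_delta` are hypothetical names, and the leak is simulated by calling the C API function `Py_IncRef` through `ctypes`. It cannot detect an object that is created and lost entirely inside C, as in the case described here.

```python
import ctypes
import sys

# Declare the C API signatures so ctypes passes the object pointer as is.
ctypes.pythonapi.Py_IncRef.argtypes = [ctypes.py_object]
ctypes.pythonapi.Py_IncRef.restype = None
ctypes.pythonapi.Py_DecRef.argtypes = [ctypes.py_object]
ctypes.pythonapi.Py_DecRef.restype = None


def leaky_extension_call(obj):
    """Stand-in for a C function that takes a reference and never drops it."""
    ctypes.pythonapi.Py_IncRef(obj)


def refcount_delta(func, obj):
    """Return how much func changed obj's reference count."""
    before = sys.getrefcount(obj)
    func(obj)
    return sys.getrefcount(obj) - before


temp = bytearray(16)
leaked = refcount_delta(leaky_extension_call, temp)  # one stray reference
ctypes.pythonapi.Py_DecRef(temp)                     # the forgotten release
balanced = refcount_delta(lambda o: None, temp)      # a well-behaved call
```

A non-zero delta after a call that should not keep the object is a hint that a `Py_DECREF` is missing somewhere in the C code.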

I still don't know why it didn't show up in objgraph, though.