2013-12-22

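Memory leak when creating a buffer with pandas?

I am using pandas to build a ring buffer, but memory usage keeps growing. What am I doing wrong?

Here is the code (edited slightly from the first version of the question):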

import resource

import numpy as np
import pandas as pd

# Start with a 10000-row buffer of zeros.
tempdata = np.zeros((10000, 3))
tdf = pd.DataFrame(data=tempdata, columns=['a', 'b', 'c'])

i = 0
while True:
    i += 1
    littledf = pd.DataFrame(np.random.rand(1000, 3), columns=['a', 'b', 'c'])
    # Drop the oldest 1000 rows and append the newest 1000.
    tdf = pd.concat([tdf[1000:], littledf], ignore_index=True)
    del littledf
    # ru_maxrss is the peak resident set size (bytes on macOS, kilobytes on Linux).
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if i % 1000 == 0:
        print('total memory:%d kb' % (int(currentmemory) // 1000))

This is what I get:

total memory:37945 kb 
total memory:38137 kb 
total memory:38137 kb 
total memory:38768 kb 
total memory:38768 kb 
total memory:38776 kb 
total memory:38834 kb 
total memory:38838 kb 
total memory:38838 kb 
total memory:38850 kb 
total memory:38854 kb 
total memory:38871 kb 
total memory:38871 kb 
total memory:38973 kb 
total memory:38977 kb 
total memory:38989 kb 
total memory:38989 kb 
total memory:38989 kb 
total memory:39399 kb 
total memory:39497 kb 
total memory:39587 kb 
total memory:39587 kb 
total memory:39591 kb 
total memory:39604 kb 
total memory:39604 kb 
total memory:39608 kb 
total memory:39608 kb 
total memory:39608 kb 
total memory:39608 kb 
total memory:39608 kb 
total memory:39608 kb 
total memory:39612 kb 

I am not sure whether it is related to this issue:

https://github.com/pydata/pandas/issues/2659

Tested on a MacBook Air with Anaconda Python.

Strange, I copied and pasted this code and saw no leak, on both 0.12 and 0.13rc.

I have added the output I get (and changed the code slightly). Do you get the same or something different? – Fra

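I get "total memory:59 kb" all the way down. It may depend on the OS or setup, so it could help to add more details. This might be better filed as a separate GitHub issue, though. Have you tried adding gc.collect, as suggested in the other issue?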

Answer

Instead of using concat, why not update the DataFrame in place? i % 10 determines which 1000-row slot is written on each update.

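# Reuses the imports and the preallocated 10000-row tdf from the question.
i = 0
while True:
    i += 1
    # Overwrite slot i % 10 of tdf in place instead of building a new frame.
    start = 1000 * (i % 10)
    tdf.iloc[start:start + 1000] = np.random.rand(1000, 3)
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if i % 1000 == 0:
        print('total memory:%d kb' % (int(currentmemory) // 1000))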
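
One difference from the concat version is that the rows are no longer stored oldest-first, because each update overwrites whichever slot i % 10 selects. A minimal sketch of how the in-place buffer could be read back in time order is below. The names SLOT_ROWS, N_SLOTS, write_slot and in_time_order are made up for illustration, and the loop is bounded and fills each block with the update number only so the result is easy to check.

```python
import numpy as np
import pandas as pd

SLOT_ROWS = 1000   # rows written per update, as in the example above
N_SLOTS = 10       # 10 slots * 1000 rows = 10000-row buffer

# Preallocate the buffer once; later updates only overwrite its contents.
tdf = pd.DataFrame(np.zeros((SLOT_ROWS * N_SLOTS, 3)), columns=['a', 'b', 'c'])


def write_slot(buf, i, block):
    """Overwrite the slot selected by i % N_SLOTS with a new block of rows."""
    start = SLOT_ROWS * (i % N_SLOTS)
    buf.iloc[start:start + SLOT_ROWS] = block


def in_time_order(buf, i):
    """Return a copy of the rows oldest-first, where update i was the last write."""
    oldest = SLOT_ROWS * ((i + 1) % N_SLOTS)
    # Row positions from the oldest slot to the end, then wrapping to the start.
    order = np.r_[oldest:len(buf), 0:oldest]
    return buf.iloc[order].reset_index(drop=True)


# Bounded demo: each block is filled with its update number (illustrative data).
last = 25
for i in range(1, last + 1):
    write_slot(tdf, i, np.full((SLOT_ROWS, 3), float(i)))

ordered = in_time_order(tdf, last)
```

Note that in_time_order builds a new 10000-row frame each time it is called, so it is best called only when the ordered view is actually needed rather than on every update.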