2011-05-21 106 views
1

Memory leak while looping the web.client.getPage function

I have something that periodically refreshes a page using this script:

from twisted.web.client import getPage 
from twisted.internet import reactor, task 

def getData(): 
    dgp = getPage('http://www.google.com/') 
    dgp.addCallback(dataLoadOK) 
    dgp.addErrback(dataLoadError) 

def dataLoadOK(value): 
    print value 

def dataLoadError(error): 
    print error 

loop = task.LoopingCall(getData) 
loop.start(10, now=True) 
reactor.run() 

But when I use it this way, I get a memory leak. Can anyone help me find it?

Edit: I have tried using the garbage collection Python module, and this is what it reported:

GARBAGE OBJECTS: 
:: <HTTPClientFactory: http://www.google.com/> 
     type: <type 'instance'> 
referrers: 3 
    is class: True 
    module: <module 'twisted.web.client' from '/usr/lib/python2.7/site-packages/twisted/web/client.pyc'> 

:: {'status': '200', 'cookies': {'PREF': 'ID=d894e510f2ebe263:FF=0:TM=1306053252:LM=1306053252:S=ebpb4ZebRUu_EhiI', 'NID': '47=LxM9fbBBN-bVIeuLPOfvO-fgXOKw1n2suyZ2... 
     type: <type 'dict'> 
referrers: 3 
    is class: True 
    module: None 

:: InsensitiveDict({}) 
     type: <type 'instance'> 
referrers: 3 
    is class: True 
    module: <module 'twisted.python.util' from '/usr/lib/python2.7/site-packages/twisted/python/util.pyc'> 

:: {'preserve': 1, 'data': {}} 
     type: <type 'dict'> 
referrers: 3 
    is class: True 
    module: None 

:: <Deferred at 0x29e2cf8 current result: None> 
     type: <type 'instance'> 
referrers: 3 
    is class: True 
    module: <module 'twisted.internet.defer' from '/usr/lib/python2.7/site-packages/twisted/internet/defer.pyc'> 

:: {'_chainedTo': None, 'called': True, '_canceller': None, 'callbacks': [], 'result': None, '_runningCallbacks': False} 
     type: <type 'dict'> 
referrers: 3 
    is class: True 
    module: None 

:: <<class 'twisted.internet.tcp.Client'> to ('www.google.com', 80) at 2445090> 
     type: <class 'twisted.internet.tcp.Client'> 
referrers: 3 
    is class: True 
    module: <module 'twisted.internet.tcp' from '/usr/lib/python2.7/site-packages/twisted/internet/tcp.pyc'> 
    line num: 681 
     line: class Client(BaseClient): 
     line:  """A TCP client.""" 
     line: 
     line:  def __init__(self, host, port, bindAddress, connector, reactor=None): 
     line:   # BaseClient.__init__ is invoked later 
     line:   self.connector = connector 
     line:   self.addr = (host, port) 
     line: 
     line:   whenDone = self.resolveAddress 
     line:   err = None 
     line:   skt = None 
     line: 
     line:   try: 
     line:    skt = self.createInternetSocket() 
     line:   except socket.error, se: 
     line:    err = error.ConnectBindError(se[0], se[1]) 
     line:    whenDone = None 
     line:   if whenDone and bindAddress is not None: 
     line:    try: 
     line:     skt.bind(bindAddress) 
     line:    except socket.error, se: 
     line:     err = error.ConnectBindError(se[0], se[1]) 
     line:     whenDone = None 
     line:   self._finishInit(whenDone, skt, err, reactor) 
     line: 
     line:  def getHost(self): 
     line:   """Returns an IPv4Address. 
     line: 
     line:   This indicates the address from which I am connecting. 
     line:   """ 
     line:   return address.IPv4Address('TCP', *(self.socket.getsockname() + ('INET',))) 
     line: 
     line:  def getPeer(self): 
     line:   """Returns an IPv4Address. 
     line: 
     line:   This indicates the address that I am connected to. 
     line:   """ 
     line:   return address.IPv4Address('TCP', *(self.realAddress + ('INET',))) 
     line: 
     line:  def __repr__(self): 
     line:   s = '<%s to %s at %x>' % (self.__class__, self.addr, unsignedID(self)) 
     line:   return s 

:: {'_tempDataBuffer': [], 'disconnected': 1, 'dataBuffer': '', '_tempDataLen': 0, 'realAddress': ('74.125.225.81', 80), 'connector': <twisted.internet.tcp.Connect... 
     type: <type 'dict'> 
referrers: 3 
    is class: True 
    module: None 

:: [] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: {'x-xss-protection': ['1; mode=block'], 'set-cookie': ['PREF=ID=d894e510f2ebe263:FF=0:TM=1306053252:LM=1306053252:S=ebpb4ZebRUu_EhiI; expires=Tue, 21-May-2013 0... 
     type: <type 'dict'> 
referrers: 3 
    is class: True 
    module: None 

:: ['-1'] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: ['private, max-age=0'] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: ['text/html; charset=ISO-8859-1'] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: ['PREF=ID=d894e510f2ebe263:FF=0:TM=1306053252:LM=1306053252:S=ebpb4ZebRUu_EhiI; expires=Tue, 21-May-2013 08:34:12 GMT; path=/; domain=.google.com', 'NID=47=LxM9... 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: ['gws'] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: ['1; mode=block'] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: [] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: <twisted.internet.tcp.Connector instance at 0x29e2cb0> 
     type: <type 'instance'> 
referrers: 3 
    is class: True 
    module: <module 'twisted.internet.tcp' from '/usr/lib/python2.7/site-packages/twisted/internet/tcp.pyc'> 

:: ['Sun, 22 May 2011 08:34:12 GMT'] 
     type: <type 'list'> 
referrers: 3 
    is class: True 
    module: None 

:: {'reactor': <twisted.internet.selectreactor.SelectReactor object at 0x288bd10>, 'state': 'disconnected', 'factoryStarted': 0, 'bindAddress': None, 'factory': <H... 
     type: <type 'dict'> 
referrers: 3 
    is class: True 
    module: None 

So I see some unreleased references inside Twisted's functions. How can I avoid this?

Answers

3

Try some of the strategies recommended in the related questions. However, it is quite likely that you do not have a memory leak at all, only memory fragmentation.

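It looks like the "Python memory leak detector" has a very serious bug. It enables DEBUG_LEAK, which prevents all cycles from being collected. In other words, it creates a massive leak of a great many objects by itself. If you just add some code to your example that reports the contents of gc.garbage without enabling DEBUG_LEAK, it stays empty (gc.garbage is populated whenever objects are actually leaking, even with no gc debug flags enabled).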
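A minimal sketch of the effect described above, written for Python 3 and using only the standard gc module. The Node class and the helper names are made up for illustration. gc.DEBUG_SAVEALL is the part of DEBUG_LEAK that keeps every unreachable object in gc.garbage instead of freeing it, so it is used here on its own to avoid the extra debug output on stderr.

```python
import gc


class Node(object):
    """Plain object used to build a reference cycle (illustrative only)."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a <-> b cycle, unreachable once we return


def count_garbage_nodes(debug_flags):
    """Create one cycle, run a collection, count Nodes left in gc.garbage."""
    gc.collect()              # start from a clean state
    del gc.garbage[:]
    gc.set_debug(debug_flags)
    make_cycle()
    gc.collect()
    found = sum(1 for obj in gc.garbage if isinstance(obj, Node))
    gc.set_debug(0)           # restore normal behaviour
    del gc.garbage[:]
    return found


without_flag = count_garbage_nodes(0)
with_flag = count_garbage_nodes(gc.DEBUG_SAVEALL)
print(without_flag, with_flag)
```

With no debug flags the cycle is simply freed and never shows up in gc.garbage. With the flag set, the same perfectly collectable objects are reported as garbage, which is why a report produced that way grows on every getData() run.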

+0

I just updated my post with the results of hunting for the leak; the number of garbage objects grows every time getData() runs – BGE 2011-05-22 08:50:45

+0

Updated the answer to discuss the flaw in the "Python memory leak detector". – 2011-05-22 13:41:46

2

The way you schedule the looping call may be a problem. You do not return the Deferred from getData, so calls can pile up.

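If retrieving your page takes longer than 10 seconds, a second getData is called before the first one has finished. If you are hitting a site that tries to throttle you (and google.com certainly does), then the more requests pile up, the more it will delay you. Each attempt takes up some memory, which can look like a leak.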

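If that is the problem (though you should use the technique Jean-Paul suggests to find out whether it actually is), you can fix it by adding "return dgp" at the end of your getData function.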

+0

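Actually, in the production script the interval is 300 seconds, longer than any timeout, and I check that the previous getData() call has completed. This script was simplified for readability – BGE 2011-05-22 08:53:58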