Python - GAE - Script loop consuming a lot of memory

I am using the following script to create some RSS snapshots (so to speak).

The script runs on a backend, and its memory consumption is very large and keeps increasing.

import logging
import time

import webapp2
from google.appengine.ext import ndb

# User, StatsSnapShot and get_rss are defined elsewhere (not all shown here).


class StartHandler(webapp2.RequestHandler):

    @ndb.toplevel
    def get(self):
        user_keys = User.query().fetch(1000, keys_only=True)
        if not user_keys:
            return
        logging.info("Starting Process of Users")
        successful_count = 0
        start_time = time.time()
        for user_key in user_keys:
            try:
                this_start_time = time.time()
                # get_rss() makes a urlfetch
                statssnapshot = StatsSnapShot(parent=user_key,
                                              property=get_rss(user_key.id())
                                              )
                statssnapshot.put_async()
                successful_count += 1
            except:
                pass
        logging.info("".join(("Processed: [",
                              str(successful_count),
                              "] users after [",
                              str(int(time.time() - start_time)),
                              "] secs")))
        return
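The progress message at the end of `get` builds its text with `"".join(...)` and explicit `str()` calls. A minimal sketch of the same message using the `logging` module's own `%`-style arguments, which are only interpolated if the record is actually emitted; the helper name `log_progress` is hypothetical and not part of the original script.

```python
import logging
import time


def log_progress(logger, successful_count, start_time):
    """Hypothetical helper: log how many users were processed and how long it took."""
    elapsed = int(time.time() - start_time)
    # The %d placeholders are filled in lazily by logging, only if the
    # record passes the logger's level check.
    logger.info("Processed: [%d] users after [%d] secs", successful_count, elapsed)
    return elapsed
```

Inside the handler this would be called as `log_progress(logging.getLogger(__name__), successful_count, start_time)`.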

Edit

Here is the RSS function as well:

import logging
import re

from google.appengine.api import urlfetch


def get_rss(self, url):
    try:
        result = urlfetch.fetch(url)
        if not result.status_code == 200:
            logging.warning("Invalid URLfetch")
            return
    except urlfetch.Error as e:
        logging.warning("".join("Fetch Failed to get ", url, " with", e))
        return
    content = result.content  # around 200-500 KB
    reobj = re.compile(r'(?<=")[0-9]{21}(?=")')
    user_ids = reobj.findall(content)
    user_ids = set(user_ids)  # set to fail if something is not unique
    return user_ids
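The extraction step in `get_rss` can be tried outside App Engine. A minimal sketch, assuming the feed body is a plain string: it compiles the same lookbehind/lookahead pattern once at module level and shows that `set()` silently drops duplicate IDs rather than failing. The name `extract_user_ids` and the sample strings are made up for illustration.

```python
import re

# Same pattern as in get_rss: exactly 21 digits, directly between double quotes.
USER_ID_RE = re.compile(r'(?<=")[0-9]{21}(?=")')


def extract_user_ids(content):
    """Return the distinct 21-digit quoted IDs found in content."""
    # set() drops repeated IDs; it never raises on duplicates.
    return set(USER_ID_RE.findall(content))
```

For example, `extract_user_ids('"123456789012345678901" "123456789012345678901" "12345"')` yields a set with the single 21-digit ID; the 5-digit value does not match.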

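The script works fine, but as the number of users grows, it consumes more and more memory. Coming from C, I don't know how to handle memory and variables efficiently in Python.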

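For example, I know that in Python, when a variable is no longer referenced, the garbage collector frees the memory it used, but that doesn't seem to happen in my case. What am I doing wrong?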
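That understanding holds for CPython: an object is freed as soon as its last reference disappears, and rebinding a loop variable drops the previous iteration's object. A small stdlib-only sketch that uses `weakref` to observe this; `Snapshot` is a stand-in class, not the real `StatsSnapShot` model. If memory still grows, something else must be holding a reference to every iteration's objects.

```python
import weakref


class Snapshot(object):
    """Stand-in for a per-user entity holding a large payload."""

    def __init__(self, size):
        self.payload = bytearray(size)


def loop_releases_previous(n, size=1024 * 1024):
    """Return, per iteration, whether that iteration's object is still alive."""
    refs = []
    snapshot = None
    for _ in range(n):
        # Rebinding 'snapshot' drops the only strong reference to the
        # previous object, so CPython frees it immediately.
        snapshot = Snapshot(size)
        refs.append(weakref.ref(snapshot))
    # Only the last object is still bound (to 'snapshot').
    return [r() is not None for r in refs]
```

`loop_releases_previous(3)` reports only the last object as alive, i.e. `[False, False, True]` under CPython's reference counting.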

How can I optimize this script so that its memory usage doesn't keep growing, and it only uses the memory needed to process each user?
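One general way to bound peak memory, separate from the NDB-specific fix in the answer below, is to handle users in fixed-size batches so only one batch of results is alive at a time. A minimal sketch, assuming a hypothetical `process_user` callable and plain integer keys in place of datastore keys; it does not help if some cache keeps every processed object alive.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_in_batches(keys, process_user, size=100):
    """Apply process_user to every key, holding one batch of results at a time."""
    processed = 0
    for batch in batches(keys, size):
        results = [process_user(k) for k in batch]
        processed += len(results)
        # 'results' is rebound on the next iteration, so the previous batch
        # becomes unreachable unless something else still references it.
    return processed
```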


2013-01-21 Jimmy Kane

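I don't spot any _obvious_ memory leak in your snippet, but 1/ I have no GAE experience, and 2/ part of the code was not posted (especially `StatsSnapShot`). Just a couple of hints to be more pythonic:
- `logging.warning("".join("Fetch Failed to get ", url, " with", e))` => `logging.exception("Fetch failed to get %s", url)` (`str.join` takes a single iterable, so the original call raises a `TypeError`; `logging.exception` already records the exception)
- `set(someseq)` won't "fail if not unique", it just drops duplicates
- **never** use a bare except clause (at least use `logging.exception` to get some feedback)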
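The last two hints can be combined: catch `Exception` instead of using a bare `except`, and log with `logging.exception`, which records the active traceback. A stdlib-only sketch; `fetch` is a hypothetical stand-in for `urlfetch.fetch`, and `safe_fetch` is an illustrative name.

```python
import logging

logger = logging.getLogger(__name__)


def safe_fetch(fetch, url):
    """Call fetch(url); on failure, log the traceback and return None."""
    try:
        return fetch(url)
    except Exception:
        # logging.exception logs at ERROR level and attaches exc_info,
        # so the error is not silently discarded as with 'except: pass'.
        logger.exception("Fetch failed to get %s", url)
        return None
```

Unlike a bare `except`, `except Exception` does not swallow `KeyboardInterrupt` or `SystemExit`.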

@brunodesthuilliers "fail" was a typo; I meant that the set removes duplicates.

Does use_cache=False work as expected? – tesdal

NDB adds automatic caching, which is usually very convenient. There is an in-memory cache and memcache, and you can set policies for both.
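This explains the growth: every entity written during the request stays referenced by the in-memory cache until the request's context ends, so reference counting can never free it. A toy model (not NDB's actual implementation; `Entity`, `ToyContext` and `simulate_request` are hypothetical) makes the effect visible with `weakref`:

```python
import weakref


class Entity(object):
    def __init__(self, key):
        self.key = key
        self.payload = bytearray(64 * 1024)  # stand-in for fetched RSS data


class ToyContext(object):
    """Toy per-request context that caches every entity it puts."""

    def __init__(self):
        self.cache = {}

    def put(self, entity):
        # The cache keeps a strong reference until the context is discarded.
        self.cache[entity.key] = entity


def simulate_request(n_users):
    """Return (entities alive while cached, entities alive after clearing)."""
    ctx = ToyContext()
    refs = []
    for key in range(n_users):
        entity = Entity(key)
        ctx.put(entity)
        refs.append(weakref.ref(entity))
    entity = None  # drop the loop variable's reference
    alive_while_cached = sum(r() is not None for r in refs)
    ctx.cache.clear()
    alive_after_clear = sum(r() is not None for r in refs)
    return alive_while_cached, alive_after_clear
```

With 5 users, all 5 entities survive while cached and none survive once the cache is cleared, so memory grows linearly with the number of users processed in one request.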

When doing a put, you can provide context options; I suspect the following will work for you:

statssnapshot.put_async(use_cache=False)
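In NDB, `use_cache` passed to a single call is a context option that overrides the context's default cache policy for that one operation. A toy model of that precedence rule; `PolicyContext` and its `put` signature are hypothetical and are not the NDB API.

```python
class PolicyContext(object):
    """Toy model: a per-call use_cache option overrides the default policy."""

    def __init__(self, default_use_cache=True):
        self.default_use_cache = default_use_cache
        self.cache = {}

    def put(self, key, value, use_cache=None):
        # None means "not specified": fall back to the context-wide default.
        if use_cache is None:
            use_cache = self.default_use_cache
        if use_cache:
            self.cache[key] = value
        return key
```

The follow-up comment mentions also disabling memcache; NDB has a separate `use_memcache` context option for that, which this toy model does not cover.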


2013-01-22 10:45:39 tesdal

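Thanks a lot. This solved my problem and saved a lot of resources. I also disabled memcache, which improved things considerably as well. One thing worth mentioning: on the development server memory usage still grows a lot, but in production it works very well.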

Also, I didn't get the query.map async fetch part...

The development server emulates memcache and uses SQLite instead of external services, so you get more memory usage. Regarding async, https://developers.google.com/appengine/docs/python/ndb/async is worth a read. – tesdal
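`put_async` returns a future immediately, and the `@ndb.toplevel` decorator on `get` makes the handler wait for its outstanding asynchronous requests before the request finishes. The same start-now, wait-later pattern can be sketched with Python 3's `concurrent.futures`; this is an analogy, not NDB's tasklet machinery, and `slow_put` is a hypothetical stand-in for a datastore write.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def slow_put(key):
    time.sleep(0.01)  # pretend this is a datastore round trip
    return key


def put_all_async(keys, max_workers=4):
    """Start every write at once, then block until all of them are done."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() returns a future immediately, like put_async().
        futures = [pool.submit(slow_put, k) for k in keys]
        # Wait for every outstanding write, like @ndb.toplevel does.
        # (Leaving the with-block would also wait, via shutdown(wait=True).)
        wait(futures)
    return [f.result() for f in futures]
```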
