Free memory in a loop

I am getting a memory error in my code. My parser can be summarized like this:

#!/usr/bin/env python
# coding=utf-8
import sys
import json
from collections import defaultdict


class MyParserIter(object):

    def _parse_line(self, line):
        for couple in line.split(","):
            key, value = couple.split(':')[0], couple.split(':')[1]
            self.__hash[key].append(value)

    def __init__(self, line):
        # not the real parsing, just an example that parses each
        # line into a dict-like obj
        self.__hash = defaultdict(list)
        self._parse_line(line)

    def __iter__(self):
        return iter(self.__hash.values())

    def to_dict(self):
        return self.__hash

    def __getitem__(self, item):
        return self.__hash[item]

    def free(self, item):
        self.__hash[item] = None

    def free_all(self):
        for k in self.__hash:
            self.free(k)

    def to_json(self):
        return json.dumps(self.to_dict())


def parse_file(file_path):
    list_result = []
    with open(file_path) as fin:
        for line in fin:
            parsed_line_obj = MyParserIter(line)
            list_result.append(parsed_line_obj)
    return list_result


def write_to_file(list_obj):
    with open("out.out", "w") as fout:
        for obj in list_obj:
            json_out = obj.to_json()
            fout.write(json_out + "\n")
            obj.free_all()
            obj = None

if __name__ == '__main__':
    result_list = parse_file('test.in')
    print(sys.getsizeof(result_list))
    write_to_file(result_list)
    print(sys.getsizeof(result_list))
    # the same result for memory usage result_list
    print(sys.getsizeof([None] * len(result_list)))
    # the result is not the same :(

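The purpose is to parse a (large) file, converting each line into a JSON object that is written back to a file.

My goal is to reduce the memory footprint, because in some cases this code raises a MemoryError. After each fout.write I would like to delete the obj reference (free its memory).

I tried setting obj to None and calling the method obj.free_all(), but neither of them frees memory. I also used simplejson instead of json, which reduced the footprint, but it is still too large in some cases.

test.in looks like this: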

test1:OK,test3:OK,... 
test1:OK,test3:OK,... 
test1:OK,test3:OK,test4:test_again... 
....

Source

2016-02-17 Ali SAID OMAR

Have you tried gc.collect()? See: http://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python – JonnyTieM

How big is your test.in? – YOU

For the real parser, the input file is about 300 MB. –

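For obj to be freeable, all references to it must be eliminated. Your loop does not do that, because the reference held in list_obj still exists. The following fixes this: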

def write_to_file(list_obj):
    with open("out.out", "w") as fout:
        for ix in range(len(list_obj)):
            obj = list_obj[ix]
            # drop the list's reference so obj can be reclaimed
            list_obj[ix] = None
            json_out = obj.to_json()
            fout.write(json_out + "\n")
            obj.free_all()

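Alternatively, you can destructively pop elements from the front of list_obj, although this may cause performance problems if list_obj has to be reallocated too many times. I have not tried this, so I am not sure. That version looks like this: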

def write_to_file(list_obj):
    with open("out.out", "w") as fout:
        while len(list_obj) > 0:
            obj = list_obj.pop(0)
            json_out = obj.to_json()
            fout.write(json_out + "\n")
            obj.free_all()
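
On the performance worry: list.pop(0) shifts every remaining element on each call, so draining a long list this way is quadratic overall. One possible alternative, not part of the original answer, is to hold the parsed objects in a collections.deque, whose popleft() is O(1). A minimal sketch, assuming the objects expose to_json() as in the question; drain_to_file and the stand-in class FakeObj are hypothetical names used only for illustration:

```python
import io
import json
from collections import deque


class FakeObj(object):
    """Hypothetical stand-in for MyParserIter; only to_json() is needed here."""

    def __init__(self, data):
        self._data = data

    def to_json(self):
        return json.dumps(self._data)


def drain_to_file(queue, fout):
    """Pop objects off the caller's deque and write each as one JSON line."""
    written = 0
    while queue:
        # popleft() is O(1); afterwards the deque no longer references obj
        obj = queue.popleft()
        fout.write(obj.to_json() + "\n")
        written += 1
    return written


# Small demonstration with an in-memory file instead of out.out.
queue = deque(FakeObj({"test1": ["OK"]}) for _ in range(3))
out = io.StringIO()
count = drain_to_file(queue, out)
```

For this to help, the parsing step would have to append to the deque directly, so that no separate list keeps the objects alive.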

Source

2016-02-26 08:05:11

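Thanks. According to sys.getsizeof, this code does not free memory in my tests. –

Did you try calling gc.collect() after removing the references? –

Yes, I did; it wrecked the process's running time. –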
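
A note on the measurement used in these comments: on CPython, sys.getsizeof applied to a list reports only the list object itself (its header plus its array of pointers), not the objects the list refers to. Replacing elements with None therefore leaves the reading unchanged even when the element objects have been released. A small sketch with made-up data to illustrate:

```python
import sys

# Build a list by repeated append, as parse_file does in the question.
items = []
for i in range(1000):
    items.append("x" * 1000 + str(i))  # roughly 1 KB of string per element

size_before = sys.getsizeof(items)

# Drop every element; the strings themselves can now be reclaimed...
for i in range(len(items)):
    items[i] = None

size_after = sys.getsizeof(items)

# ...but the list still has the same number of slots, so its own size is unchanged.
print(size_before == size_after)

# A list created in one step is sized exactly, while a list grown by append
# usually carries spare capacity. That would explain why the last two
# readings in the question differ.
exact = sys.getsizeof([None] * len(items))
print(exact <= size_after)
```

To see whether the elements themselves are released, a tool that tracks allocations, such as the standard tracemalloc module, is likely a better fit than sys.getsizeof.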

Don't store many instances of the class in an array; do the work inline instead. Example:

% cat test.in 
test1:OK,test3:OK 
test1:OK,test3:OK 
test1:OK,test3:OK,test4:test_again 

% cat test.py 
import json

# text mode, so str operations work on each line in Python 3 as well
with open("test.in") as src:
    with open("out.out", "w") as dst:
        for line in src:
            pairs, obj = [x.split(":", 1) for x in line.rstrip().split(",")], {}
            for k, v in pairs:
                if k not in obj:
                    obj[k] = []
                obj[k].append(v)
            dst.write(json.dumps(obj) + "\n")

% cat out.out 
{"test1": ["OK"], "test3": ["OK"]} 
{"test1": ["OK"], "test3": ["OK"]} 
{"test1": ["OK"], "test3": ["OK"], "test4": ["test_again"]}

If that is slow, don't write to the file line by line; instead store the dumped JSON strings in an array and do dst.write("\n".join(array))
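
If batching turns out to be needed for speed, one option that keeps memory bounded is to flush the buffer every fixed number of lines rather than collecting the whole output first. A rough sketch; write_batched and batch_size are hypothetical names, and a suitable batch size would have to be measured:

```python
import io
import json


def write_batched(json_lines, dst, batch_size=1000):
    """Write an iterable of JSON strings to dst, batch_size lines per write call."""
    batch = []
    for text in json_lines:
        batch.append(text)
        if len(batch) >= batch_size:
            dst.write("\n".join(batch) + "\n")
            batch = []  # drop the strings that were just written
    if batch:  # flush whatever is left over
        dst.write("\n".join(batch) + "\n")


# Small demonstration with an in-memory file and a tiny batch size.
demo_out = io.StringIO()
write_batched((json.dumps({"n": i}) for i in range(5)), demo_out, batch_size=2)
```

Python file objects are already buffered by default, so the gain from this may be small.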

Source

2016-02-17 13:28:47 YOU

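Thanks, but that means refactoring my class and mixing logic (writing output to a file and parsing input). The parser should be able to output to a file or to the console, which is why I need to store the parse result. It should also support iteration and key lookup. Finally, this approach does not suit my unit tests. –

@AliSAIDOMAR That is simple. Just write a generator that yields the values (basically put the code from this answer into a def and replace dst.write with yield). Then you can iterate over the results and write them wherever you want. – Bakuriu

The main point here is not to store the whole result (in any form), but to read/parse/write line by line. With the way your original program works, memory consumption is O(file_length), whereas with the line-by-line approach (read/parse/write) it is O(max_line_length). – hvb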
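
The generator suggested in the comments could look roughly like the sketch below: parsing yields one dict per line, and the caller decides where the output goes, so parsing and output stay separate and the parser can be tested on a plain list of strings. The names parse_lines and dump_lines are hypothetical, and the parsing mirrors the simplified example rather than the real parser.

```python
import io
import json


def parse_lines(lines):
    """Yield one {key: [values]} dict per non-empty input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = {}
        for pair in line.split(","):
            key, value = pair.split(":", 1)
            obj.setdefault(key, []).append(value)
        yield obj  # only the current line's dict is alive at this point


def dump_lines(parsed, dst):
    """Write each parsed dict to dst as one JSON document per line."""
    for obj in parsed:
        dst.write(json.dumps(obj) + "\n")


# Demonstration on in-memory data; any iterable of lines works.
sample = ["test1:OK,test3:OK\n", "test1:OK,test3:OK,test4:test_again\n"]
buf = io.StringIO()
dump_lines(parse_lines(sample), buf)
```

For a real file, the same two functions can be chained as dump_lines(parse_lines(src), dst) with src and dst opened in text mode, or with sys.stdout as dst to print to the console. Only one line's worth of data is held at a time.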
