Free memory in a loop: I am running into memory errors in my code. My parser can be summarized like this:
#!/usr/bin/env python
# coding=utf-8
import sys
import json
from collections import defaultdict


class MyParserIter(object):
    def _parse_line(self, line):
        for couple in line.split(","):
            key, value = couple.split(':')[0], couple.split(':')[1]
            self.__hash[key].append(value)

    def __init__(self, line):
        # not the real parsing, just an example that turns each
        # line into a dict-like object
        self.__hash = defaultdict(list)
        self._parse_line(line)

    def __iter__(self):
        return iter(self.__hash.values())

    def to_dict(self):
        return self.__hash

    def __getitem__(self, item):
        return self.__hash[item]

    def free(self, item):
        # overwrite the parsed values; the dict itself keeps its keys
        self.__hash[item] = None

    def free_all(self):
        for k in self.__hash:
            self.free(k)

    def to_json(self):
        return json.dumps(self.to_dict())


def parse_file(file_path):
    # parse every line up front and keep all parsed objects in a list
    list_result = []
    with open(file_path) as fin:
        for line in fin:
            parsed_line_obj = MyParserIter(line)
            list_result.append(parsed_line_obj)
    return list_result


def write_to_file(list_obj):
    with open("out.out", "w") as fout:
        for obj in list_obj:
            json_out = obj.to_json()
            fout.write(json_out + "\n")
            obj.free_all()
            obj = None


if __name__ == '__main__':
    result_list = parse_file('test.in')
    print(sys.getsizeof(result_list))
    write_to_file(result_list)
    print(sys.getsizeof(result_list))
    # the same result for the memory usage of result_list
    print(sys.getsizeof([None] * len(result_list)))
    # the result is not the same :(
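A note on the three prints: sys.getsizeof is shallow; it counts only the list's own pointer array, never the parsed objects behind it, so the first two prints match no matter what free_all() does. A list grown by append also over-allocates spare capacity, while [None] * n is sized exactly, which is why the last print differs. A minimal demonstration, independent of the parser (the exact byte counts vary across CPython versions):

import sys

grown = []
for _ in range(1000):
    grown.append(None)       # append over-allocates spare capacity

exact = [None] * 1000        # allocated in one step, sized exactly

print(sys.getsizeof(grown))  # larger: counts the spare capacity too
print(sys.getsizeof(exact))  # smaller: exactly 1000 pointer slots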
The aim is to parse a (big) file and turn each line into a JSON object that will be written back to a file.
My goal is to reduce the footprint, because in some cases this code raises a MemoryError. After each fout.write I would like to delete (free the memory of) the obj reference.
I tried setting obj to None and calling the method obj.free_all(), but neither of them frees the memory. I also used simplejson instead of json, which reduced the footprint but is still too large in some cases.
test.in looks like this:
test1:OK,test3:OK,...
test1:OK,test3:OK,...
test1:OK,test3:OK,test4:test_again...
....
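For comparison, a line-at-a-time variant that never keeps more than one parsed line alive is one way to cap the footprint. This is a minimal sketch, assuming each output line depends only on its own input line; parse_and_write is my own name, and it reuses MyParserIter from the code above:

def parse_and_write(in_path, out_path):
    # read, parse, and write one line at a time, so at most one
    # MyParserIter instance is alive at any moment
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            obj = MyParserIter(line)
            fout.write(obj.to_json() + "\n")
            # on the next iteration obj is rebound, the previous
            # instance becomes unreachable, and CPython reclaims it

With this shape, memory use stays flat regardless of file size, whereas parse_file cannot drop below one MyParserIter per line, because list_result pins every instance until the program exits.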
Have you tried gc.collect()? See: http://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python – JonnyTieM
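For reference, a minimal sketch of that suggestion, using result_list from the code above; gc.collect() can only reclaim objects that nothing references any more, so the list's own entries have to be dropped first:

import gc

del result_list[:]   # drop every reference the list itself holds
gc.collect()         # then ask the collector for a full pass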
How big is your test.in? – YOU
For the real parser, the input file is about 300 MB. –