Python MemoryError on Mac

2014-01-18

I have about 10,000 files, each containing a large amount of data.

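I am trying to build a Python dictionary that covers all of the files and some of the data in each file.

What I am doing is something like this: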

import os

results = {}
for bfile in os.listdir(files_dir):
    fname, ext = os.path.splitext(bfile)

    fhandle = open(os.path.join(files_dir, bfile), 'r')
    if fname not in results:
        results[fname] = {}
    for line in fhandle:
        # each line is tab-separated; the first two columns become nested keys
        line = line.split("\t")

        if line[0] not in results[fname]:
            results[fname][line[0]] = {}

        if line[1] not in results[fname][line[0]]:
            results[fname][line[0]][line[1]] = {}

This should have been a simple task, but I get this error:

File "script.py", line 409, in <module> 
    file_handle() 
    File "script.py", line 247, in file_handle 
    results[fname][line[0]][line[1]] = {} 
MemoryError 
Error in sys.excepthook: 
Traceback (most recent call last): 
    File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 66, in apport_excepthook 
    from apport.fileutils import likely_packaged, get_recent_crashes 
    File "/usr/lib/python2.7/dist-packages/apport/__init__.py", line 1, in <module> 
    from apport.report import Report 
    File "/usr/lib/python2.7/dist-packages/apport/report.py", line 18, in <module> 
    import problem_report 
    File "/usr/lib/python2.7/dist-packages/problem_report.py", line 14, in <module> 
    import zlib, base64, time, sys, gzip, struct, os 
    File "/usr/lib/python2.7/gzip.py", line 10, in <module> 
    import io 
    File "/usr/lib/python2.7/io.py", line 60, in <module> 
    import _io 
MemoryError 

Original exception was: 
Traceback (most recent call last): 
    File "script.py", line 409, in <module> 
    file_handle() 
    File "script.py", line 247, in file_handle 
    results[fname][line[0]][line[1]] = {} 
MemoryError 
Segmentation fault (core dumped) 

How much physical memory do you have, and how large are the files you are processing? – senshin

Answers

It looks like you never close the files once you are done with them. That may be the problem, so try the following:

# the with block closes the file automatically when it exits
with open(os.path.join(files_dir, bfile), 'r') as fhandle:
    if fname not in results:
        results[fname] = {}
    for line in fhandle:
        line = line.split("\t")

        if line[0] not in results[fname]:
            results[fname][line[0]] = {}

        if line[1] not in results[fname][line[0]]:
            results[fname][line[0]][line[1]] = {}
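
For reference, here is a minimal self-contained sketch of the same loop wrapped in a function, so that each file is closed as soon as it has been read. The function name `build_index`, the use of `setdefault`, stripping the trailing newline, and skipping lines with fewer than two columns are my own assumptions and not part of the original script. Note that `results` still keeps every key from every file in memory, so this only addresses the open file handles.

```python
import os


def build_index(files_dir):
    """Map file name -> first column -> second column -> {} for tab-separated files.

    Hypothetical helper: the name and the line-skipping rule are assumptions.
    """
    results = {}
    for bfile in sorted(os.listdir(files_dir)):
        path = os.path.join(files_dir, bfile)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        fname, _ext = os.path.splitext(bfile)
        per_file = results.setdefault(fname, {})
        # The with block closes the file when the inner loop finishes.
        with open(path, "r") as fhandle:
            for line in fhandle:
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 2:
                    continue  # assumption: ignore lines without two columns
                per_file.setdefault(fields[0], {}).setdefault(fields[1], {})
    return results
```

You would call it as `results = build_index(files_dir)`, with `files_dir` being the same directory as in the question.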