2016-11-07 127 views

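I wrote some code that adds the numbers from two different text files. For very large data (2-3 GB) I get a MemoryError, so I am writing new code that uses a few functions to avoid loading all of the data into memory. The functions do not return the lists correctly.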

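The code opens an input file, 'd.txt', which holds the larger data, and reads the numbers that follow certain lines. The file looks like this: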

SCALAR 
ND 3 
ST 0 
TS 1000 
1.0 
1.0 
1.0 
SCALAR 
ND 3 
ST 0 
TS 2000 
3.3 
3.4 
3.5 
SCALAR 
ND 3 
ST 0 
TS 3000 
1.7 
1.8 
1.9 

It then adds the numbers read from a smaller text file, 'e.txt', which looks like this:

SCALAR 
ND 3 
ST 0 
TS 0 
10.0 
10.0 
10.0 

The result is written to a text file, 'output.txt', like this:

SCALAR 
ND 3 
ST 0 
TS 1000 
11.0 
11.0 
11.0 
SCALAR 
ND 3 
ST 0 
TS 2000 
13.3 
13.4 
13.5 
SCALAR 
ND 3 
ST 0 
TS 3000 
11.7 
11.8 
11.9 

The code I wrote:

def add_list_same(list1, list2): 
    """ 
    list2 has the same size as list1 
    """ 
    c = [a+b for a, b in zip(list1, list2)] 
    print(c) 
    return c 


def list_numbers_after_ts(n, f):
    result = []
    for line in f:
        if line.startswith('TS'):
            for node in range(n):
                result.append(float(next(f)))
    return result


def writing_TS(f1):
    TS = []
    ND = []
    for line1 in f1:
        if line1.startswith('ND'):
            ND = float(line1.split()[-1])
        if line1.startswith('TS'):
            x = float(line1.split()[-1])
            TS.append(x)
    return TS, ND


with open('d.txt') as depth_dat_file, \
        open('e.txt') as elev_file, \
        open('output.txt', 'w') as out:
    m = writing_TS(depth_dat_file)
    print('number of TS', m[1])
    for j in range(0,int(m[1])-1):
        i = m[1]*j
        out.write('SCALAR\nND {0:2f}\nST 0\nTS {0:2f}\n'.format(m[1], m[0][j]))
        list1 = list_numbers_after_ts(int(m[1]), depth_dat_file)
        list2 = list_numbers_after_ts(int(m[1]), elev_file)
        Eh = add_list_same(list1, list2)
        out.writelines(["%.2f\n" % item for item in Eh])

The resulting output.txt looks like this:

SCALAR 
ND 3.000000 
ST 0 
TS 3.000000 
SCALAR 
ND 3.000000 
ST 0 
TS 3.000000 
SCALAR 
ND 3.000000 
ST 0 
TS 3.000000 

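Adding the lists does not work, even though the functions work when I test them individually. I cannot find the error. I have changed the code many times, but it still does not work. Any suggestions? I would really appreciate any help!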

Answers


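You can use grouper to read the file in groups of a fixed number of lines. As long as the order of the lines within each group stays the same, the following code should work.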

from itertools import zip_longest

# Split an iterable into fixed-size groups
# See http://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks
def grouper(iterable, n, padvalue=None):
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

add_numbers = []

with open("e.txt") as f:
    # Read data by 7 lines (the line count must be a multiple of 7,
    # otherwise the last group is padded with None)
    for lines in grouper(f, 7):
        # Skip the 4 header lines (SCALAR, ND, ST, TS); keep the 3 values
        for line in lines[4:]:
            add_numbers.append(float(line))

with open("d.txt") as f, open('output.txt', 'w') as out:
    # As before
    for lines in grouper(f, 7):
        # Copy the header lines through unchanged, so ND, ST and TS are not summed
        out.writelines(line.strip() + '\n' for line in lines[:4])
        data_numbers = [float(line) for line in lines[4:]]
        # Sum the elements of the two lists pairwise (3 elements)
        result_numbers = [x + y for x, y in zip(data_numbers, add_numbers)]
        out.writelines('{:.2f}\n'.format(value) for value in result_numbers)
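
A note on why the code in the question writes only header lines: a file object is an iterator that can be consumed only once, so after `writing_TS` has read `d.txt` to the end, `list_numbers_after_ts` finds nothing left and returns an empty list. Separately, the format string in the question uses `{0:2f}` twice, so argument 0 (the ND value) is printed on the TS line as well. The sketch below reproduces both effects, with `io.StringIO` standing in for the real file. The sample text and the variable names are illustrative assumptions, not part of the original code.

```python
import io

# Stand-in for d.txt (hypothetical sample holding a single block)
sample = io.StringIO("SCALAR\nND 3\nST 0\nTS 1000\n1.0\n1.0\n1.0\n")

# First pass: collect the TS values, as writing_TS does; this reads to the end
first_pass = [float(line.split()[-1]) for line in sample if line.startswith('TS')]

# Second pass over the same object: nothing is left to iterate over
second_pass = [line for line in sample]

# After rewinding, the lines are available again
sample.seek(0)
after_rewind = [line for line in sample]

# '{0:2f}' used twice formats argument 0 both times
header_bug = 'ND {0:2f} TS {0:2f}'.format(3.0, 1000.0)
# Distinct indices pick up both arguments
header_ok = 'ND {0:.0f} TS {1:.0f}'.format(3.0, 1000.0)

print(first_pass, second_pass, len(after_rewind))
print(header_bug)
print(header_ok)
```

Calling `seek(0)` before the second pass, or collecting everything in a single pass, avoids the empty lists.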

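I had to change a few small things in the code, and now it works, but only for very small input files, because many variables are loaded into memory. Can you tell me how I could do this with yield?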

from itertools import zip_longest

def grouper(iterable, n, padvalue=None):
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)


def writing_ND(f1):
    # Return the node count from the first 'ND' line
    for line1 in f1:
        if line1.startswith('ND'):
            ND = float(line1.split()[-1])
            return ND


def writing_TS(f):
    # Collect every time-step value in the file
    TS = []
    for line2 in f:
        if line2.startswith('TS'):
            x = float(line2.split()[-1])
            TS.append(x)
    return TS


add_numbers = []

with open("e.txt") as f, open("d.txt") as f1,\
        open('output.txt', 'w') as out:
    ND = writing_ND(f)
    TS = writing_TS(f1)
    n = int(ND)+4
    # Rewind both files: the helper functions above have consumed them
    f.seek(0)
    f1.seek(0)
    for lines in grouper(f, n):
        for item in lines[4:]:
            add_numbers.append(float(item))
    i = 0
    for l in grouper(f1, n):
        data_numbers = []
        for line in l[4:]:
            data_numbers.append(float(line.split()[-1].strip()))
        # Sum once per group, after all of its values have been read
        result_numbers = [x + y for x, y in zip(data_numbers, add_numbers)]
        del data_numbers
        out.write('SCALAR\nND %d\nST 0\nTS  %0.2f\n' % (ND, TS[i]))
        i += 1
        for item in result_numbers:
            out.write('%s\n' % item)
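
On the question of using yield: one possible approach, sketched here under assumptions, is a generator that yields one block at a time, so that only the values from `e.txt` and the current block of `d.txt` are held in memory. The names `read_blocks` and `add_files` are hypothetical. The block layout (SCALAR, ND, ST, TS, then ND values) is taken from the sample files above, and `e.txt` is assumed to contain a single block. `io.StringIO` objects stand in for the real files.

```python
import io
from itertools import islice


def read_blocks(lines):
    """Yield (header_lines, values) for one block at a time.

    Assumes each block is SCALAR, ND <count>, ST, TS, then <count> values.
    """
    lines = iter(lines)
    for first in lines:
        if not first.strip():
            continue  # tolerate blank lines between blocks
        # The line just read plus the next three form the header
        header = [first.strip()] + [l.strip() for l in islice(lines, 3)]
        count = int(float(header[1].split()[-1]))
        values = [float(l) for l in islice(lines, count)]
        yield header, values


def add_files(big, small, out):
    # The small file is assumed to hold one block; keep only its values
    _, add_numbers = next(read_blocks(small))
    for header, values in read_blocks(big):
        out.write('\n'.join(header) + '\n')
        out.writelines('%.2f\n' % (x + y) for x, y in zip(values, add_numbers))


# Stand-ins for d.txt, e.txt and output.txt
big = io.StringIO("SCALAR\nND 3\nST 0\nTS 1000\n1.0\n1.0\n1.0\n"
                  "SCALAR\nND 3\nST 0\nTS 2000\n3.3\n3.4\n3.5\n")
small = io.StringIO("SCALAR\nND 3\nST 0\nTS 0\n10.0\n10.0\n10.0\n")
out = io.StringIO()

add_files(big, small, out)
result = out.getvalue()
print(result)
```

With real files, the same call could be wrapped in `with open('d.txt') as big, open('e.txt') as small, open('output.txt', 'w') as out:`. Because `read_blocks` pulls lines lazily from the file object, memory use should depend on the size of one block rather than on the size of `d.txt`.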