2017-07-04 53 views
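Simplifying a big-data processing script

I am trying to do the following, but it takes a very long time. Could someone suggest a way to simplify or speed it up?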

f = open('answer.csv', 'w')
f.write('Datetime,0: Vm,0: Va,1: Vm,1: Va,2: Vm,2: Va,3: Vm,3: Va,4: Vm,4: Va,5: Vm,5: Va,6: Vm,6: Va,7: Vm,7: Va,8: Vm,8: Va,9: Vm,9: Va,10: Vm,10: Va,11: Vm,11: Va,12: Vm,12: Va,13: Vm,13: Va\n')
# 'n' is around 8000000
# 'PQ_data' is a pandas DataFrame with more than n rows
# 'class_' is an instance of a Python class with some methods on it
for i in range(n):
    p = []
    q = []
    for j in range(1, 14):
        if j <= 10:
            p.append(PQ_data['{} P'.format(j)][i])
            q.append(PQ_data['{} Q'.format(j)][i])
        else:
            p.append(0)
            q.append(0)

    class_.do_something(p, q)
    vm = class_.get_Vm().tolist()
    va = class_.get_Va().tolist()
    # The methods above return lists of length 14.
    # PQ_data.index holds datetime values.
    f.write('{}'.format(PQ_data.index[i]))
    for j in range(len(vm)):
        f.write(',{},{}'.format(vm[j], va[j]))
    f.write('\n')
f.close()

Answer

Try this for a faster approach. If it is still not fast enough, you may need to throw multiprocessing at it.

import csv
import itertools

# newline='' lets the csv module control line endings (Python 3)
with open('answer.csv', 'w', newline='') as fout:
    outfile = csv.writer(fout)
    outfile.writerow(['Datetime', '0: Vm', '0: Va', '1: Vm', '1: Va', '2: Vm', '2: Va', '3: Vm', '3: Va', '4: Vm', '4: Va', '5: Vm', '5: Va', '6: Vm', '6: Va', '7: Vm', '7: Va', '8: Vm', '8: Va', '9: Vm', '9: Va', '10: Vm', '10: Va', '11: Vm', '11: Va', '12: Vm', '12: Va', '13: Vm', '13: Va'])

    for i in range(n):
        # Columns '1 P'..'10 P' and '1 Q'..'10 Q' come from the data; the last 3 entries are zero
        p = [PQ_data['{} P'.format(j)][i] for j in range(1, 11)] + [0] * 3
        q = [PQ_data['{} Q'.format(j)][i] for j in range(1, 11)] + [0] * 3

        class_.do_something(p, q)
        vm = class_.get_Vm().tolist()
        va = class_.get_Va().tolist()

        # Timestamp first, then vm[0], va[0], vm[1], va[1], ...
        row = itertools.chain([PQ_data.index[i]], itertools.chain.from_iterable(zip(vm, va)))
        outfile.writerow(row)
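
One further option, sketched under assumptions: every `PQ_data['{} P'.format(j)][i]` lookup goes through pandas indexing, and the loop performs 20 of them per row. Pulling the ten P and ten Q columns into two zero-padded NumPy arrays once, before the loop, leaves only cheap array row access inside it. The names `build_pq_arrays`, `n_data`, `width` and the small `demo` frame are hypothetical and only serve as illustration; they are not part of the original post.

```python
import numpy as np
import pandas as pd


def build_pq_arrays(pq_data, n_data=10, width=13):
    """Return (P, Q) arrays of shape (len(pq_data), width).

    The first n_data columns are copied from 'k P' / 'k Q' (k = 1..n_data);
    the remaining columns stay zero, like the [0] * 3 padding in the loop.
    """
    p_cols = ['{} P'.format(j) for j in range(1, n_data + 1)]
    q_cols = ['{} Q'.format(j) for j in range(1, n_data + 1)]
    n_rows = len(pq_data)
    p_all = np.zeros((n_rows, width))
    q_all = np.zeros((n_rows, width))
    p_all[:, :n_data] = pq_data[p_cols].to_numpy()
    q_all[:, :n_data] = pq_data[q_cols].to_numpy()
    return p_all, q_all


# Tiny made-up frame standing in for PQ_data (3 rows, datetime index).
index = pd.date_range('2017-07-04', periods=3, freq='min')
columns = ['{} {}'.format(j, s) for j in range(1, 11) for s in ('P', 'Q')]
demo = pd.DataFrame(np.arange(60, dtype=float).reshape(3, 20),
                    index=index, columns=columns)

p_all, q_all = build_pq_arrays(demo)
print(p_all.shape)  # one row of 13 values per timestamp
```

Inside the loop, `p_all[i]` and `q_all[i]` (or `p_all[i].tolist()` if `do_something` needs plain lists) would then replace the two list comprehensions. With n around 8,000,000, each padded float64 array takes roughly 0.8 GB, so building the arrays in chunks may be necessary. How much this helps depends on how large a share of the total time `do_something` takes.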

Thanks, @inspectorG4dget! This is better than my code, but it still takes a long time. That is probably due to the time taken by the do_something function itself. –

@code_dragon: Quite possibly. If you create a new post with the definition of `do_something`, we may be able to optimize it. – inspectorG4dget
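
To check whether `do_something` really dominates, it may help to measure each stage on a small sample of rows first. The sketch below accumulates wall-clock time per stage with `time.perf_counter`. `timed_loop` and the three stand-in callables are hypothetical; in the real script they would wrap building `p` and `q`, the `do_something` / `get_Vm` / `get_Va` calls, and the CSV write.

```python
import time


def timed_loop(n_rows, build_inputs, compute, write_row):
    """Run the three stages for n_rows rows and return seconds spent per stage."""
    totals = {'build': 0.0, 'compute': 0.0, 'write': 0.0}
    for i in range(n_rows):
        t0 = time.perf_counter()
        p, q = build_inputs(i)
        t1 = time.perf_counter()
        result = compute(p, q)
        t2 = time.perf_counter()
        write_row(i, result)
        t3 = time.perf_counter()
        totals['build'] += t1 - t0
        totals['compute'] += t2 - t1
        totals['write'] += t3 - t2
    return totals


# Stand-ins for the real stages (assumptions, for illustration only).
rows = []


def build_inputs(i):
    return [float(i)] * 13, [0.0] * 13


def compute(p, q):
    time.sleep(0.002)  # pretend the API call is the slow part
    return p, q


def write_row(i, result):
    rows.append((i, result))


totals = timed_loop(20, build_inputs, compute, write_row)
for stage, seconds in totals.items():
    print('{}: {:.4f} s'.format(stage, seconds))
```

If the compute stage accounts for nearly all of the time, changes to the input building or the CSV writing will not make much difference.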

But do_something isn't simple, @inspectorG4dget. The class uses an API to perform some calculations and returns the required output. –
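
If `do_something` cannot be changed, the multiprocessing route mentioned in the answer could look roughly like the sketch below. It assumes that each row is independent of the others, that a separate instance of the class can be created in each worker process, and that the underlying API tolerates several processes at once; none of this is confirmed in the post. `DummySolver`, `chunk_bounds`, `process_chunk` and `run_parallel` are hypothetical names, and `DummySolver` only imitates the shape of the real class (13 inputs, 14 outputs).

```python
import multiprocessing as mp

import numpy as np


class DummySolver:
    """Hypothetical stand-in for the asker's class."""

    def do_something(self, p, q):
        self._vm = np.append(p, 0.0) + 1.0  # 14 made-up values
        self._va = np.append(q, 0.0)

    def get_Vm(self):
        return self._vm

    def get_Va(self):
        return self._va


def chunk_bounds(n_rows, n_chunks):
    """Split range(n_rows) into at most n_chunks contiguous (start, stop) pairs."""
    edges = np.linspace(0, n_rows, n_chunks + 1).astype(int)
    return [(int(a), int(b)) for a, b in zip(edges[:-1], edges[1:]) if b > a]


def process_chunk(args):
    """Worker: compute the interleaved vm/va values for one block of rows."""
    p_chunk, q_chunk = args
    solver = DummySolver()  # one instance per task, nothing shared
    rows = []
    for p, q in zip(p_chunk, q_chunk):
        solver.do_something(p, q)
        vm = solver.get_Vm().tolist()
        va = solver.get_Va().tolist()
        rows.append([x for pair in zip(vm, va) for x in pair])
    return rows


def run_parallel(p_all, q_all, n_workers=4):
    """Distribute row blocks over a process pool; output keeps the row order."""
    tasks = [(p_all[a:b], q_all[a:b])
             for a, b in chunk_bounds(len(p_all), n_workers)]
    with mp.Pool(n_workers) as pool:
        results = pool.map(process_chunk, tasks)  # map preserves task order
    return [row for chunk in results for row in chunk]
```

`run_parallel` should be called from under an `if __name__ == '__main__':` guard, and the parent process would then write the returned rows to the CSV together with the matching `PQ_data.index` values. For 8,000,000 rows it would probably be better to use many smaller chunks and write each result as it arrives, rather than holding every row in memory.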