2017-10-29

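Hi, I want to generate a list of numbers from 1000000 to 2000000, but I get a MemoryError. When I used random everything was fine, except that I got duplicate numbers. I can't have duplicates, so I switched to xrange to generate the list of numbers.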

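from csv import reader as csv_reader, writer as csv_writer  # assumed aliases for the csv module's reader/writer

work_dir = "./"  # hypothetical value; work_dir is defined elsewhere in the original script
data = []
total = 2000000
def resource_file(info):
    with open(info, "r") as data_file:
        reader = csv_reader(data_file, delimiter=",")
        for row in reader:
            try:
                # every input row appends 1,000,000 new rows to `data`,
                # which is what eventually raises MemoryError
                for i in xrange(1000000, total):
                    new_row = [row[0], row[1], i]
                    data.append(new_row)
            except IndexError as error:
                print(error)
    # nothing is written until every row has been stored in memory
    with open(work_dir + "new_data.csv", "w") as new_data:
        writer = csv_writer(new_data, delimiter=",")
        for new_row in data:
            writer.writerow(new_row)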
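The numbers themselves are cheap to produce without a list. A small check, written for Python 3 (where `range` is lazy like Python 2's `xrange`), shows that the range covers every value exactly once and stores only its bounds. The `random.sample` line is a swapped-in suggestion for the "duplicates with random" part of the question: it draws without replacement, so its picks never repeat.

```python
import random
import sys

numbers = range(1000000, 2000000)    # lazy: values are computed on demand
print(len(numbers))                  # 1000000 values; the stop value 2000000 is excluded
print(numbers[0], numbers[-1])       # 1000000 1999999
print(sys.getsizeof(numbers) < 100)  # True: only start/stop/step are stored

# Sampling without replacement: 5 distinct numbers from the range,
# without building a list of all one million values first.
picked = random.sample(numbers, 5)
print(len(set(picked)) == 5)         # True
```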

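You are trying to store everything in memory before writing anything out. You can use far less memory by processing one row at a time instead of trying to hold the whole file in memory. –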
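The "one row at a time" idea can be sketched as a generator that produces output rows on demand and hands them straight to `csv.writer.writerows`, which consumes any iterable row by row. This is a Python 3 sketch; `expanded_rows` is a hypothetical helper name, and in-memory `io.StringIO` objects stand in for the real input and output files.

```python
import csv
import io

def expanded_rows(rows, start, stop):
    """Lazily yield [col0, col1, n] for each input row and each n in range(start, stop)."""
    for row in rows:
        for n in range(start, stop):
            yield [row[0], row[1], n]   # only the current row exists at any moment

src = io.StringIO("a,b\nc,d\n")
dst = io.StringIO()
# writerows pulls rows from the generator one by one; no full list is ever built
csv.writer(dst, lineterminator="\n").writerows(expanded_rows(csv.reader(src), 5, 7))
print(dst.getvalue())
```

With the question's numbers, the call would be `expanded_rows(reader, 1000000, 2000000)`.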


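Are you sure you want to create 1,000,000 times more rows than the input CSV file has? What is the desired result? Could you give a small example CSV file and show what you expect the resulting CSV file to look like? – trincot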


I want it for row number 2 – Mike

Answers


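Repeating every row with an extra column in the range 1M..2M

The problem is that you first store all of this in memory. Python does not have a very memory-efficient object model to begin with, and a million extra entries for every input row is very large.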

I suggest not keeping the data in a list, but writing each row to the file immediately:

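from csv import reader as csv_reader, writer as csv_writer  # assumed aliases, as in the question

work_dir = "./"  # hypothetical value; defined elsewhere in the asker's script
total = 2000000
def resource_file(info):
    with open(info, "r") as data_file:
        reader = csv_reader(data_file, delimiter=",")
        with open(work_dir + "new_data.csv", "w") as new_data:
            writer = csv_writer(new_data, delimiter=",")
            for row in reader:
                rowa, rowb = row[0:2]  # first two columns, read once per input row
                for data in xrange(1000000, total):
                    # each output row is written immediately instead of being kept in a list
                    writer.writerow([rowa, rowb, data])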
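For readers on Python 3, the same approach might look like the sketch below: `range` replaces `xrange`, and the work is wrapped in a hypothetical `repeat_with_range` function that takes file-like objects so it can be tried on small in-memory data. With real files, Python 3's `csv` documentation recommends opening them with `newline=""`.

```python
import csv
import io

def repeat_with_range(src, dst, start, stop):
    """For every input row, write one output row [col0, col1, n] per n in range(start, stop).

    Rows are written as soon as they are produced, so memory use stays flat
    no matter how many rows end up in the output.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        first, second = row[0:2]          # taken once per input row
        for n in range(start, stop):      # lazy, like xrange in Python 2
            writer.writerow([first, second, n])

src = io.StringIO("x,y,ignored\np,q,ignored\n")
dst = io.StringIO()
repeat_with_range(src, dst, 1000000, 1000003)
print(dst.getvalue())
```

For the real files, something like `open(info, newline="")` and `open(work_dir + "new_data.csv", "w", newline="")` would be passed in, with `stop=2000000`.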

Taking rows 1M-2M of the file

In case you want rows 1M to 2M of the original file, you can write it as:

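from csv import reader as csv_reader, writer as csv_writer  # assumed aliases, as in the question
from itertools import islice

work_dir = "./"  # hypothetical value; defined elsewhere in the asker's script
total = 2000000
def resource_file(info):
    with open(info, "r") as data_file:
        reader = csv_reader(data_file, delimiter=",")
        with open(work_dir + "new_data.csv", "w") as new_data:
            writer = csv_writer(new_data, delimiter=",")
            # islice lazily skips the first 1000000 rows and stops before row index `total`
            for row in islice(reader, 1000000, total):
                writer.writerow(row)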

Or you can simplify it, as @JonClemens said, to:

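from csv import reader as csv_reader, writer as csv_writer  # assumed aliases, as in the question
from itertools import islice

work_dir = "./"  # hypothetical value; defined elsewhere in the asker's script
total = 2000000
def resource_file(info):
    with open(info, "r") as data_file:
        reader = csv_reader(data_file, delimiter=",")
        with open(work_dir + "new_data.csv", "w") as new_data:
            writer = csv_writer(new_data, delimiter=",")
            # writerows consumes the islice iterator one row at a time
            writer.writerows(islice(reader, 1000000, total))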
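`islice` counts from 0 and excludes its stop index, so `islice(reader, 1000000, 2000000)` yields lines 1,000,001 through 2,000,000 when lines are counted from 1. The Python 3 sketch below, with a hypothetical `copy_rows` helper and a ten-line in-memory CSV, shows that off-by-one, and how inclusive 1-based line numbers `first..last` map to `islice(reader, first - 1, last)`. If the file has a header line, it counts as a row here too.

```python
import csv
import io
from itertools import islice

def copy_rows(src, dst, start, stop):
    """Write rows start..stop-1 (0-based, stop excluded) of the CSV in src to dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    # islice discards the first `start` rows and stops requesting rows at `stop`,
    # so the rest of the file is never parsed.
    writer.writerows(islice(reader, start, stop))

text = "".join("row{0},{0}\n".format(i) for i in range(1, 11))   # row1 .. row10

dst = io.StringIO()
copy_rows(io.StringIO(text), dst, 3, 6)          # 0-based indices 3, 4, 5
print(dst.getvalue())                            # row4 .. row6

dst2 = io.StringIO()
first, last = 3, 6                               # 1-based, inclusive
copy_rows(io.StringIO(text), dst2, first - 1, last)
print(dst2.getvalue())                           # row3 .. row6
```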

To add a number to each row of the csv file. But I think rowa, rowb would be inside the loop, right? – Mike


@Mike: No, it is in the *read* loop. –


Yes, but for every row in the info file it will loop up to total = 2000000 – Mike
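To make the loop structure in this exchange concrete, the Python 3 sketch below counts how often each part of the answer's nested loop runs. It uses a hypothetical `trace_loops` helper and a short range so it runs instantly. The `rowa, rowb = ...` line runs once per input row, while the inner body runs `stop - start` times per row. For the question's values that is `2000000 - 1000000`, i.e. 1,000,000 writes per input row, not 2,000,000, because the stop value is excluded.

```python
def trace_loops(rows, start, stop):
    """Count runs of the outer body (rowa, rowb = ...) and the inner body (writerow)."""
    outer_runs = inner_runs = 0
    for row in rows:
        rowa, rowb = row[0:2]          # once per input row
        outer_runs += 1
        for n in range(start, stop):   # stop - start times per input row
            inner_runs += 1            # stands in for writer.writerow([rowa, rowb, n])
    return outer_runs, inner_runs

print(trace_loops([["a", "b"], ["c", "d"], ["e", "f"]], 1000000, 1000005))  # (3, 15)
print(len(range(1000000, 2000000)))    # 1000000 inner iterations per row for the real values
```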