2016-11-29 125 views
1

How do I split a CSV file by a condition?

I have this CSV file:

89,Network activity,ip-dst,80.179.42.44,,1,20160929 
89,Payload delivery,md5,4ad2924ced722ab65ff978f83a40448e,,1,20160929 
89,Network activity,domain,alkamaihd.net,,1,20160929 
90,Payload delivery,md5,197c018922237828683783654d3c632a,,1,20160929 
90,Network activity,domain,dnsrecordsolver.tk,,1,20160929 
90,Network activity,ip-dst,178.33.94.47,,1,20160929 
90,Payload delivery,filename,Airline.xls,,1,20160929 
91,Payload delivery,md5,23a9bbf8d64ae893db17777bedccdc05,,1,20160929 
91,Payload delivery,md5,07e47f06c5ed05a062e674f8d11b01d8,,1,20160929 
91,Payload delivery,md5,bd75af219f417413a4e0fae8cd89febd,,1,20160929 
91,Payload delivery,md5,9f4023f2aefc8c4c261bfdd4bd911952,,1,20160929 
91,Network activity,domain,mailsinfo.net,,1,20160929 
91,Payload delivery,md5,1e4653631feebf507faeb9406664792f,,1,20160929 
92,Payload delivery,md5,6fa869f17b703a1282b8f386d0d87bd4,,1,20160929 
92,Payload delivery,md5,24befa319fd96dea587f82eb945f5d2a,,1,20160929 

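I need to split this CSV file into 4 CSV files, where the condition is the event number at the start of each row. So far I have created a set containing the event numbers {89,90,91,92}, and I know I need a loop within a loop that copies each row to its dedicated CSV file.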

+0

Take a look at this similar question: http://stackoverflow.com/questions/40789383/python-split-csv-file-according-第一列字符/ 40790237#40790237 – chthonicdaemon

Answers

0

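It would be best not to hard-code the event numbers in your code, so that it doesn't depend on the values in the data. I would also favor using the csv module, which is optimized for reading and writing .csv files.

Here's one way to do it: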

import csv

prefix = 'events'  # prefix of the output csv file names
data = {}  # maps each event value to the list of rows that carry it

# Python 3 file modes; on Python 2 use 'rb' and 'wb' without newline.
with open('conditions.csv', newline='') as conditions:
    reader = csv.reader(conditions)
    for row in reader:
        data.setdefault(row[0], []).append(row)

for event in sorted(data):
    csv_filename = '{}_{}.csv'.format(prefix, event)
    print(csv_filename)
    with open(csv_filename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows(data[event])
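To make the grouping step concrete, here is a minimal sketch that runs it on a few rows copied from the question. The rows are held in an in-memory `io.StringIO` object instead of a file, and the `sample` and `counts` names are only illustrative.

```python
import csv
import io

# A few rows copied from the question, held in memory instead of a file.
sample = io.StringIO(
    "89,Network activity,ip-dst,80.179.42.44,,1,20160929\n"
    "89,Payload delivery,md5,4ad2924ced722ab65ff978f83a40448e,,1,20160929\n"
    "90,Network activity,domain,dnsrecordsolver.tk,,1,20160929\n"
    "91,Network activity,domain,mailsinfo.net,,1,20160929\n"
)

data = {}
for row in csv.reader(sample):
    # The first field is the event number; rows sharing it end up in one list.
    data.setdefault(row[0], []).append(row)

# Number of rows collected per event value.
counts = {event: len(rows) for event, rows in data.items()}
print(counts)
```

Each key of `data` becomes one output file, and the list stored under it holds the rows written to that file.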

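Update

The first implementation above reads the entire CSV file into memory, then writes all the rows associated with each event value to a separate output file, one file at a time.

A more memory-efficient approach is to open multiple output files simultaneously and write each row to the appropriate destination file as soon as it has been read. Doing this requires keeping track of which files are already open. The other thing the file-management code needs to do is make sure all the files are closed when processing is complete.

In the code below, all of this is done by defining and using a Python context manager type that centrally handles all the csv output files that might be generated, depending on how many different event values there are in the input file.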

import csv
import sys
PY3 = sys.version_info.major > 2

class MultiCSVOutputFileManager(object):
    """Context manager to open and close multiple csv files and csv writers.
    """
    def __enter__(self):
        self.files = {}
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        for file, csv_writer in self.files.values():
            print('closing file: {}'.format(file.name))
            file.close()
        self.files.clear()
        return None

    def get_csv_writer(self, filename):
        if filename not in self.files:  # new file?
            open_kwargs = dict(mode='w', newline='') if PY3 else dict(mode='wb')
            print('opening file: {}'.format(filename))
            file = open(filename, **open_kwargs)
            self.files[filename] = file, csv.writer(file)

        return self.files[filename][1]  # return associated csv.writer object

Here's how to use it:

prefix = 'events'  # prefix of each csv output file name

# csv.reader needs a text-mode file on Python 3 and a binary one on Python 2.
input_kwargs = dict(mode='r', newline='') if PY3 else dict(mode='rb')

with open('conditions.csv', **input_kwargs) as conditions:
    reader = csv.reader(conditions)
    with MultiCSVOutputFileManager() as file_manager:
        for row in reader:
            csv_filename = '{}_{}.csv'.format(prefix, row[0])  # row[0] is event
            writer = file_manager.get_csv_writer(csv_filename)
            writer.writerow(row)
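As an alternative sketch of the same idea (this is not the code from the answer), the standard library's `contextlib.ExitStack` can do the bookkeeping of closing every output file. It assumes Python 3, and the function name `split_csv_by_first_column` is hypothetical.

```python
import csv
from contextlib import ExitStack


def split_csv_by_first_column(input_path, prefix='events'):
    """Write each row of input_path to '<prefix>_<first field>.csv'.

    Returns the sorted list of output file names that were created.
    """
    writers = {}  # event value -> csv.writer for that event's output file
    names = []
    # ExitStack closes every file registered with it when the block exits,
    # even if an exception is raised part-way through.
    with ExitStack() as stack, open(input_path, newline='') as infile:
        for row in csv.reader(infile):
            if not row:  # skip blank lines
                continue
            event = row[0]
            if event not in writers:  # first row for this event: open its file
                name = '{}_{}.csv'.format(prefix, event)
                outfile = stack.enter_context(open(name, 'w', newline=''))
                writers[event] = csv.writer(outfile)
                names.append(name)
            writers[event].writerow(row)
    return sorted(names)
```

Calling `split_csv_by_first_column('conditions.csv')` would then produce one `events_<number>.csv` file per distinct event value. As with the class above, every distinct event keeps one file open until the end, so an input with a very large number of distinct values could run into the operating system's limit on open files.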
+0

Great, thank you haha! – shamirs888

2
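# Hard-coded event numbers: each key must match the first two characters of a line.
data = {
    '89': [],
    '90': [],
    '91': [],
    '92': [],
}

with open('yourfile.csv') as infile:
    for line in infile:
        prefix = line[:2]
        data[prefix].append(line)

for prefix in data:
    # 'outfile' rather than 'csv', so the csv module name is not shadowed.
    with open('csv' + prefix + '.csv', 'w') as outfile:
        outfile.writelines(data[prefix])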

However, if you are open to a solution other than Python, this can be done easily by running four commands:

grep '^89,' file.csv > 89.csv
grep '^90,' file.csv > 90.csv

and likewise for the other values.
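For comparison, the same per-event filter can be sketched in Python. Matching on the event number plus the comma that follows it keeps a row for event 890 from being picked up when filtering for 89. The helper name `lines_for_event` is hypothetical, and the third sample row is made up for illustration.

```python
def lines_for_event(lines, event):
    """Return the lines that start with '<event>,'."""
    wanted = str(event) + ','
    return [line for line in lines if line.startswith(wanted)]


sample = [
    '89,Network activity,domain,alkamaihd.net,,1,20160929\n',
    '90,Payload delivery,filename,Airline.xls,,1,20160929\n',
    # Made-up row, not from the question, to show the prefix pitfall.
    '890,Network activity,domain,example.invalid,,1,20160929\n',
]

matches = lines_for_event(sample, 89)
print(len(matches))  # only the first row matches, not the '890' row
```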

+0

I see, but I'm getting an error: File "C:/Users/oshamir/untitled2.py", line 34, in data[prefix].append(line) KeyError: 'uu' – shamirs888

0

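You can even create the result files dynamically, whenever a first field has not been encountered yet, by keeping a mapping from that id to its associated file: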

files = {}  # maps each id to its open result file
try:
    with open('file.csv') as fd:
        for line in fd:
            if not line.strip():
                continue  # skip empty lines
            try:
                id_field = line.split(',', 1)[0]  # extract first field
                if id_field not in files:  # if not encountered, open a new result file
                    files[id_field] = open(id_field + '.csv', 'w')
                files[id_field].write(line)  # write the line to the proper file
            except Exception as e:
                print('ERR', line, e)  # catch-all in case of problems...
finally:
    for result_file in files.values():
        result_file.close()  # close every result file that was opened