How do I split a CSV file based on a condition?

I have this CSV file:

89,Network activity,ip-dst,80.179.42.44,,1,20160929 
89,Payload delivery,md5,4ad2924ced722ab65ff978f83a40448e,,1,20160929 
89,Network activity,domain,alkamaihd.net,,1,20160929 
90,Payload delivery,md5,197c018922237828683783654d3c632a,,1,20160929 
90,Network activity,domain,dnsrecordsolver.tk,,1,20160929 
90,Network activity,ip-dst,178.33.94.47,,1,20160929 
90,Payload delivery,filename,Airline.xls,,1,20160929 
91,Payload delivery,md5,23a9bbf8d64ae893db17777bedccdc05,,1,20160929 
91,Payload delivery,md5,07e47f06c5ed05a062e674f8d11b01d8,,1,20160929 
91,Payload delivery,md5,bd75af219f417413a4e0fae8cd89febd,,1,20160929 
91,Payload delivery,md5,9f4023f2aefc8c4c261bfdd4bd911952,,1,20160929 
91,Network activity,domain,mailsinfo.net,,1,20160929 
91,Payload delivery,md5,1e4653631feebf507faeb9406664792f,,1,20160929 
92,Payload delivery,md5,6fa869f17b703a1282b8f386d0d87bd4,,1,20160929 
92,Payload delivery,md5,24befa319fd96dea587f82eb945f5d2a,,1,20160929

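I need to split this CSV file into 4 CSV files, where the condition is the event number at the start of each row. So far I have created a set containing the event numbers {89,90,91,92}, and I know that I need a loop inside a loop that copies each row to its dedicated CSV file.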

Source

2016-11-29 shamirs888

Take a look at this similar question: http://stackoverflow.com/questions/40789383/python-split-csv-file-according-第一列字符/ 40790237＃40790237 – chthonicdaemon

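It would be best not to hardcode the event numbers in your code, so that it does not depend on the values in the data. I would also favor using the csv module, which is optimized for reading and writing .csv files.

Here is one way to do it: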

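import csv

prefix = 'events'  # prefix of the output csv file names
data = {}

# On Python 3, open csv files in text mode with newline='' so the csv
# module handles line endings itself (Python 2 used 'rb' and 'wb').
with open('conditions.csv', newline='') as conditions:
    reader = csv.reader(conditions)
    for row in reader:
        if row:  # skip blank lines
            data.setdefault(row[0], []).append(row)  # group rows by event

for event in sorted(data):
    csv_filename = '{}_{}.csv'.format(prefix, event)
    print(csv_filename)
    with open(csv_filename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows(data[event])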
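To make the grouping step concrete, the sketch below applies the same `setdefault` pattern to a few rows copied from the question, read from an in-memory `io.StringIO` rather than from `conditions.csv`. The `group_rows` helper name is hypothetical and exists only for illustration.

```python
import csv
import io

# A few rows copied from the question's file.
SAMPLE = """\
89,Network activity,ip-dst,80.179.42.44,,1,20160929
89,Network activity,domain,alkamaihd.net,,1,20160929
90,Payload delivery,filename,Airline.xls,,1,20160929
92,Payload delivery,md5,24befa319fd96dea587f82eb945f5d2a,,1,20160929
"""

def group_rows(lines):
    """Group csv rows by their first field (hypothetical helper)."""
    groups = {}
    for row in csv.reader(lines):
        if row:  # ignore blank lines
            groups.setdefault(row[0], []).append(row)
    return groups

groups = group_rows(io.StringIO(SAMPLE))
for event in sorted(groups):
    print(event, len(groups[event]))  # event number and its row count
```

Each key of `groups` is an event number (as a string) and each value is the list of parsed rows for that event, which is exactly what the second loop of the answer writes out one file at a time.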

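Update

The first approach shown above reads the entire CSV file into memory and then writes all the rows associated with each event value into a separate output file, one file at a time.

A more memory-efficient approach is to open multiple output files simultaneously and write each row to the appropriate destination file immediately after it has been read. Doing this requires keeping track of which files have already been opened. The file-managing code also needs to make sure that all the files are closed when processing is complete.

In the code below, all of this is accomplished by defining and using a Python context manager type that centralizes the handling of all the csv output files that might be generated, depending on how many different event values there are in the input file.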

import csv
import sys
PY3 = sys.version_info.major > 2

class MultiCSVOutputFileManager(object):
    """Context manager to open and close multiple csv files and csv writers.
    """
    def __enter__(self):
        self.files = {}
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        for file, csv_writer in self.files.values():
            print('closing file: {}'.format(file.name))
            file.close()
        self.files.clear()
        return None

    def get_csv_writer(self, filename):
        if filename not in self.files:  # new file?
            open_kwargs = dict(mode='w', newline='') if PY3 else dict(mode='wb')
            print('opening file: {}'.format(filename))
            file = open(filename, **open_kwargs)
            self.files[filename] = file, csv.writer(file)

        return self.files[filename][1]  # return associated csv.writer object

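Here is how to use it:

prefix = 'events'  # prefix for the name of each csv output file

# Text mode with newline='' on Python 3, binary mode on Python 2
# (PY3 is defined in the code above).
read_kwargs = dict(mode='r', newline='') if PY3 else dict(mode='rb')

with open('conditions.csv', **read_kwargs) as conditions:
    reader = csv.reader(conditions)
    with MultiCSVOutputFileManager() as file_manager:
        for row in reader:
            csv_filename = '{}_{}.csv'.format(prefix, row[0])  # row[0] is event
            writer = file_manager.get_csv_writer(csv_filename)
            writer.writerow(row)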
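On Python 3, the standard library's `contextlib.ExitStack` can play the same role as the custom class above: every file registered through `enter_context` is closed when the `with` block exits. The following is a minimal sketch of that alternative, not the answer's own method; the `split_by_first_field` function name and its parameters are assumptions made for illustration.

```python
import csv
from contextlib import ExitStack

def split_by_first_field(input_path, prefix='events'):
    """Write each row to '<prefix>_<first field>.csv' (hypothetical helper).

    Returns the sorted list of output file names that were created.
    """
    writers = {}  # maps output file name -> csv.writer
    with ExitStack() as stack, open(input_path, newline='') as infile:
        for row in csv.reader(infile):
            if not row:  # skip blank lines
                continue
            filename = '{}_{}.csv'.format(prefix, row[0])
            if filename not in writers:
                # ExitStack closes this file when the with block ends.
                outfile = stack.enter_context(open(filename, 'w', newline=''))
                writers[filename] = csv.writer(outfile)
            writers[filename].writerow(row)
    return sorted(writers)
```

Calling `split_by_first_field('conditions.csv')` would then produce one output file per distinct event value. As with the class above, all output files stay open until the input has been fully read, so the number of distinct events is limited by how many files the operating system lets a process keep open.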

Source

2016-11-29 15:45:28 martineau

Great, thank you, haha! – shamirs888

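data = {
    '89': [],
    '90': [],
    '91': [],
    '92': [],
}

with open('yourfile.csv') as infile:
    for line in infile:
        prefix = line[:2]  # first two characters are the event number
        data[prefix].append(line)  # raises KeyError for any other prefix

for prefix in data:
    # name the handle 'outfile' so it does not shadow the csv module
    with open('csv' + prefix + '.csv', 'w') as outfile:
        outfile.writelines(data[prefix])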

However, if you are open to a solution outside Python, this can easily be done by running four commands:

grep '^89,' file.csv > 89.csv
grep '^90,' file.csv > 90.csv

and similarly for the other values.
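The Python version in this answer only works when every line starts with one of the four hardcoded keys; any other prefix raises `KeyError`. One way around that, sketched below under the assumption that the event number is everything before the first comma, is to use `collections.defaultdict` so that keys are created on first use. The `split_lines` name is hypothetical.

```python
from collections import defaultdict

def split_lines(lines):
    """Group raw text lines by the text before their first comma."""
    data = defaultdict(list)  # missing keys start out as empty lists
    for line in lines:
        if line.strip():  # skip blank lines
            data[line.split(',', 1)[0]].append(line)
    return data

sample = [
    '89,Network activity,ip-dst,80.179.42.44,,1,20160929\n',
    '90,Network activity,domain,dnsrecordsolver.tk,,1,20160929\n',
    '89,Network activity,domain,alkamaihd.net,,1,20160929\n',
]
data = split_lines(sample)
for prefix, group in sorted(data.items()):
    print(prefix, len(group))  # event number and how many lines it has
```

Passing an open file object instead of `sample` groups the file's lines in the same way, and each group can then be written out with `writelines` as in the answer.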

Source

2016-11-29 15:20:39

I see, but I am getting an error: File "C:/Users/oshamir/untitled2.py", line 34, in data[prefix].append(line) KeyError: 'uu' – shamirs888

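You can even create the result files dynamically, whenever a first field has not been encountered yet, by keeping a mapping between that id and the associated file:

files = {}
try:
    with open('file.csv') as fd:
        for line in fd:
            if not line.strip():
                continue  # skip empty lines
            try:
                id_field = line.split(',', 1)[0]  # extract first field
                if id_field not in files:  # not encountered yet: open a new result file
                    files[id_field] = open(id_field + '.csv', 'w')
                files[id_field].write(line)  # write the line to the proper file
            except Exception as e:
                print('ERR', line, e)  # catch-all in case of problems...
finally:
    for f in files.values():
        f.close()  # close every result file that was opened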

Source

2016-11-29 15:39:37
