2014-10-29 88 views

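Reading a CSV file with Python and printing the unique dates from a column that contains dates and times

I need to read a CSV file, sum several columns per day, and then write the results to a new CSV file. I'm new to Python. I've figured out how to read the CSV, but now I need to work out how to sum columns based on the date/time column.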

CSV:

tag,date,symbol,exch,volume,price,side,ind 
1058,20140612 13:29:59.042,BRK/B,NQBX,1000,61.25,SELL_SHORT,A 
1059,20140612 13:29:59.043,JNJ,NQBX,185,31.94,SELL_SHORT,A 
1153,20140612 13:30:00.117,AAPL,NQBX,77,43.64,SELL,A 
1201,20140612 13:30:00.190,WFC,NQBX,100,49.92,SELL,A 
1720,20140612 13:30:04.003,JPM,NQBX,100,50.16,SELL,A 
1738,20140613 13:30:04.254,PFE,NQBX,600,43.89,SELL_SHORT,A 
108167,20140613 13:30:04.809,VZ,NSDQ,2000,61.23,SELL_SHORT,R 
1799,20140613 13:30:05.252,MSFT,NQBX,11,43.76,BUY,A 
1879,20140612 13:30:06.393,CVX,NQBX,40,70.58,BUY,A 
1908,20140612 13:30:06.803,INTC,NQBX,100,56.52,SELL_SHORT,A 
1989,201406117 13:30:08.003,GE,NQBX,100,50.14,SELL,A 
2008,20140619 13:30:08.169,JNJ,NQBX,97,15.18,SELL,A 
2021,20140619 13:30:08.393,PFE,NQBX,38,43.89,SELL_SHORT,A 
2197,20140619 13:30:10.599,WFC,NQBX,100,30.34,BUY,A 
2302,20140620 13:30:12.002,GE,NQBX,100,50.14,SELL,A 
2368,20140620 13:30:12.931,INTC,NQBX,500,31.44,SELL,A 

I need to sum the volume column for each day and then create a new CSV containing those totals.
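The title also asks for the unique dates in the date/time column. A minimal sketch, assuming the header shown above; the helper name unique_dates is hypothetical, and a few sample rows are read from an in-memory string (io.StringIO) in place of a real file. dict.fromkeys drops duplicate dates while keeping their first-seen order (dicts preserve insertion order since Python 3.7).

```python
import csv
import io

# A few rows from the sample above; with a real file you would pass
# open('in.csv', newline='') instead of io.StringIO(SAMPLE).
SAMPLE = """tag,date,symbol,exch,volume,price,side,ind
1058,20140612 13:29:59.042,BRK/B,NQBX,1000,61.25,SELL_SHORT,A
1738,20140613 13:30:04.254,PFE,NQBX,600,43.89,SELL_SHORT,A
1879,20140612 13:30:06.393,CVX,NQBX,40,70.58,BUY,A
2008,20140619 13:30:08.169,JNJ,NQBX,97,15.18,SELL,A
"""


def unique_dates(csv_file):
    """Return the distinct days (the part of 'date' before the space) in first-seen order."""
    days = (row['date'].split(' ')[0] for row in csv.DictReader(csv_file))
    # dict keys are unique and keep insertion order, so this de-duplicates in order
    return list(dict.fromkeys(days))


for day in unique_dates(io.StringIO(SAMPLE)):
    print(day)
```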

Answers


You can use csv.DictReader and itertools.groupby to do what you want.

import csv
import itertools

def sum_volumes_by_date(yourcsvfile, writetocsv):
    # read all your data, pairing the header names with each row's values in a dictionary
    with open(yourcsvfile, newline='') as infile:
        results = list(csv.DictReader(infile))

    # the day is the first 8 characters of the 'date' column (YYYYMMDD)
    def day(row):
        return row['date'][:8]

    with open(writetocsv, 'w') as f:
        f.write("Date,Sum(Vols)\n")

        # groupby only merges adjacent rows, so sort by the same key it groups on
        for k, g in itertools.groupby(sorted(results, key=day), key=day):
            # then sum the 'volume' values of that day
            f.write("{},{}\n".format(k, sum(int(each['volume']) for each in g)))

Usage:

>>> sum_volumes_by_date('in.csv', 'out.csv')
$ cat out.csv
Date,Sum(Vols) 
20140611,100 
20140612,1602 
20140613,2611 
20140619,235 
20140620,600 
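The sorted() call in this answer is required because itertools.groupby only merges consecutive items that share a key, and the sample file is not in date order (the rows for 20140613 sit between rows for 20140612). A small illustration on a hypothetical list of date strings:

```python
import itertools

# Days in file order: 20140612 appears again after 20140613
dates = ['20140612', '20140613', '20140612', '20140619']

# Without sorting, groupby starts a new group every time the key changes,
# so 20140612 shows up as two separate groups
unsorted_groups = [k for k, _ in itertools.groupby(dates)]
print(unsorted_groups)  # ['20140612', '20140613', '20140612', '20140619']

# After sorting, each day forms exactly one group
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(dates))]
print(sorted_groups)  # [('20140612', 2), ('20140613', 1), ('20140619', 1)]
```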

This can easily be done with a dictionary. Take a look at this example:

import csv

with open('csv.csv', newline='') as csv_file:
    # initiate csv reader
    csv_reader = csv.reader(csv_file)

    # skip the header row (tag,date,symbol,...), whose volume field is not a number
    next(csv_reader)

    # initiate empty dictionary
    daily_volumes = {}

    # iterate through each row
    for row in csv_reader:
        # the day is the part of the date column before the space
        day = row[1].split(' ')[0]
        # add this row's volume to the day's total, starting from 0 for a new day
        daily_volumes[day] = daily_volumes.get(day, 0) + int(row[4])

# This will give us a dictionary like so:
# daily_volumes = {
#     '20140612': 1602,
#     '20140613': 2611,
#     '201406117': 100,
#     '20140619': 235,
#     '20140620': 600
# }

# Now create a new CSV and write these values to it
with open('new_csv.csv', 'w', newline='') as new_csv_file:
    # initiate csv writer
    csv_writer = csv.writer(new_csv_file)

    # write each key as a row
    for day, volume in daily_volumes.items():
        csv_writer.writerow([day, volume])
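The same dictionary idea can also be written with collections.defaultdict(int), which starts every new day at 0 so the first row of a date needs no special handling, and with csv.DictReader so columns are read by header name instead of position. A minimal sketch, assuming the file has the header shown in the question; the helper names daily_volume_totals and write_totals are hypothetical, a few sample rows are inlined via io.StringIO in place of real files, and sorting the output by date is a choice of this sketch, not of the answer above.

```python
import csv
import io
from collections import defaultdict


def daily_volume_totals(csv_file):
    """Sum the 'volume' column per day (the part of 'date' before the space)."""
    totals = defaultdict(int)  # a day not seen yet starts at 0 automatically
    for row in csv.DictReader(csv_file):
        totals[row['date'].split(' ')[0]] += int(row['volume'])
    return dict(totals)


def write_totals(totals, out_file):
    """Write a 'Date,Sum(Vols)' header, then one row per day sorted by date."""
    writer = csv.writer(out_file)
    writer.writerow(['Date', 'Sum(Vols)'])
    for day in sorted(totals):
        writer.writerow([day, totals[day]])


# A few rows from the question's sample, standing in for a real input file
SAMPLE = """tag,date,symbol,exch,volume,price,side,ind
1738,20140613 13:30:04.254,PFE,NQBX,600,43.89,SELL_SHORT,A
108167,20140613 13:30:04.809,VZ,NSDQ,2000,61.23,SELL_SHORT,R
1799,20140613 13:30:05.252,MSFT,NQBX,11,43.76,BUY,A
2302,20140620 13:30:12.002,GE,NQBX,100,50.14,SELL,A
2368,20140620 13:30:12.931,INTC,NQBX,500,31.44,SELL,A
"""

totals = daily_volume_totals(io.StringIO(SAMPLE))
out = io.StringIO()  # with a real file: open('new_csv.csv', 'w', newline='')
write_totals(totals, out)
print(out.getvalue())
```

With real files, pass open('csv.csv', newline='') to daily_volume_totals and a file opened for writing with newline='' to write_totals, as the csv module documentation recommends.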