2013-04-17 64 views

Python: adding columns based on the first and second columns

I have a CSV file with repeated values in the first column, for example:

mg,known,127 
mg,unknown,142 
pnt,known,37 
pnt,unknown,0 
lmo,known,75 
lmo,unknown,3 
sl,known,197 
sl,unknown,21 
oc,unknown,32 
oc,known,163 
sv,known,368 
sv,unknown,308 
az,unknown,6 
az,known,241 
bug,unknown,1 
bug,known,167 
li,unknown,15 
li,known,174 
lg,known,3 

What I want to do is build a new CSV file that looks like this:

header1, known, unknown 
mg, 127, 142 
pnt, 37, 0 

I'm trying to work out how to actually build each row:

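import csv

def read_stats(path, writepath):
    has_seen = set()
    with open(writepath, 'w', newline='') as write_csv:
        with open(path, 'r', newline='') as csv_file:
            data_reader = csv.reader(csv_file, delimiter=',')
            for line in data_reader:
                if line[0] in has_seen:
                    pass  # stuck here: how do I combine this line with the earlier one?
                has_seen.add(line[0])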

This is where I'm stuck. Do I have to keep a pointer to the next line?

Answer


Here is one way, aggregating the results in an OrderedDict:

>>> import csv
>>> import collections

>>> d = collections.OrderedDict()
>>> for header1, category, value in csv.reader(datafile):
...     d.setdefault(header1, {})[category] = value
...

>>> for header1, m in d.items():
...     # lg has no 'unknown' line, so fall back to '0' instead of raising KeyError
...     print(', '.join([header1, m.get('known', '0'), m.get('unknown', '0')]))
...
mg, 127, 142
pnt, 37, 0
lmo, 75, 3
sl, 197, 21
oc, 163, 32
sv, 368, 308
az, 241, 6
bug, 167, 1
li, 174, 15
lg, 3, 0

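The loop above only prints the rows, while the question asks for a new CSV file with a header row. A minimal sketch, assuming Python 3, that feeds the same aggregation into `csv.writer`; the function name `write_stats` and its `path`/`writepath` parameters are hypothetical (`writepath` borrows the name from the question), and writing `'0'` when a key has no line for a category (as with `lg`) is an assumption.

```python
import csv
import collections


def write_stats(path, writepath):
    """Pivot (name, category, value) rows into name, known, unknown rows."""
    # Collect each category/value pair under its first-column key,
    # keeping keys in the order they first appear in the input.
    stats = collections.OrderedDict()
    with open(path, newline='') as csv_file:
        for row in csv.reader(csv_file):
            if not row:
                continue  # skip blank lines
            name, category, value = (field.strip() for field in row)
            stats.setdefault(name, {})[category] = value

    # Write the header row, then one row per key. A missing category
    # is written as '0' (an assumption, e.g. for lg with no 'unknown' line).
    with open(writepath, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(['header1', 'known', 'unknown'])
        for name, cats in stats.items():
            writer.writerow([name, cats.get('known', '0'), cats.get('unknown', '0')])
```

No pointer to the next line is needed: the dictionary remembers the earlier line for each key, so the order of `known` and `unknown` within a key does not matter. Calling `write_stats('stats.csv', 'summary.csv')` (hypothetical file names) on the sample data would produce a file whose first rows are `header1,known,unknown` and `mg,127,142`.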
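If you can assume that the lines always come in consecutive pairs with the known line first, you can keep the known line as an intermediate result and emit the complete row when its unknown partner arrives. (The sample above does not fully meet this: oc, az, bug and li list unknown first, and lg has no unknown line, so the output below assumes the pairs have been put in that order.)

>>> for header1, category, value in csv.reader(datafile):
...     if category == 'known':
...         result = [header1, value]
...     else:
...         result += [value]
...         print(', '.join(result))
...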
mg, 127, 142 
pnt, 37, 0 
lmo, 75, 3 
sl, 197, 21 
oc, 163, 32 
sv, 368, 308 
az, 241, 6 
bug, 167, 1 
li, 174, 15 
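If the lines for each key are at least adjacent (true of the sample, even though oc, az, bug and li put unknown first), `itertools.groupby` can do the pairing without the known-first assumption and without a pointer to the next line. A sketch, assuming Python 3; the function name `pivot_consecutive` and the `'0'` default for a missing category are assumptions, not part of the answer above.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def pivot_consecutive(lines):
    """Yield [name, known, unknown] for rows already grouped by name.

    Assumes all rows for a name are adjacent, but not that 'known'
    comes first. A missing category defaults to '0' (an assumption).
    """
    rows = (row for row in csv.reader(lines) if row)  # drop blank lines
    for name, group in groupby(rows, key=itemgetter(0)):
        cats = {category: value for _, category, value in group}
        yield [name, cats.get('known', '0'), cats.get('unknown', '0')]


if __name__ == '__main__':
    sample = io.StringIO("oc,unknown,32\noc,known,163\nlg,known,3\n")
    for row in pivot_consecutive(sample):
        print(', '.join(row))  # prints "oc, 163, 32" then "lg, 3, 0"
```

`groupby` only merges consecutive rows with the same key, so if the same name could reappear later in the file, the OrderedDict version is the safer choice.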
+1 Thanks for your valuable input.
