2013-04-22

How can I count the distinct values for each key in Python?

I have code that gives me a list like this:

Name    id number   week numbers
Piata   4           6
Mali    2           20,5
Goerge  5           4
Gooki   3           24,64,6
Mali    5           45,9
Piata   6           1
Piata   12          2,7,8,27,16   etc.

It is produced by the following code:

import csv
import datetime
from datetime import date
from collections import defaultdict

datedict = defaultdict(set)
start_date = date(year=2009, month=1, day=1)
tdic = {}  # maps (name, id) -> set of week numbers

with open('d:/info.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, 'excel')
    # skip the header row
    read_header = False
    for row in filereader:
        if not read_header:
            read_header = True
            continue

        # read the remaining rows
        name, id, firstseen = row[0], row[1], row[3]
        try:
            seen_date = datetime.datetime.strptime(firstseen, '%d/%m/%Y').date()
            deltadays = (seen_date - start_date).days
            deltaweeks = deltadays // 7 + 1  # integer week number, starting at 1
            key = name, id
            currentvalue = tdic.get(key, set())
            currentvalue.add(deltaweeks)
            tdic[key] = currentvalue
        except ValueError:
            print('Date value error')
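The week-number arithmetic in the loop above can be pulled out into a small function, which makes it easy to check against a single date. This is a minimal sketch that assumes the same 1 January 2009 start date and `dd/mm/yyyy` format as the code above. `week_number` is a hypothetical helper name. Floor division (`//`) keeps the result an integer in Python 3, where `/` would return a float.

```python
import datetime

START_DATE = datetime.date(2009, 1, 1)  # same reference date as the loop above

def week_number(firstseen, start_date=START_DATE):
    """Return the 1-based week number of a 'dd/mm/yyyy' date string.

    Hypothetical helper: days 0-6 after start_date are week 1,
    days 7-13 are week 2, and so on.
    """
    seen_date = datetime.datetime.strptime(firstseen, '%d/%m/%Y').date()
    deltadays = (seen_date - start_date).days
    return deltadays // 7 + 1  # floor division keeps the result an int

print(week_number('01/01/2009'))  # 1
print(week_number('08/01/2009'))  # 2
```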

Now I want to turn this into a list that gives, for each name, the number of ids and all of its week numbers, like this:

Name    number of ids  weeknumbers
Mali    2              20,5,45,9
Piata   3              1,6,2,7,8,27,16
Goerge  1              4
Gooki   1              24,64,6

Can anyone help me write the code for this part?


Shouldn't your CSV input file d:/info.csv be comma-separated? What is defaultdict(set)? Is it the same as {}? – 2013-04-22 14:09:29


@Mikael Mayer The input is fine. It works, and defaultdict(set) = {} – UserYmY 2013-04-22 14:17:09


Could you provide an input file? I don't understand: is the first file you provided the output? – 2013-04-22 14:27:00

Answers


Given:

tdict = {('Mali', 5): set([9, 45]), ('Gooki', 3): set([24, 64, 6]), ('Goerge', 5): set([4]), ('Mali', 2): set([20, 5]), ('Piata', 4): set([6]), ('Piata', 6): set([1]), ('Piata', 12): set([8, 16, 2, 27, 7])}

the following produces the output above:

names = {}
for (name, id), more_weeks in tdict.items():
    # keep a running (number of ids, union of weeks) per name
    ids, weeks = names.get(name, (0, set()))
    ids = ids + 1
    weeks = weeks.union(more_weeks)
    names[name] = (ids, weeks)

for name, (ids, weeks) in names.items():
    print("%s, %s, %s" % (name, ids, ','.join(str(w) for w in sorted(weeks))))
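Below is a variant of the same aggregation using `collections.defaultdict`, formatted like the table in the question. It counts distinct ids with a set rather than a running counter, so a repeated `(name, id)` key could not be counted twice. It also sorts both names and week numbers, so the week order differs from the question's example. `ids_per_name` and `weeks_per_name` are illustrative names. The sample `tdict` uses week 6 for `('Piata', 4)`, matching the question's list.

```python
from collections import defaultdict

# Sample data in the same shape as tdict above: (name, id) -> set of weeks
tdict = {('Mali', 5): {9, 45}, ('Gooki', 3): {24, 64, 6}, ('Goerge', 5): {4},
         ('Mali', 2): {20, 5}, ('Piata', 4): {6}, ('Piata', 6): {1},
         ('Piata', 12): {8, 16, 2, 27, 7}}

ids_per_name = defaultdict(set)    # name -> set of distinct ids
weeks_per_name = defaultdict(set)  # name -> union of all week numbers

for (name, id_), weeks in tdict.items():
    ids_per_name[name].add(id_)
    weeks_per_name[name] |= weeks

print('Name    number of ids  weeknumbers')
for name in sorted(ids_per_name):
    weeks = ','.join(str(w) for w in sorted(weeks_per_name[name]))
    print('%-7s %-14d %s' % (name, len(ids_per_name[name]), weeks))
```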

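Since your CSV file appears to have a header (which you currently skip), why not use DictReader instead of the standard reader class? If you don't supply field names, DictReader assumes the first row contains them, so you no longer need to skip the first row inside the loop.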
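The following shows `csv.DictReader` taking its field names from the first row. An in-memory `io.StringIO` stands in for `d:/info.csv`. The column names and the two sample rows are assumptions, including the placeholder `other` column at index 2, chosen so that `firstseen` sits at index 3 as in the question's `row[3]`.

```python
import csv
import io

# Stand-in for d:/info.csv; column names and rows are assumptions
sample = io.StringIO(
    "name,id,other,firstseen\n"
    "Mali,2,x,01/02/2009\n"
    "Piata,4,y,10/02/2009\n"
)

reader = csv.DictReader(sample)  # no fieldnames given: the first row supplies them
rows = list(reader)              # the header row is consumed, not returned as data
print(reader.fieldnames)         # ['name', 'id', 'other', 'firstseen']
print(rows[0]['name'], rows[0]['firstseen'])  # Mali 01/02/2009
```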

This looks like a good opportunity to use defaultdict and Counter from the collections module.
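Here is a short illustration of the two `collections` classes. It also answers the comment asking whether `defaultdict(set)` is the same as `{}`. `defaultdict(set)` is a `dict` subclass, but reading a missing key creates a fresh empty `set` instead of raising `KeyError`. The names and numbers are made up for illustration.

```python
from collections import defaultdict, Counter

# defaultdict(set) creates an empty set the first time a missing key is used,
# so there is no need for tdic.get(key, set()) followed by reassignment
datedict = defaultdict(set)
datedict['Mali'].add(20)
datedict['Mali'].add(5)
print(sorted(datedict['Mali']))  # [5, 20]

plain = {}
try:
    plain['Mali'].add(20)
except KeyError:
    print('plain dict: KeyError')  # a plain {} has no default value

# Counter.update() counts the items of an iterable
namecounter = Counter()
namecounter.update(['Mali'])  # a one-element list: counts the name once
namecounter.update('Mali')    # a bare string is iterated character by character
print(namecounter['Mali'], namecounter['M'])  # 1 1
```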

import csv
import datetime
from datetime import date
from collections import defaultdict, Counter


datedict = defaultdict(set)   # name -> set of week numbers
namecounter = Counter()       # name -> number of rows seen
start_date = date(year=2009, month=1, day=1)

with open('d:/info.csv', 'r') as csvfile:
    filereader = csv.DictReader(csvfile)  # the header row supplies the field names

    for row in filereader:
        name, id, firstseen = row['name'], row['id'], row['firstseen']

        try:
            seen_date = datetime.datetime.strptime(firstseen, '%d/%m/%Y').date()
        except ValueError:
            print('Date value error')
            continue  # skip this row; otherwise seen_date is undefined or stale

        deltadays = (seen_date - start_date).days
        deltaweeks = deltadays // 7 + 1

        datedict[name].add(deltaweeks)
        namecounter.update([name])  # Without putting name into a list, update would count each character
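The loop above fills `datedict` and `namecounter` but stops before printing anything. This is a minimal sketch of that last step, formatted like the table in the question. So that the snippet runs without `d:/info.csv`, both containers are filled by hand with values matching that table. That data is an assumption, not output of the code above.

```python
from collections import Counter, defaultdict

# Hypothetical results of the loop above, built by hand
datedict = defaultdict(set, {'Mali': {20, 5, 45, 9}, 'Piata': {6, 1, 2, 7, 8, 27, 16},
                             'Goerge': {4}, 'Gooki': {24, 64, 6}})
namecounter = Counter({'Mali': 2, 'Piata': 3, 'Goerge': 1, 'Gooki': 1})

lines = ['Name    number of ids  weeknumbers']
for name, count in namecounter.most_common():  # names with the most ids first
    weeks = ','.join(str(w) for w in sorted(datedict[name]))
    lines.append('%-7s %-14d %s' % (name, count, weeks))
print('\n'.join(lines))
```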

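This assumes that each (name, id) pair is unique. If that is not the case, you can use another defaultdict in place of namecounter. I also moved the try-except statement so that it is clearer exactly what is being tested.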
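Here is a minimal sketch of the "another defaultdict" option for when `(name, id)` pairs can repeat. It collects each name's distinct ids in a `defaultdict(set)` and takes `len()` of each set instead of counting rows. The column names, the sample rows, and the `idsdict` name are assumptions for illustration. An in-memory `io.StringIO` stands in for `d:/info.csv`.

```python
import csv
import datetime
import io
from collections import defaultdict

# Stand-in for d:/info.csv: the pair (Mali, 2) appears on two rows
sample = io.StringIO(
    "name,id,other,firstseen\n"
    "Mali,2,x,01/02/2009\n"
    "Mali,2,x,05/05/2009\n"
    "Mali,5,x,10/03/2009\n"
)

start_date = datetime.date(2009, 1, 1)
datedict = defaultdict(set)  # name -> week numbers
idsdict = defaultdict(set)   # name -> distinct ids (replaces namecounter)

for row in csv.DictReader(sample):
    try:
        seen_date = datetime.datetime.strptime(row['firstseen'], '%d/%m/%Y').date()
    except ValueError:
        print('Date value error')
        continue
    datedict[row['name']].add((seen_date - start_date).days // 7 + 1)
    idsdict[row['name']].add(row['id'])

print(len(idsdict['Mali']))  # 2, although Mali appears on three rows
```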
