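Python max and min records with the same attribute
2014-06-23 107 views

I'm working with a list or dictionary built from a CSV file. What I want to do is write out a new CSV file with the maximum and minimum values associated with a particular attribute, for example: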

field1  field2  field3
1       hi      1
2       hi      5
3       bye     2
4       bye     7

Grouped by the value in field2, the new CSV file would look like this:

f1  f2   min  max
1   hi   1    5
2   bye  2    7

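My grasp of data structures is weak, but I've tried a few different approaches, including reading the data straight from the CSV file. I figured that once the data was in a dictionary or a list, finding the min and max would be easy, as long as I could write them to the CSV file once I'd found them.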

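Here's what I've tried. I think this is really an algorithm problem. The code at the bottom works for me, but I can't tell when consecutive field2 values stop being equal, so I don't know when to write the result (e.g. the min) to the CSV file. In other words, how do I know when the list for one attribute is complete?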

First attempt:

import csv

dict_rows = {}
lines = (line.strip() for line in open(csvFile))
reader = csv.reader(lines, delimiter='\t', quoting=csv.QUOTE_NONE)
# Key each row by its position and store (field2, value) from that row.
# Column 5 is used here; the three-column sample above would use index 2.
for i, rec in enumerate(reader):
    dict_rows[i] = (rec[1], rec[5])

print(dict_rows)

# These give a single max/min over ALL rows, not one per field2 value.
max_value = max(dict_rows.values())
min_value = min(dict_rows.values())
print(max_value)
print(min_value)
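Part of why this returns a single pair is that Python compares tuples element by element: first on field2, then on the value. So `max()` and `min()` over all `(field2, value)` tuples pick one tuple across the whole file rather than one per attribute. A small illustration with the sample rows hard-coded (values kept as strings, as `csv.reader` returns them):

```python
# Sample (field2, field3) pairs, as csv.reader would return them (all strings).
pairs = [('hi', '1'), ('hi', '5'), ('bye', '2'), ('bye', '7')]

# Tuples compare on the first element, then the second, so these
# return one tuple across ALL rows, not a min/max per attribute.
print(max(pairs))  # ('hi', '5')
print(min(pairs))  # ('bye', '2')
```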

This seems closer, but:

import csv

prev_line = None
frames = []
lines = (line.strip() for line in open(csvFile))
reader = csv.reader(lines, delimiter='\t', quoting=csv.QUOTE_NONE)
for i, line in enumerate(reader, start=1):
    print('%s) %s' % (i, line))
    if prev_line is not None:
        if prev_line[1] == line[1]:
            # Same field2 as the previous row: collect this row's value.
            # Only rows matching the previous row are collected, so the
            # first row of each group is left out.
            print('%i) EQUAL! %s == %s' % (i, prev_line[1], line[1]))
            frames.append(line[5])
        else:
            # field2 changed: the collected values are discarded here,
            # so the finished group's min/max is never written out.
            print('%i) NOT equal: %s != %s' % (i, prev_line[1], line[1]))
            frames = []

        if frames:
            print(min(frames))
            print(max(frames))
        else:
            print(0)
            print(0)
    else:
        # First row: peek at the next row. next() consumes it,
        # so that row is never compared in the branch above.
        next_line = next(reader)
        print('Next: %s' % next_line[1])
        if line[1] != next_line[1]:
            print('%i) %s != %s' % (i, line[1], next_line[1]))

    prev_line = line
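For this previous-row approach, the question of when a group's list is complete has a short answer: a group ends when field2 changes, and the last group ends when the input runs out, so the min/max has to be emitted in both places. A minimal sketch, assuming the rows are already grouped (e.g. sorted) by field2 and the header row has been skipped; `min_max_by_run` is a hypothetical helper name, and the column indices follow the three-column sample (field2 at index 1, field3 at index 2) rather than the real file:

```python
def min_max_by_run(rows, key_col=1, value_col=2):
    """Yield (key, min, max) for each run of consecutive rows sharing row[key_col].

    Assumes rows are grouped by key; a key that reappears later
    starts a new run instead of being merged with the earlier one.
    """
    current_key = None
    values = []
    for row in rows:
        key = row[key_col]
        if values and key != current_key:
            # The key changed: the previous group is complete.
            yield current_key, min(values), max(values)
            values = []
        current_key = key
        values.append(int(row[value_col]))
    if values:
        # End of input: the last group is complete too.
        yield current_key, min(values), max(values)


rows = [['1', 'hi', '1'], ['2', 'hi', '5'], ['3', 'bye', '2'], ['4', 'bye', '7']]
for key, lo, hi in min_max_by_run(rows):
    print(key, lo, hi)
# hi 1 5
# bye 2 7
```

The standard library's `itertools.groupby` does the same consecutive-run grouping with less bookkeeping, under the same assumption that equal keys are adjacent.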

Please clean up the example. –


It still needs quite a bit of cleanup. It's pretty messy. –

Answers


This works:

import csv

data = {}
with open(fn) as f:
    reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
    header = next(reader)  # skip the header row
    for row in reader:
        # Collect every field3 value (as an int) under its field2 key.
        data.setdefault(row[1], []).append(int(row[2]))

print('key\tmin\tmax')
for k in sorted(data):
    print('{}\t{}\t{}'.format(k, min(data[k]), max(data[k])))

With your example data, this prints:

key  min  max
bye  2    7
hi   1    5
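The answer prints the results; to produce the new CSV file the question asks for (with a running number in f1), the same `data` dictionary can be handed to `csv.writer`. A minimal sketch, assuming tab-separated output; `write_min_max` is a hypothetical helper, and `io.StringIO` stands in for the output file so the example is self-contained:

```python
import csv
import io


def write_min_max(data, out):
    """Write a header, then one row per key: running number, key, min, max."""
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    writer.writerow(['f1', 'f2', 'min', 'max'])
    # Dicts keep insertion order (Python 3.7+), so groups come out
    # in the order they first appeared in the input file.
    for n, key in enumerate(data, start=1):
        writer.writerow([n, key, min(data[key]), max(data[key])])


# Same shape as the answer's `data`: field2 value -> list of field3 ints.
data = {'hi': [1, 5], 'bye': [2, 7]}

out = io.StringIO()
write_min_max(data, out)
print(out.getvalue())
```

With a real file, open it with `open(path, 'w', newline='')` (the `newline=''` argument is what the `csv` module documentation recommends) and pass that file object instead of the `StringIO`.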

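This is exactly what I had in mind, but I didn't know how to implement it. Thanks! And thanks for looking at my poor attempts and taking the time to read what I was trying to do :). You saved me hours of banging my head against the wall. – lindzylu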


Maybe something like this:

import csv

dict_rows = {}
with open(csvFile) as f:
    reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header row (field1, field2, field3)
    for line in reader:
        key = line[1]
        value = int(line[5])  # compare numbers, not strings
        # Keep only a running (min, max) pair per key.
        prev = dict_rows.get(key, (value, value))
        dict_rows[key] = (min(prev[0], value), max(prev[1], value))

for key, (lo, hi) in dict_rows.items():
    print(key, lo, hi)
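This approach keeps only a running (min, max) pair per key instead of a full list of values. One caveat: `csv.reader` returns strings, and strings compare character by character, so '10' sorts before '9'. A short sketch of the same running-pair fold, assuming integer values; `running_min_max` is a hypothetical name:

```python
def running_min_max(pairs):
    """Fold (key, value) pairs into {key: (min, max)} without storing every value."""
    bounds = {}
    for key, value in pairs:
        lo, hi = bounds.get(key, (value, value))
        bounds[key] = (min(lo, value), max(hi, value))
    return bounds


raw = [('a', '9'), ('a', '10'), ('a', '7')]

# As strings, '10' < '7' < '9' (character-by-character comparison).
print(running_min_max(raw))  # {'a': ('10', '9')}

# Converted to int, the comparison is numeric.
print(running_min_max((k, int(v)) for k, v in raw))  # {'a': (7, 10)}
```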

Use pandas. Here's a sample:

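import pandas as pd

# The file is tab-separated, so pass sep='\t' (the default is a comma).
df = pd.read_csv(filename, sep='\t')
# One row per field2 value, with the min and max of field3.
df.groupby('field2')['field3'].agg(['min', 'max']).to_csv(out_filename)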