2013-10-17 66 views
2

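Get the maximum value from each column of a CSV file

Would anybody help me solve the following problem? I have tried it myself and have attached my solution. I used a 2-D list, but I would like a different solution that avoids the 2-D list and is more pythonic.

Please suggest any other way of doing this. Q) Consider share prices for N companies, given for each month since 1990, in a CSV file. The file has the following format, with the first line as the header.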

Year,Month,Company A,Company B,Company C,.............Company N

1990,Jan,10,15,20,..........,50

1990,Feb,10,15,20,..........,50

2013,Sep,50,10,15,............500

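The solution should be in this format. a) For each company, list the year and month in which its share price was highest.

Here is my answer using a 2-D list.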

def generate_list(file_path):
    '''
    Return a list of lists containing the file data, one inner list per column.
    '''
    data_list = None  # local variable
    try:
        file_obj = open(file_path, 'r')
        try:
            # generator that yields one split line at a time until EOF (end of file);
            # rstrip removes the trailing '\n' so it does not end up in the last field
            gen = (line.rstrip('\n').split(',') for line in file_obj)
            for j, line in enumerate(gen):
                if not data_list:
                    # if data_list is None, create n empty lists, where n is the number of columns
                    data_list = [[] for i in range(len(line))]

                # convert numbers from string to float, and leave everything else as strings
                for i, l in enumerate(line):
                    if i >= 2 and j >= 1:
                        data_list[i].append(float(l))
                    else:
                        data_list[i].append(l)
        except IOError as io_except:
            print(io_except)
        finally:
            file_obj.close()
    except IOError as io_exception:
        print(io_exception)

    return data_list

def generate_result(file_path):
    '''
    Return a list of tuples containing (max price, year, month, company name).
    '''
    data_list = generate_list(file_path)
    re = []  # results as tuples: [(max_price, year, month, company_name), ...]
    if data_list:
        for i, d in enumerate(data_list):
            if i >= 2:
                m = max(d[1:])            # max price for the company
                idx = d.index(m)          # index of the max price in the column
                yr = data_list[0][idx]    # year at the index of the max price
                mon = data_list[1][idx]   # month at the index of the max price
                com = d[0]                # company name
                re.append((m, yr, mon, com))
    return re


if __name__ == '__main__':
    file_path = 'C:/Document and Settings/RajeshT/Desktop/nothing/imp/New Folder/tst.csv'
    re = generate_result(file_path)
    print('result %s' % (re,))

I have also tried to solve it with a generator, but in that case it gave a result for only one company, i.e. only one column.

p = 'filepath.csv' 

f = open(p,'r') 
head = f.readline() 
gen = ((float(line.split(',')[n]), line.split(',',2)[0:2], head.split(',')[n]) for n in range(2,len(head.split(','))) for i,line in enumerate(f)) 
x = max((i for i in gen),key=lambda x:x[0]) 
print x 
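
The attempt above produces a single tuple for two reasons: `max` over the whole flattened generator returns one global maximum, and the file object `f` is exhausted after the first value of `n`, so only the first company column is ever read. The following is a minimal sketch of a generator-based alternative, assuming Python 3 and that the rows fit in memory. The name `column_maxima` and the in-memory sample are illustrative and not part of the original post.

```python
import csv
import io

def column_maxima(lines):
    """Yield (max_price, year, month, company) for each company column."""
    reader = csv.reader(lines)
    header = next(reader)
    rows = [row for row in reader if row]   # read the input once; skip blank lines
    for n in range(2, len(header)):
        # one max() per column, so every company gets its own result
        best = max(rows, key=lambda row: float(row[n]))
        yield float(best[n]), best[0], best[1], header[n].strip()

# Illustrative sample standing in for the real file
sample = io.StringIO(
    "year,month,company 1,company 2\n"
    "1990,jan,201,245\n"
    "1990,feb,228,123\n"
    "1991,march,246,81\n"
)
result = list(column_maxima(sample))
print(result)
```

Storing the rows in a list trades memory for the ability to scan them once per column; a single-pass version would have to track a running maximum per column instead.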

The input data can be taken from the CSV sample given below.

year,month,company 1,company 2,company 3,company 4,company 5 
1990,jan,201,245,243,179,133 
1990,feb,228,123,124,121,180 
1990,march,63,13,158,88,79 
1990,april,234,68,187,67,135 
1990,may,109,128,46,185,236 
1990,june,53,36,202,73,210 
1990,july,194,38,48,207,72 
1990,august,147,116,149,93,114 
1990,september,51,215,15,38,46 
1990,october,16,200,115,205,118 
1990,november,241,86,58,183,100 
1990,december,175,97,143,77,84 
1991,jan,190,68,236,202,19 
1991,feb,39,209,133,221,161 
1991,march,246,81,38,100,122 
1991,april,37,137,106,138,26 
1991,may,147,48,182,235,47 
1991,june,57,20,156,38,245 
1991,july,165,153,145,70,157 
1991,august,154,16,162,32,21 
1991,september,64,160,55,220,138 
1991,october,162,72,162,222,179 
1991,november,215,207,37,176,30 
1991,december,106,153,31,247,69 

The expected output is as follows.

[(246.0, '1991', 'march', 'company 1'), 
(245.0, '1990', 'jan', 'company 2'), 
(243.0, '1990', 'jan', 'company 3'), 
(247.0, '1991', 'december', 'company 4'), 
(245.0, '1991', 'june', 'company 5')] 

Thanks in advance...

+1

Is numpy or pandas an option? – beroe

+0

Whatever you think is more pythonic and makes the most of standard library functions.. please, no third-party libraries... –

+2

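Well, pandas and numpy are libraries that you have to import, so I guess you would call them third party, but they are very well suited to this kind of application. You can also do this with standard methods, though... – beroe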

Answers

3

Using collections.OrderedDict and collections.namedtuple:

import csv
from collections import OrderedDict, namedtuple

with open('abc1') as f:
    reader = csv.reader(f)
    tup = namedtuple('tup', ['price', 'year', 'month'])
    d = OrderedDict()
    names = next(reader)[2:]
    for name in names:
        # initialize the dict
        d[name] = tup(0, 'year', 'month')
    for row in reader:
        year, month = row[:2]   # Use year, month, *prices = row in py3.x
        for name, price in zip(names, map(int, row[2:])): # map(int, prices) py3.x
            if d[name].price < price:
                d[name] = tup(price, year, month)
print(d)

Output:

OrderedDict([ 
('company 1', tup(price=246, year='1991', month='march')), 
('company 2', tup(price=245, year='1990', month='jan')), 
('company 3', tup(price=243, year='1990', month='jan')), 
('company 4', tup(price=247, year='1991', month='december')), 
('company 5', tup(price=245, year='1991', month='june'))]) 
1

I'm not entirely sure how you want the output, so for now I just print it to the screen.

import os 
import csv 
import codecs 


## Import data !!!!!!!!!!!! CHANGE TO APPROPRIATE PATH !!!!!!!!!!!!!!!!! 
filename= os.path.expanduser("~/Documents/PYTHON/StackTest/tailor_raj/Workbook1.csv") 

## Get useable data 
data = [row for row in csv.reader(codecs.open(filename, 'rb', encoding="utf_8"))] 

## Find index of the last row (row 0 is the header)
row_count = len(data) - 1

## Find index of the last column
    ## The header row has one entry per column, so its length gives the column count directly.
column_count = len(data[0]) - 1

## Set which company we are checking (start with the first company listed. Since it starts at 0 the first company is at 2 not 3) 
companyIndex = 2 

#This will keep all the company bests as single rows of text. I was not sure how you wanted to output them. 
companyBest=[] 

## Set loop to go through each company 
while companyIndex <= (column_count):

    ## For each new company reset the rowIndex and highestShare
    rowIndex = 1
    highestShare = rowIndex

    ## Set loop to go through each row
    while rowIndex <= row_count:
        ## Test if data point is above or equal to current max
        ## Currently set to use the most recent high point
        if int(data[highestShare][companyIndex]) <= int(data[rowIndex][companyIndex]):
            highestShare = rowIndex

        ## Move on to next row
        rowIndex += 1

    ## Company best = Company Name + year + month + value
    companyBest.append(str(data[0][companyIndex])+": "+str(data[highestShare][0]) +", "+str(data[highestShare][1])+", "+str(data[highestShare][companyIndex]))

    ## Move on to next company
    companyIndex += 1

for item in companyBest:
    print(item)

Be sure to change the filename path to one that suits you.

The output currently looks like this:

Company A: 1990, Nov, 1985
Company B: 1990, May, 52873
Company C: 1990, , 3658
Company D: 1990, Nov, 156498
Company E: 1990, Jul, 987
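
The comparison in the loop above uses `<=`, so when a company reaches the same high more than once, the most recent row wins; a strict `<` would keep the earliest one. A small sketch of the two choices, using a made-up price list rather than the poster's data:

```python
# Illustrative prices for one company; the maximum 245 occurs twice
prices = [245, 100, 245, 30]

# Earliest row holding the maximum (what a strict '<' comparison keeps)
first_high = prices.index(max(prices))

# Latest row holding the maximum (what '<=' keeps): break ties on the index
last_high = max(range(len(prices)), key=lambda i: (prices[i], i))

print(first_high, last_high)
```

Which of the two is wanted depends on how ties should be reported; the question does not say.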

+0

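Thanks for the attempt.. I have already done it the long way... but I want to do it using only generators (if possible) and with as few lines of code as possible, i.e. in a more pythonic way. :) –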

+0

Ah, my bad. I only saw that you had tried a generator and didn't realize you wanted a generator as the answer. – ExperimentsWithCode

1

No generators, unfortunately, but the amount of code is small, especially in Python 3:

from operator import itemgetter 
from csv import reader 

with open('test.csv') as f: 
    year, month, *data = zip(*reader(f)) 

for pricelist in data: 
    name = pricelist[0] 
    prices = map(int, pricelist[1:]) 
    i, price = max(enumerate(prices), key=itemgetter(1)) 
    print(name, price, year[i+1], month[i+1]) 

In Python 2.X you can do the same thing, but slightly more clumsily, using the following (and a different print statement):

with open('test.csv') as f: 
    columns = zip(*reader(f)) 
    year, month = columns[:2] 
    data = columns[2:] 
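
The key step in both variants is `zip(*reader(f))`, which transposes the rows into columns. A small sketch on an in-memory sample (the rows are illustrative, and Python 3 is assumed for the starred assignment):

```python
rows = [
    ["year", "month", "company 1", "company 2"],
    ["1990", "jan", "201", "245"],
    ["1991", "march", "246", "81"],
]

# zip(*rows) pairs up the i-th element of every row, i.e. it yields the columns
year, month, *data = zip(*rows)

print(year)   # first column, header included
print(data)   # one tuple per company column, company name first
```

Note that unpacking the rows with `*` reads the whole file into memory before any column is produced.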

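OK, I came up with some gruesome generators! It also makes use of lexicographic tuple comparison and reduce to compare consecutive lines:

from functools import reduce # import needed only in Python 3; reduce is a builtin in Python 2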
import csv 

def group(year, month, *prices): 
    return ((int(p), year, month) for p in prices) 

def compare(a, b): 
    return map(max, zip(a, group(*b))) 

def run(fname):
    with open(fname) as f:
        r = csv.reader(f)
        names = next(r)[2:]
        return zip(names, reduce(compare, r, group(*next(r))))

list(run('test.csv')) 
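
The `compare` step relies on Python comparing tuples element by element: the price decides first, and the year and month strings only break ties. A short illustration with made-up values:

```python
a = (245, "1990", "jan")
b = (246, "1991", "march")
c = (245, "1991", "june")

highest = max(a, b)     # prices differ, so the price alone decides
tie_break = max(a, c)   # equal prices: the year string is compared next

print(highest)
print(tie_break)
```

On equal prices the later year therefore wins, and within the same year the month names are compared alphabetically rather than chronologically.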
+0

Can someone write test cases for this problem? –
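
One possible starting point for such tests, as a sketch only: it assumes Python 3 and wraps the column-transposing approach in a hypothetical helper, `highest_prices`, that takes an iterable of CSV lines. Neither the helper name nor the sample data comes from the thread.

```python
import csv
import io
import unittest

def highest_prices(lines):
    """Return [(max_price, year, month, company), ...] (hypothetical helper)."""
    year, month, *columns = zip(*(row for row in csv.reader(lines) if row))
    results = []
    for name, *prices in columns:
        prices = [float(p) for p in prices]
        i = prices.index(max(prices))   # first occurrence wins on ties
        results.append((prices[i], year[i + 1], month[i + 1], name.strip()))
    return results

class HighestPricesTest(unittest.TestCase):
    SAMPLE = (
        "year,month,company 1,company 2\n"
        "1990,jan,201,245\n"
        "1990,feb,228,123\n"
        "1991,march,246,245\n"
    )

    def test_one_result_per_company(self):
        self.assertEqual(len(highest_prices(io.StringIO(self.SAMPLE))), 2)

    def test_finds_year_and_month_of_maximum(self):
        first = highest_prices(io.StringIO(self.SAMPLE))[0]
        self.assertEqual(first, (246.0, "1991", "march", "company 1"))

    def test_tie_keeps_earliest_row(self):
        second = highest_prices(io.StringIO(self.SAMPLE))[1]
        self.assertEqual(second, (245.0, "1990", "jan", "company 2"))

# Run the tests without exiting the interpreter
suite = unittest.defaultTestLoader.loadTestsFromTestCase(HighestPricesTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Further cases worth covering would be a header-only file, non-numeric cells, and rows with missing columns; how each should behave is not specified in the question.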