2017-10-16 49 views
0

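I'm trying to return the years and the average grade for each year. What I want to do is build a dictionary of year: grade, then another dictionary of year: sum_of_grade, and so on. In short: create a dictionary and extract the averages.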

The data comes from a csv file with two headers, Year and Grade:

Year Grade 
2001 100 
2002 99 
2001 88 
2003 11 
2005 55 

There is more, but I don't think the full data is needed here.

def construct_values(file): 
    """ 
    Construct the values needed to graph the average grade of the class over time 

    Parameters 
    ---------- 
    file_path: A string. Absolute path to file. 

    Returns 
    ------- 
    years: array of integers 
    average_grades: array of floats 
    """ 
    years, average_grades = [], [] 
    grades = [] 
    d = {} 
    with open(file,'r') as f:
        next(f)
        for line in f:
            year, grade = (s.strip() for s in line.split(','))
            years.append(year) # array year
            grades.append(grade) # array grade
            d = dict(zip(years,grades)) # dict year:grade

        for i,j in d:
            # i for count frequencies of years
            # j for summation of grades
            # j/i for average grade and extract as array


        return years, average_grades

I tried to make this clear, but if it's still unclear, please let me know.

Answers

1

There is a problem when you use this:

d = dict(zip(years,grades)) # dict year:grade

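Taking your input data as an example, it generates a dictionary like this (the keys and values are strings, because they come straight from line.split):

{'2001': '88', '2002': '99', '2003': '11', '2005': '55'}

This happens because, when a dictionary is built with duplicate keys, the later value overwrites the earlier one. The row 2001,100 is lost.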
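The overwrite can be reproduced in a few lines. This is only a sketch: the two lists are typed in by hand from the sample rows in the question instead of being read from the file.

```python
# Sample rows from the question, entered by hand (assumption: values stay strings,
# as they would after line.split(',')).
years = ['2001', '2002', '2001', '2003', '2005']
grades = ['100', '99', '88', '11', '55']

d = dict(zip(years, grades))

print(d)       # the second '2001' row replaces the first one
print(len(d))  # fewer keys than rows, because '2001' appears twice
```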

So, to get what you want, I suggest building the dictionary a different way, something like this:

def construct_values(file):
    """
    Construct the values needed to graph the average grade of the class over time

    Parameters
    ----------
    file: A string. Absolute path to file.

    Returns
    -------
    years: array of integers
    average_grades: array of floats
    """
    years, average_grades = [], []
    # grades = []  This variable isn't needed anymore
    d = {}
    with open(file, 'r') as f:
        next(f)  # skip the header line
        for line in f:
            year, grade = (s.strip() for s in line.split(','))

            # here the code starts to differ from yours
            if year not in d:
                d[year] = [int(grade), 1]  # [sum_of_grade, count]
            else:
                d[year][0] += int(grade)
                d[year][1] += 1

    for year, grade_info in d.items():
        years.append(int(year))  # the docstring promises integers
        # float() keeps this a true division on Python 2 as well
        average_grades.append(grade_info[0] / float(grade_info[1]))
    # end of the differences from your code

    return years, average_grades

In the intermediate dictionary d, each value stores [sum_of_grade, times_appeared_in_the_year], so when you iterate over the dictionary the average is simply sum_of_grade / times_appeared_in_the_year.
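The running sum and count can be watched on the sample rows from the question. The sketch below assumes Python 3 and uses io.StringIO as a stand-in for the real file; the names sample, totals and averages are made up for this illustration.

```python
import io

# Sample rows copied from the question; io.StringIO stands in for the real file.
sample = io.StringIO("Year,Grade\n2001,100\n2002,99\n2001,88\n2003,11\n2005,55\n")

totals = {}   # year -> [sum_of_grade, times_appeared_in_the_year]
next(sample)  # skip the header line
for line in sample:
    year, grade = (s.strip() for s in line.split(','))
    entry = totals.setdefault(int(year), [0, 0])  # a new year starts at [0, 0]
    entry[0] += int(grade)  # running sum
    entry[1] += 1           # running count

# Python 3 true division, so every average is a float.
averages = {year: total / count for year, (total, count) in totals.items()}
print(totals)
print(averages)
```

setdefault replaces the if/else branch of the answer; the stored values are the same.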

I also don't think you need the extra variable grades anymore. And whenever you see a table (a CSV file is one), you should think of pandas.

+0

What is the i in years.append()? Should it be year? Also, I don't understand how the grades get added in this case. – Mayjunejuly

+0

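Yes, sorry, it should be year. The grade is added here: `d[year] = [grade, 1]`, the first time you meet *2001,100*. The intermediate dictionary is then {2001: [100, 1]}. Later it meets *2001,88*, and the intermediate dictionary becomes {2001: [188, 2], 2002: [99, 1]}. Since you only want to return the average grade, I think it is enough to keep the sum of the grades and the number of grades in a list. There is no need to append the actual grade values. – Ballack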

+0

Your code works, but doesn't it ignore the order? The order matters here, and the result doesn't start from 2001. – Mayjunejuly
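On the ordering question: a plain dict makes no promise of numeric order (on Python 3.7+ it keeps insertion order, which is the order of first appearance in the file). One possible fix is to sort the keys before building the two lists. The helper name sorted_averages is hypothetical, and the sketch assumes Python 3 and integer year keys.

```python
def sorted_averages(totals):
    """Hypothetical helper: totals maps year -> [sum_of_grade, count].

    Returns (years, average_grades), both ordered by ascending year.
    """
    years = sorted(totals)
    average_grades = [totals[y][0] / totals[y][1] for y in years]
    return years, average_grades


# Keys deliberately out of order to show the effect of sorting.
totals = {2005: [55, 1], 2001: [188, 2], 2003: [11, 1], 2002: [99, 1]}
years, average_grades = sorted_averages(totals)
print(years)
print(average_grades)
```

If the keys are left as strings, sorted() compares them as text; that gives the same order for four-digit years but is not a numeric sort in general.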

0

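When the dictionary is created with dict(zip(years, grades)), duplicate keys are not kept, because a dictionary cannot hold the same key twice. So it is better to use an alternative to the dictionary.

Something like this: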

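from itertools import groupby

combined = zip(years, grades)  # the two lists built while reading the file
# groupby only groups adjacent items, so sort by year first
for n, g in groupby(sorted(combined, key=lambda x: x[0]), key=lambda x: x[0]):
    group = [int(i[1]) for i in g]
    # float() keeps this a true division on Python 2 as well
    print('year : %s average : %s' % (n, sum(group) / float(len(group))))

Result:

year : 2001 average : 94.0
year : 2002 average : 99.0
year : 2003 average : 11.0
year : 2005 average : 55.0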
1

Here is a pandas solution:

import pandas as pd 
import io 

csv = """Year,Grade 
2001,100 
2002,99 
2001,88 
2003,11 
2005,55""" 

df = pd.read_csv(io.StringIO(csv)) 

year_grade = {k: list(v) for k,v in df.groupby("Year")["Grade"]} 
year_avg_grade = df.groupby("Year")["Grade"].mean().to_dict() 

year_grade:

{2001: [100, 88], 2002: [99], 2003: [11], 2005: [55]} 

year_avg_grade:

{2001: 94.0, 2002: 99.0, 2003: 11.0, 2005: 55.0}
+0

Very true. I understand that pandas works well here, but for some reason this assignment doesn't let me use pandas. – Mayjunejuly
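For the case where pandas is off the table, the same two dictionaries can be built with the standard library alone. This is a sketch under assumptions: Python 3, a header row spelled exactly Year,Grade as in the pandas example, and io.StringIO standing in for the real file.

```python
import csv
import io
from collections import defaultdict

# Same sample data as the pandas example; io.StringIO stands in for the real file.
sample = io.StringIO("Year,Grade\n2001,100\n2002,99\n2001,88\n2003,11\n2005,55\n")

year_grade = defaultdict(list)          # year -> list of grades
for row in csv.DictReader(sample):      # the header row supplies the keys
    year_grade[int(row["Year"])].append(int(row["Grade"]))

# Average per year, with the years in ascending order.
year_avg_grade = {year: sum(g) / len(g) for year, g in sorted(year_grade.items())}

print(dict(year_grade))
print(year_avg_grade)
```

With a real file, open(path, newline='') would take the place of io.StringIO.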