2017-02-18 46 views
-1

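Python - Extract the same values across a list and calculate the differences between consecutive values

I am learning Python on Dataquest and am trying to solve this problem.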

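Write a function that extracts the same values across years and calculates the differences between consecutive values, to show whether the number of births is increasing or decreasing. For example, how did the number of births on Saturdays change each year from 1994 to 2003?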
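The core of the task, taking the difference between each value and the one before it, can be sketched as follows. This is only a minimal illustration: the function name `consecutive_differences`, its return shape, and the sample totals are assumptions made up for the example, not part of the exercise or the dataset.

```python
def consecutive_differences(totals):
    """Return (key, change) pairs: each value minus the value before it.

    `totals` maps a sortable key (for example a year) to a count.
    The name and return shape are illustrative assumptions.
    """
    ordered = sorted(totals.items())  # make sure keys are in ascending order
    return [
        (key, value - prev_value)
        for (_, prev_value), (key, value) in zip(ordered, ordered[1:])
    ]


# Made-up totals purely for illustration (not real CDC figures).
sample = {1994: 100, 1995: 120, 1996: 110}
for year, change in consecutive_differences(sample):
    if change > 0:
        trend = "increased"
    elif change < 0:
        trend = "decreased"
    else:
        trend = "unchanged"
    print(year, change, trend)
```

Sorting the items first means the result does not depend on the order in which keys were inserted into the dictionary.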

I want to solve this in Jupyter. I am new to Python and am not sure how to get started on this problem.

The input data is available here in CSV format: US births

# coding: utf-8 

# In[1]: 

# Read the whole file and split it into lines (the first line is the header).
with open("US_births_1994-2003_CDC_NCHS.csv", "r") as f:
    text_file = f.read()
line_split = text_file.split("\n")
line_split


# In[2]: 

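def read_csv(filename):
    # Read the file, drop the header row, and convert every field to int.
    with open(filename, "r") as f:
        text = f.read()
    string_list = text.split("\n")[1:]
    final_list = []
    for row in string_list:
        if not row.strip():
            continue  # skip blank lines, e.g. after a trailing newline
        int_fields = []
        string_fields = row.split(",")
        for item in string_fields:
            int_fields.append(int(item))
        final_list.append(int_fields)
    return final_list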

cdc_list = read_csv("US_births_1994-2003_CDC_NCHS.csv") 
cdc_list[0:10] 


# In[3]: 

def months_births(data):
    # Sum the births (column 4) for each month (column 1).
    births_per_month = dict()
    for item in data:
        num_month = int(item[1])
        num_births = int(item[4])
        if num_month in births_per_month:
            births_per_month[num_month] += num_births
        else:
            births_per_month[num_month] = num_births
    return births_per_month

cdc_month_births = months_births(cdc_list) 
cdc_month_births 


# In[4]: 

def dow_births(data):
    # Sum the births (column 4) for each day of the week (column 3).
    sum_births = dict()
    for item in data:
        day_week = int(item[3])
        day_birth = int(item[4])
        if day_week in sum_births:
            sum_births[day_week] += day_birth
        else:
            sum_births[day_week] = day_birth
    return sum_births

cdc_day_births = dow_births(cdc_list) 
cdc_day_births 


# In[30]: 

def calc_counts(data, column):
    sum_dict = dict()
    for item in data:
        col_num = item[column]
        birth_count = int(item[4])
        if col_num in sum_dict:
            sum_dict[col_num] += birth_count
        else:
            sum_dict[col_num] = birth_count
    return sum_dict

cdc_year_births = calc_counts(cdc_list, 0) 
cdc_month_births = calc_counts(cdc_list, 1) 
cdc_dom_births = calc_counts(cdc_list, 2) 
cdc_dow_births = calc_counts(cdc_list, 3) 


# In[31]: 

cdc_year_births 


# In[32]: 

cdc_month_births 


# In[33]: 

cdc_dom_births 


# In[34]: 

cdc_dow_births 


# In[6]: 

def min_max_dict(counts, request):
    # Return the largest value in the dict for "max", otherwise the smallest.
    if request == "max":
        return max(counts.values())
    else:
        return min(counts.values())

max_value = min_max_dict(cdc_year_births, "max")
print("max:", max_value)
min_value = min_max_dict(cdc_year_births, "min")
print("min:", min_value)


# In[36]: 

def diff_in_values(data):
    # Note: this only collects the distinct daily birth counts; it does not
    # yet group values by year or compute any differences.
    seen_set = set()
    for item in data:
        birth_count = int(item[4])
        seen_set.add(birth_count)
    return seen_set

result = diff_in_values(cdc_list) 
result 
+1

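You start by clarifying the requirements. What is the format of the data for the years and birth counts (the format of the data itself and how it is stored with other data)? How will your function access that data? What is the desired format of the output? Are there other requirements or preferences? We cannot help you until you know and understand these requirements and incorporate them into your question. Check the [FAQ](http://stackoverflow.com/tour) and [How to Ask](http://stackoverflow.com/help/how-to-ask). –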

+0

Thank you very much. I have added more information to the question. Please let me know if it is not enough. – iprateekk

+0

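More information about the input is needed, and you say nothing about the output. Is the CSV file on your drive in the same folder as your function's module? Is the file guaranteed to have that name, with a header row and exactly one row per day from 1994-01-01 through 2003-12-31, in chronological order with no duplicate or missing dates, and with every data row complete and correctly formatted? Is the question about Saturday births the only output needed? And so on. Do you realize that your code does not actually access the data? –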
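Some of the input checks raised in this comment (a header row, complete rows, integer fields) could be sketched with the standard `csv` module as below. This is a hedged illustration only: the function name `read_birth_rows`, the five-column layout (year, month, day of month, day of week, births), the header text, and the sample rows are assumptions, not confirmed properties of the file.

```python
import csv

# Assumption: each row is year, month, day of month, day of week, births.
EXPECTED_COLUMNS = 5


def read_birth_rows(lines):
    """Parse CSV lines into lists of ints, skipping the header and blank lines.

    `lines` can be an open file or any iterable of strings.
    Raises ValueError if a row has the wrong number of fields or a
    field that is not an integer.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    rows = []
    for fields in reader:
        if not fields:
            continue  # csv.reader yields an empty list for a blank line
        if len(fields) != EXPECTED_COLUMNS:
            raise ValueError(
                "line {}: expected {} fields, got {}".format(
                    reader.line_num, EXPECTED_COLUMNS, len(fields)
                )
            )
        rows.append([int(field) for field in fields])  # int() raises ValueError
    return rows


# Made-up lines for illustration; the header names are an assumption.
sample_lines = [
    "year,month,date_of_month,day_of_week,births",
    "1994,1,1,6,100",
    "",
    "1994,1,2,7,90",
]
print(read_birth_rows(sample_lines))
```

With a real file, the same function could be called as `with open("US_births_1994-2003_CDC_NCHS.csv", newline="") as f: cdc_list = read_birth_rows(f)`.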

Answers

2

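I am working on the same project. I have shared the part of the code you need. The .ipynb file for my project is on GitHub. You may also want to look at the results of my functions. Cheers!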

def read_csv(birth_data_file):
    # Read the file, drop the header row, and convert every field to int.
    with open(birth_data_file, "r") as f:
        raw_data = f.read()
    raw_data = raw_data.split("\n")
    string_list = raw_data[1:]
    final_list = []
    for data in string_list:
        if not data.strip():
            continue  # skip blank lines, e.g. after a trailing newline
        int_fields = []
        string_fields = data.split(",")
        for string_field in string_fields:
            field = int(string_field)
            int_fields.append(field)
        final_list.append(int_fields)
    return final_list


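def calc_counts(data, column):
    # 'column' is 1-based here: 1 = year, 2 = month, 3 = day of month,
    # 4 = day of week. Births are always read from index 4 of each row.
    births_counts = {}
    # Parentheses are needed: "not column > 0 and column <= 4" would be
    # parsed as "(not column > 0) and (column <= 4)" and let column > 4 through.
    if not (0 < column <= 4):
        return "'column' must be either 1, 2, 3, or 4"
    else:
        for instance in data:
            field = instance[column - 1]
            births = instance[4]
            if field in births_counts:
                births_counts[field] += births
            else:
                births_counts[field] = births
        return births_counts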


# Write a function that extracts the same values across years and calculates the 
# differences between consecutive values to show if number of 
# births is increasing or decreasing. 

def check_birth_growth(birth_data_file):
    cdc_list = read_csv(birth_data_file)
    cdc_year_births = calc_counts(cdc_list, 1)
    previous_year_birth = None
    # Sort by year so each total is compared with the year just before it.
    for year, total_births in sorted(cdc_year_births.items()):
        current_year_birth = int(total_births)
        if previous_year_birth is None:
            growth_status = "Growth of births in {} not available.".format(year)
        elif current_year_birth > previous_year_birth:
            growth_status = "Births increased in {}.".format(year)
        elif current_year_birth < previous_year_birth:
            growth_status = "Births decreased in {}.".format(year)
        else:
            growth_status = "Births in {} were the same as in the previous year.".format(year)
        print(growth_status)
        previous_year_birth = current_year_birth
+0

Thank you very much :) – iprateekk

+0

Hey @iprateekk! My pleasure! :) –

+0

Thanks. You really helped. – user7479

0

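Enock Kwesi Addey, this seems to compare births only from one year to the next. It cannot compare births month by month or day by day, for example March 1994 against March 1995.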
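One way to compare the same month (or the same weekday) across years is to filter the rows first and then total them per year. The sketch below assumes the row layout used in the question (index 0 = year, 1 = month, 2 = day of month, 3 = day of week, 4 = births). The helper names `totals_by_year` and `year_over_year_changes` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def totals_by_year(data, column, value):
    """Sum births per year over the rows where row[column] == value.

    Assumes rows shaped [year, month, day_of_month, day_of_week, births].
    """
    totals = defaultdict(int)
    for row in data:
        if row[column] == value:
            totals[row[0]] += row[4]
    return dict(totals)


def year_over_year_changes(totals):
    """Map each year (except the first) to its total minus the previous year's."""
    years = sorted(totals)
    return {
        year: totals[year] - totals[prev]
        for prev, year in zip(years, years[1:])
    }


# Made-up rows for illustration only (not real CDC figures).
rows = [
    [1994, 3, 1, 2, 10],
    [1994, 3, 2, 3, 20],
    [1994, 4, 1, 5, 99],
    [1995, 3, 1, 3, 25],
    [1996, 3, 1, 5, 40],
]
march_totals = totals_by_year(rows, column=1, value=3)  # March only
print(march_totals)
print(year_over_year_changes(march_totals))
```

The same two calls with `column=3` would compare a single weekday across years; which number stands for Saturday depends on how the dataset encodes weekdays, so check that before relying on it.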

+1

Enock's code is fine. In the line below, change the value to 0 for the year, 1 for the month, 2 for the day of the month, and 3 for the day of the week: cdc_year_births = calc_counts(cdc_list,1) – user7479

+0

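Shouldn't the function check_birth_growth take an extra parameter? That parameter would be the column number in cdc_list. That way we would not have to edit the column number (0 to 1 to 2, and so on) every time before running the function. – Tarek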
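A possible shape for this suggestion is sketched below: the column becomes a parameter, and the function works on rows that are already loaded. This is an assumption-laden sketch rather than the answer author's code. The name `check_growth`, its parameters, and the sample rows are hypothetical, and `column` here is a 0-based row index (0 = year, 1 = month, 2 = day of month, 3 = day of week), unlike the 1-based `column` in the answer's `calc_counts`.

```python
def check_growth(data, column=0, births_column=4):
    """Print and return whether births rose or fell between consecutive groups.

    `column` is a 0-based index into each row that selects the grouping field.
    Assumes rows shaped [year, month, day_of_month, day_of_week, births].
    """
    # Total the births for every distinct value found in the chosen column.
    totals = {}
    for row in data:
        totals[row[column]] = totals.get(row[column], 0) + row[births_column]

    messages = []
    previous = None
    for key in sorted(totals):  # compare groups in ascending key order
        current = totals[key]
        if previous is None:
            message = "No earlier value to compare with for {}.".format(key)
        elif current > previous:
            message = "Births increased in {}.".format(key)
        elif current < previous:
            message = "Births decreased in {}.".format(key)
        else:
            message = "Births in {} were unchanged.".format(key)
        print(message)
        messages.append(message)
        previous = current
    return messages


# Made-up rows for illustration only (not real CDC figures).
rows = [
    [1994, 1, 1, 6, 10],
    [1994, 2, 1, 2, 5],
    [1995, 1, 1, 7, 20],
]
check_growth(rows, column=0)  # group by year
check_growth(rows, column=1)  # group by month
```

Returning the messages as well as printing them makes the function easier to check in a notebook cell or a test.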

+0

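@Tarek, I wish I had some time right now to look into this again. It has been a while. :) In any case, you can modify the code to suit your own implementation. Thanks. –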
