2013-12-13 27 views
-1

(Python) List index out of range when trying to pull data from a .CSV?

This program opens two .CSV files, whose data is available at this link: https://drive.google.com/folderview?id=0B1SjPejhqNU-bVkzYlVHM2oxdGs&usp=sharing

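It is supposed to read whatever comes after the comma on each line of the two files, but my indexing logic is a little off. When I run it, I get a traceback pointing at line 101: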

"line 101, in calc_corr: sum_smokers_value = sum_smokers_value + float(s_percent_smokers_data[k][1]) IndexError: list index out of range"

I assume the same thing will happen everywhere else that [k][1] is used.

Thanks very much in advance if there is a way to fix this.

The program so far:

# this program opens two files containing data and runs a correlation calculation

import math 

def main(): 


    try: 
     print('does smoking directly lead to lung cancer?') 
     print('''let's find out, shall we?''''') 
     print('to do so, this program will find correlation between the instances of smokers, and the number of people with lung cancer.') 

     percent_smokers, percent_cancer = retrieve_csv() 

     s_percent_smokers_data, c_percent_cancer_data = read_csv(percent_smokers, percent_cancer) 

     correlation = calc_corr(s_percent_smokers_data, c_percent_cancer_data,) 

     print('r_value =', corretation) 

    except IOError as e: 
     print(str(e)) 
     print('this program has been cancelled. run it again.') 



def retrieve_csv(): 
    num_times_failed = 0 
    percent_smokers_opened = False 
    percent_cancer_opened = False 



    while((not percent_smokers_opened) or (not percent_cancer_opened)) and (num_times_failed < 5): 

     try: 

      if not percent_smokers_opened: 
       percent_smokers_input = input('what is the name of the file containing the percentage of smokers per state?') 
       percent_smokers = open(percent_smokers_input, 'r') 
       percent_smokers_opened = True 

      if not percent_cancer_opened: 
       percent_cancer_input = input('what is the name of the file containing the number of cases of lung cancer contracted?') 
       percent_cancer = open(percent_cancer_input, 'r') 
       percent_cancer_opened = True 

     except IOError: 
      print('a file was not located. try again.') 
      num_times_failed = num_times_failed + 1 

    if not percent_smokers_opened or not percent_cancer_opened: 
     raise IOError('you have failed too many times.') 

    else: 
     return(percent_smokers, percent_cancer) 



def read_csv(percent_smokers, percent_cancer): 
    s_percent_smokers_data = [] 
    c_percent_cancer_data = [] 

    empty_list = '' 


    percent_smokers.readline() 
    percent_cancer.readline() 
    eof = False 

    while not eof: 
     smoker_list = percent_smokers.readline() 
     cancer_list = percent_cancer.readline() 

     if smoker_list == empty_list and cancer_list == empty_list: 
      eof = True 

     elif smoker_list == empty_list: 
      raise IOError('smokers file error') 

     elif cancer_list == empty_list: 
      raise IOError('cancer file error') 


     else: 
      s_percent_smokers_data.append(smoker_list.strip().split(',')) 
      c_percent_cancer_data.append(cancer_list.strip().split(',')) 


    return (s_percent_smokers_data, c_percent_cancer_data) 


def calc_corr(s_percent_smokers_data, c_percent_cancer_data): 

    sum_smokers_value = sum_cancer_cases_values = 0 
    sum_smokers_sq = sum_cancer_cases_sq = 0 
    sum_value_porducts = 0 
    numbers = len(s_percent_smokers_data) 

    for k in range(0, numbers): 
     sum_smokers_value = sum_smokers_value + float(s_percent_smokers_data[k][1]) 
     sum_cancer_cases_values = sum_cancer_cases_values + float(c_percent_cancer_data[k][1]) 

     sum_smokers_sq = sum_smokers_sq + float(s_percent_smokers_data[k][1]) ** 2 
     sum_cancer_cases_sq = sum_cancer_cases_sq + float(c_percent_cancer_data[k][1]) ** 2 

     sum_value_products = sum_value_products + float(percent_smokers[k][1]) ** float(percent_cancer[k][1]) 

    numerator_value = (numbers * sum_value_products) - (sum_smokers_value * sum_cancer_cases_values) 
    denominator_value = math.sqrt(abs((numbers * sum_smokers_sq) - (sum_smokers_value ** 2)) * ((numbers * sum_cancer_cases_sq) - (sum_cancer_cases_values ** 2))) 



    return numerator_value/denominator_value 


main() 
+0

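My guess is that one of your CSV files has a line with no comma in it. Is there any reason you need to parse the CSV files yourself rather than using the 'csv' module? Using 'csv.reader' and 'zip' (or 'itertools.zip_longest', if you really need to detect when the files have different numbers of lines) would remove a lot of the complexity in your program. – Blckknght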
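The suggestion in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name read_paired_rows, its delimiter parameter, and the sample data are all hypothetical, and in-memory io.StringIO objects stand in for the two opened files.

```python
import csv
import io
from itertools import zip_longest


def read_paired_rows(smokers_file, cancer_file, delimiter=","):
    """Read two delimited files in lockstep and return a list of row pairs.

    Hypothetical helper: the name and the delimiter parameter are
    assumptions made for this sketch, not part of the original program.
    """
    smokers_reader = csv.reader(smokers_file, delimiter=delimiter)
    cancer_reader = csv.reader(cancer_file, delimiter=delimiter)

    # Skip the header row of each file, as the original read_csv does.
    next(smokers_reader, None)
    next(cancer_reader, None)

    pairs = []
    for smoker_row, cancer_row in zip_longest(smokers_reader, cancer_reader):
        # zip_longest pads the shorter input with None, which reveals
        # that the two files have different numbers of rows.
        if smoker_row is None or cancer_row is None:
            raise ValueError("the two files have different numbers of rows")
        pairs.append((smoker_row, cancer_row))
    return pairs


# Made-up sample data standing in for the real files.
smokers = io.StringIO("state,percent\nAL,24.3\nAK,22.9\n")
cancer = io.StringIO("state,cases\nAL,75.1\nAK,66.4\n")
pairs = read_paired_rows(smokers, cancer)
print(pairs)
```

With this shape, the hand-written end-of-file bookkeeping in read_csv is replaced by a single loop, and a mismatch in file length surfaces as one explicit error.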

+0

@Blckknght, I'm running Ubuntu, so I made copies of the MS Excel files in LibreOffice Calc and saved them as CSV. Is there a better way to do this, such as making a Word document? Thanks! – elm95

Answer

0

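The values on each line of your data files are separated by tabs, not commas. You need to change the ',' delimiter character to '\t'. Alternatively, use the csv module and tell it that your delimiter is '\t'. You can read more about the csv module in the documentation.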
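To make the cause of the IndexError concrete, here is a small sketch. The sample row and file contents are made up for illustration and are not taken from the linked data files; only str.split and csv.reader with its delimiter argument are real library calls.

```python
import csv
import io

# Hypothetical tab-separated row, shaped like the answer describes.
line = "Alabama\t24.3"

# Splitting on a comma finds no separator, so the result has a single
# element and indexing it with [1] raises IndexError.
comma_fields = line.strip().split(",")

# Splitting on a tab yields the two expected fields.
tab_fields = line.strip().split("\t")

print(comma_fields)
print(tab_fields)

# The same idea with the csv module: pass delimiter="\t" to csv.reader.
sample = io.StringIO("State\tPercent\nAlabama\t24.3\nAlaska\t22.9\n")
reader = csv.reader(sample, delimiter="\t")
next(reader)  # skip the header row
values = [float(row[1]) for row in reader]
print(values)
```

Either change is enough to give every row a second field, so the [k][1] lookups in calc_corr no longer fail.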

+0

I assume I have to import csv globally either way, right? – elm95

+0

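I'm not sure I understand. The immediate problem in your code can be fixed by swapping in '\t' in the two split calls. (There may be other issues too, since I haven't tested your code.) Switching to the 'csv' module is a bigger change. Importing the module is the first step, but it won't do anything by itself. – Blckknght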

+0

Got it working. Also found and corrected typos on lines 19 and 117. Appreciate it! – elm95
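The thread does not say which typos were corrected. Judging only from the posted code, likely candidates are the misspelled names corretation and sum_value_porducts, and the product line, which uses ** and the undefined names percent_smokers and percent_cancer where a plain multiplication of the two data lists seems intended. A minimal sketch of calc_corr with those assumed fixes applied follows; the parameter names and sample rows are hypothetical.

```python
import math


def calc_corr(smokers_rows, cancer_rows):
    """Pearson correlation of the second column of two lists of rows.

    A sketch of the asker's calc_corr with assumed fixes; not the
    asker's final code, which was never posted.
    """
    xs = [float(row[1]) for row in smokers_rows]
    ys = [float(row[1]) for row in cancer_rows]
    n = len(xs)

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x_sq = sum(x ** 2 for x in xs)
    sum_y_sq = sum(y ** 2 for y in ys)
    # Sum of products x * y (the posted code used ** here).
    sum_products = sum(x * y for x, y in zip(xs, ys))

    numerator = n * sum_products - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x_sq - sum_x ** 2) * (n * sum_y_sq - sum_y ** 2)
    )
    return numerator / denominator


# Made-up rows in the shape read_csv produces: [label, value].
smokers_rows = [["A", "1"], ["B", "2"], ["C", "3"]]
cancer_rows = [["A", "2"], ["B", "4"], ["C", "6"]]
r = calc_corr(smokers_rows, cancer_rows)
print("r_value =", r)
```

The sample values are perfectly linear, so r comes out as 1.0. If either column is constant, the denominator is zero and the division fails; the sketch does not guard against that.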