這個程序翻出兩個.CSV文件,這些文件在此鏈接的數據: https://drive.google.com/folderview?id=0B1SjPejhqNU-bVkzYlVHM2oxdGs&usp=sharing(Python)當試圖從.CSV中提取數據時列出索引超出範圍?
它應該尋找在每兩個文件中的一個逗號後什麼,但我的邏輯範圍是有點不對。我運行回溯誤差線101:
「線101,在calc_corr:sum_smokers_value = sum_smokers_value +浮動(s_percent_smokers_data [R] [1]) IndexError:列表索引超出範圍」
我假設它會在其他時間執行相同的操作[k] [1]。
非常感謝提前,如果有辦法解決這個問題。
方案至今:
# this program opens two files containing data and runs a corralation calculation
import math
def main():
try:
print('does smoking directly lead to lung cancer?')
print('''let's find out, shall we?''''')
print('to do so, this program will find correlation between the instances of smokers, and the number of people with lung cancer.')
percent_smokers, percent_cancer = retrieve_csv()
s_percent_smokers_data, c_percent_cancer_data = read_csv(percent_smokers, percent_cancer)
correlation = calc_corr(s_percent_smokers_data, c_percent_cancer_data,)
print('r_value =', corretation)
except IOError as e:
print(str(e))
print('this program has been cancelled. run it again.')
def retrieve_csv():
num_times_failed = 0
percent_smokers_opened = False
percent_cancer_opened = False
while((not percent_smokers_opened) or (not percent_cancer_opened)) and (num_times_failed < 5):
try:
if not percent_smokers_opened:
percent_smokers_input = input('what is the name of the file containing the percentage of smokers per state?')
percent_smokers = open(percent_smokers_input, 'r')
percent_smokers_opened = True
if not percent_cancer_opened:
percent_cancer_input = input('what is the name of the file containing the number of cases of lung cancer contracted?')
percent_cancer = open(percent_cancer_input, 'r')
percent_cancer_opened = True
except IOError:
print('a file was not located. try again.')
num_times_failed = num_times_failed + 1
if not percent_smokers_opened or not percent_cancer_opened:
raise IOError('you have failed too many times.')
else:
return(percent_smokers, percent_cancer)
def read_csv(percent_smokers, percent_cancer):
s_percent_smokers_data = []
c_percent_cancer_data = []
empty_list = ''
percent_smokers.readline()
percent_cancer.readline()
eof = False
while not eof:
smoker_list = percent_smokers.readline()
cancer_list = percent_cancer.readline()
if smoker_list == empty_list and cancer_list == empty_list:
eof = True
elif smoker_list == empty_list:
raise IOError('smokers file error')
elif cancer_list == empty_list:
raise IOError('cancer file error')
else:
s_percent_smokers_data.append(smoker_list.strip().split(','))
c_percent_cancer_data.append(cancer_list.strip().split(','))
return (s_percent_smokers_data, c_percent_cancer_data)
def calc_corr(s_percent_smokers_data, c_percent_cancer_data):
sum_smokers_value = sum_cancer_cases_values = 0
sum_smokers_sq = sum_cancer_cases_sq = 0
sum_value_porducts = 0
numbers = len(s_percent_smokers_data)
for k in range(0, numbers):
sum_smokers_value = sum_smokers_value + float(s_percent_smokers_data[k][1])
sum_cancer_cases_values = sum_cancer_cases_values + float(c_percent_cancer_data[k][1])
sum_smokers_sq = sum_smokers_sq + float(s_percent_smokers_data[k][1]) ** 2
sum_cancer_cases_sq = sum_cancer_cases_sq + float(c_percent_cancer_data[k][1]) ** 2
sum_value_products = sum_value_products + float(percent_smokers[k][1]) ** float(percent_cancer[k][1])
numerator_value = (numbers * sum_value_products) - (sum_smokers_value * sum_cancer_cases_values)
denominator_value = math.sqrt(abs((numbers * sum_smokers_sq) - (sum_smokers_value ** 2)) * ((numbers * sum_cancer_cases_sq) - (sum_cancer_cases_values ** 2)))
return numerator_value/denominator_value
main()
我的猜測是你的一個CSV文件的一行沒有逗號。是否有任何理由需要自己解析CSV文件,而不是使用'csv'模塊?通過使用'csv.reader'和'zip'(或者'itertools.zip_longest',如果您確實需要檢測文件的行數不同時),可以減少程序中的許多複雜性。 – Blckknght
@Blckknght,我運行Ubuntu,所以我在LibreOffice calc中創建了一個MS Excel的副本,並將它們保存爲CSV。有沒有更好的方法來做到這一點,比如製作一個word文檔?謝謝! – elm95