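Need help using the csv module in Python
I've been working on my Python skills. This is the raw text file of the data I'm working with: Titanic data
Each row represents one person on board the ship. The file has several columns, including whether the person survived (the third column). I'm trying to count the number of people on board in each demographic group (for example, how many males and how many females) and the number of survivors in each group.
I'm trying to do this in three stages: First, add a column for the prefix attached to each person's name (Mr, Mrs, Miss). Second, define a function, get_avg(), that identifies the column where the information is found and the possible values of that column, and feeds them to the grab_values() function. Third, grab_values() counts the number of people and the number of survivors in each group.
That's all fine and dandy... but it doesn't work. I keep getting 0 for both the count and the total. I've tried scattering print statements wherever I could and made some progress, but I still can't work out what I should be doing. I have a feeling that the functions are not running over all the rows (or over any of them), but I don't know whether that is the real cause or how to deal with it.
Can anyone please help?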
import csv
titanic = open('shorttitanic.txt', "rb")
reader = csv.reader(titanic)
prefix_list = ["Mr ", "Mrs", "Mis"] # used to determine if passenger's name includes a prefix
# There are several demographic details we can count passengers and survivors with, this is a dictionary to map them out along with col number.
details = {"embarked":[5, "Southampton", "Cherbourg", "Queenstown", ""],
           "sex":[10, "male","female"], "pclass":[1,"1st","2nd","3rd"],
           "prefix":[12,"Mr ", "Mrs", "Mis"]} # first item is col number (starts at 0), other items are the possible values
# Adding another column for prefix:
rownum = 0
for row in reader:
    # Finding the header:
    if rownum == 0:
        header = row
        header.append("Prefix")
        # print header
    else:
        prefix_location = row[3].find(",") + 2 # finds the position of the comma, the prefix starts after the comma and after a space (+2)
        prefix = row[3][prefix_location:prefix_location+3] # grabs the 3 first characters of the prefix
        # print len(prefix), prefix
        if prefix in prefix_list: # if there's a prefix in the passenger's name, it's appended to the row
            if prefix == "Mis":
                row.append("Miss") # Mis is corrected to Miss on appending, since we must work with 3 chars
            else:
                row.append(prefix)
        else:
            row.append("Other/Unknown") # for cases where there's no prefix in the passenger's name
        # print len(row), rownum, row[3], prefix, row[11]
    # print row
    rownum += 1
# grab_values() will run on all rows and count the number of passengers in each demographic and the number of survivors
def grab_values(col_num,i):
    print col_num, "item name", i
    count = 0
    tot = 0
    for row in reader:
        # print type(row[col_num][0]
        print row[col_num]
        if row[col_num] == i:
            count += 1
            if row[2] == int(1):
                tot += 1
    # print count, tot
    return count, tot
# get_avg() finds the column number and possible values of demographic x.
def get_avg(x): # x is the category (sex, embarked...)
    col_num = details[x][0]
    for i in details[x][1:]:
        print col_num, i
        # print type(i)
        grab_values(col_num,i)
        count,tot = grab_values(col_num,i)
        print count,tot
        # print i, count, tot
get_avg("sex")
titanic.close()
Edit: I changed the prefix values in the dictionary to: "prefix":[12, "Mrs", "Miss", "Mr"]}, but there is still a lot of work to do.
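Edit 2: Here is the finished code, in case anyone is interested. I took warunsl's advice about the nature of the problem, but his solution did not work, at least not once I made the modifications, so I can't select it as the correct solution in case others find this thread and try to learn from it. Many thanks to everyone who helped!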
import csv
titanic = open('titanic.txt', "rb")
reader = csv.reader(titanic)
prefix_list = ["Mr ", "Mrs", "Mis"] # used to determine if passenger's name includes a prefix. Using 3 chars because of Mr.
# There are several demographic details we can count passengers and survivors with, this is a dictionary to map them out along with col number.
details = {"embarked":[5, "Southampton", "Cherbourg", "Queenstown", ""],
           "sex":[10, "male","female"], "pclass":[1,"1st","2nd","3rd"],
           "prefix":[11,"Mr ", "Mrs", "Miss", "Unknown"]} # first item is col number (starts at 0), other items are the possible values
# try to see how the prefix values can be created by using 11 and a reference to prefix_list
# Here we'll do 2 things:
# I - Add another column for prefix, and -
# II - Create processed_list with each of the rows in reader, since we can only run over reader once,
# and since I don't know much about handling CSVs or generator yet we'll run on the processed_list instead
processed_list = []
rownum = 0
for row in reader:
    # Finding the header:
    if rownum == 0:
        header = row
        header.append("Prefix")
    else:
        prefix_location = row[3].find(",") + 2 # finds the position of the comma, the prefix starts after the comma and after a space (+2)
        prefix = row[3][prefix_location:prefix_location+3] # grabs the 3 first characters of the prefix
        if prefix in prefix_list: # if there's a prefix in the passenger's name, it's appended to the row
            if prefix == "Mis":
                row.append("Miss") # Mis is corrected to Miss on appending, since we must work with 3 chars
            else:
                row.append(prefix)
        else:
            row.append("Unknown") # for cases where there's no prefix in the passenger's name
        processed_list.append(row)
    rownum += 1
# grab_values() will run on all rows and count the number of passengers in each demographic and the number of survivors
def grab_values(col_num,i):
    # print col_num, "item name", i
    num_on_board = 0
    num_survived = 0
    for row in processed_list:
        if row[col_num] == i:
            num_on_board += 1
            if row[2] == "1": # csv fields are strings, so compare with "1" rather than 1
                num_survived += 1
    return num_on_board, num_survived
# get_avg() finds the column number and possible values of demographic x.
def get_avg(x): # x is the category (sex, embarked...)
    col_num = details[x][0]
    for i in details[x][1:]:
        print "Looking for: ", i, "at col num: ", col_num
        num_on_board,num_survived = grab_values(col_num,i)
        try:
            # format the percentage here so the fallback text below is never multiplied
            proportion_survived = "%.2f%%" % (float(num_survived)/num_on_board * 100)
        except ZeroDivisionError:
            proportion_survived = "Cannot be calculated"
        print "Number of %s passengers on board: " %i , num_on_board, "\n" \
              "Number of %s passengers survived: " %i, num_survived, "\n" \
              "Proportion of %s passengers survived: " %i, proportion_survived, "\n"
print "Hello! I can calculate the proportion of passengers that survived according to these parameters: \n \
Embarked \n Sex \n Pclass \n Prefix", "\n"
def get_choice():
    possible_choices = ["embarked","sex","pclass","prefix"]
    choice = raw_input("Please enter your choice: ").lower()
    if choice not in possible_choices:
        print "Sorry, I can only work with Embarked/Sex/Pclass/Prefix. Please try again."
        return get_choice() # return the retried answer instead of the invalid one
    return choice
user_choice = get_choice()
get_avg(user_choice)
titanic.close()
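One difference between the two versions of grab_values() above is the survivor test: the first compares row[2] with int(1), the second with "1". The csv module returns every field as a string, so the integer comparison can never match. The sketch below illustrates this. It is written for Python 3 (unlike the Python 2 code in the post), and the two sample rows are made up for illustration rather than taken from the real Titanic file.

```python
import csv
import io

# Hypothetical two-row sample standing in for the Titanic file;
# as in the post, the third column (index 2) is the survived flag.
sample = io.StringIO(
    '1,1st,1,"Allen, Miss Elisabeth"\n'
    '2,1st,0,"Allison, Mr Hudson"\n'
)

rows = list(csv.reader(sample))
first = rows[0]

print(type(first[2]))      # <class 'str'>
print(first[2] == 1)       # False: a str never equals an int
print(first[2] == "1")     # True
print(int(first[2]) == 1)  # True: converting first also works

# Counting survivors both ways shows the effect on the totals.
survivors_int_compare = sum(1 for row in rows if row[2] == 1)
survivors_str_compare = sum(1 for row in rows if row[2] == "1")
print(survivors_int_compare, survivors_str_compare)  # 0 1
```

Comparing against "1" keeps the data untouched, while int(row[2]) would fail on a header row or an empty field, so the string comparison is probably the safer choice here.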
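You've exhausted the entire 'reader' object before you ever run either of your two functions, so the loop inside 'grab_values' does nothing. You also seem to expect the changes you make to 'row' in your first loop to persist, but you're really just changing a local variable inside the loop and then throwing it away. You probably want to append each row to a new list. – geoffspear
What are you using the prefix for? Are you counting the number of each prefix, or the number of males and females? – stmfunk
@stmfunk I thought a nice demographic would be to look at the proportion of survivors by prefix. Basically it's just a good exercise: adding a variable that is created from an existing variable using some logic. – Optimesh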
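The point geoffspear raises can be reproduced in a few lines. The following is a minimal sketch, assuming Python 3 and a made-up three-line sample in place of the real file: it shows that a csv.reader yields nothing on a second pass, and that rows modified in the first loop are only kept if they are stored somewhere.

```python
import csv
import io

# Made-up sample data (a header plus two passengers), not the real file.
sample_text = (
    "row,pclass,survived,name\n"
    '1,1st,1,"Allen, Miss Elisabeth"\n'
    '2,1st,0,"Allison, Mr Hudson"\n'
)

reader = csv.reader(io.StringIO(sample_text))

# First pass: each row is modified, but nothing keeps a reference to it.
first_pass = 0
for row in reader:
    row.append("extra")
    first_pass += 1

# Second pass over the same reader: it is already exhausted.
second_pass = sum(1 for _ in reader)
print(first_pass, second_pass)  # 3 0

# Storing each modified row in a list lets later code loop as often as needed.
reader = csv.reader(io.StringIO(sample_text))
rows = []
for row in reader:
    row.append("extra")
    rows.append(row)

count_a = sum(1 for _ in rows)
count_b = sum(1 for _ in rows)
print(count_a, count_b)  # 3 3
```

With a real file, calling titanic.seek(0) and building a new reader would also allow a second pass, but the appended prefix column would be lost, which is why collecting the rows in a list fits this problem better.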