2014-12-02 18 views
1

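I have a CSV file containing basketball players' heights, in feet and inches, and their weights. However, some rows have NULL or blank values for the feet and inches fields. How do I count the rows in a CSV file while ignoring the unused rows (using Python)?

I had to calculate each player's BMI from their height and weight to work out how many players are obese.

I found that 11 players are obese. However, I need to find the proportion of obese players in this data set. I am having trouble working out how to find the total number of players (ignoring those that have NULL or None in their row).

Here is an example of the data I have: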

firstname  lastname   position  firstseason  lastseason  h_feet  h_inches  weight
Marquis    Daniels    G         2003         2009        6       6         200
Predrag    Danilovic  G         1995         1996        6       5         200
Adrian     Dantley    F         1976         1990        6       5         208
Mike       Dantoni    G         1975         1975        NULL    NULL      NULL
Henry      Darcey     C         1952         1952        6       7         217
Jimmy      Darden     G         1949         1949        6       1         170
Oliver     Darden     F         1967         1969        6       6.5       235
Yinka      Dare       C         1994         1997        7       0         265
Jesse      Dark       G         1974         1974        6       4.5       210

As you can see, some rows contain NULL values.

Here is my Python code so far:

import csv


def read_csv(filename):
    """
    Reads a Comma Separated Value file,
    returns a list of rows; each row is a dictionary of columns.
    """
    with open(filename, encoding="utf_8_sig") as file:
        reader = csv.DictReader(file)
        rows = list(reader)
    return rows


# Try out the function
players = read_csv("players.csv")

# Print information on the first player, to demonstrate how
# to get to the data
from pprint import pprint


def is_obese(player):
    # Skip players whose height or weight is missing (None, blank or 'NULL').
    # Each field is checked on its own: an expression such as
    # (a and b and c) == 'NULL' only compares one of the three values.
    fields = (player["h_feet"], player["h_inches"], player["weight"])
    if any(value in (None, '', 'NULL') for value in fields):
        return False
    total_h_inches = float(player["h_feet"]) * 12 + float(player["h_inches"])
    bmi = (float(player["weight"]) / (total_h_inches ** 2)) * 703
    return bmi >= 30


count = 0

for player in players:
    if is_obese(player):
        print('player', player["lastname"], 'is obese')
        count = count + 1

print("The total number of obese players:", count)

which returns:

player Boozer is obese
player Brand is obese
player Catlett is obese
player Davis is obese
player Hamilton is obese
player Lang is obese
player Maxiell is obese
player Miller is obese
player Smith is obese
player Traylor is obese
player White is obese
The total number of obese players: 11

Answers

2

Keep a second counter for the total number of players as well, and increment it only when the player has data.

# returns True only if player has all data, otherwise returns False
def has_data(player):
    return (player["h_inches"] != 'NULL' and
            player["h_feet"] != 'NULL' and
            player["weight"] != 'NULL' and
            player["h_inches"] is not None and
            player["h_feet"] is not None and
            player["weight"] is not None)

obese_count = 0
total_count = 0

for player in players:
    if has_data(player):
        if is_obese(player):
            print('player', player["lastname"], 'is obese')
            obese_count += 1
        total_count += 1
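
To get from the two counters to the proportion the question asks for, something like the following might work. It is only a sketch under stated assumptions: the in-memory SAMPLE text stands in for players.csv, the two rows with the last name "Example" are made up for illustration, and treating blank strings as missing (in addition to 'NULL' and None) is an assumption based on the question's mention of blank fields. The helper name obesity_stats is hypothetical.

```python
import csv
import io

# Hypothetical stand-in for players.csv; the "Example" rows are invented.
SAMPLE = """firstname,lastname,h_feet,h_inches,weight
Marquis,Daniels,6,6,200
Mike,Dantoni,NULL,NULL,NULL
Big,Example,6,0,250
Blank,Example,,,
"""

REQUIRED = ("h_feet", "h_inches", "weight")


def has_data(player):
    """Return True only if every field needed for the BMI is present."""
    # (value or "") turns None into "", so None, blank and 'NULL' are all rejected.
    return all((player[key] or "").strip() not in ("", "NULL") for key in REQUIRED)


def is_obese(player):
    """BMI from pounds and inches; assumes has_data(player) is True."""
    total_inches = float(player["h_feet"]) * 12 + float(player["h_inches"])
    bmi = float(player["weight"]) / total_inches ** 2 * 703
    return bmi >= 30


def obesity_stats(players):
    """Return (obese_count, total_count), skipping rows with missing data."""
    obese = total = 0
    for player in players:
        if not has_data(player):
            continue
        total += 1
        if is_obese(player):
            obese += 1
    return obese, total


players = list(csv.DictReader(io.StringIO(SAMPLE)))
obese_count, total_count = obesity_stats(players)

# Guard against division by zero when no row has complete data.
if total_count:
    percentage = 100 * obese_count / total_count
    print(f"{obese_count} of {total_count} players are obese ({percentage:.1f}%)")
```

With the real file, replacing the SAMPLE reader with the question's read_csv("players.csv") should be enough, provided the column names match.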
+0

Why does 'has_data(player)' return data if it is '== 'NULL' or == None'? Python is returning information for players whose rows contain NULL. I am not sure whether I did something wrong. – ComputerHelp 2014-12-02 19:50:37

+0

Should '!=' be used instead? – ComputerHelp 2014-12-02 19:55:47

+0

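I rewrote the conditional to make it clearer. 'has_data()' returns True or False, and returns True only when the player has all of the data. You can therefore use this function as the condition of an 'if' statement (before checking whether the player is obese or adding to any count). Hope that helps. – 101 2014-12-03 00:32:56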

1

Add count_total:

count_total = 0

def is_obese(player):
    global count_total  # required to update the module-level counter from inside the function
    fields = (player["h_inches"], player["h_feet"], player["weight"])
    # check every field individually for 'NULL' or None
    if any(value is None or value == 'NULL' for value in fields):
        return False
    count_total += 1  # count the number of players without NULL values
    total_h_inches = float(player["h_feet"]) * 12 + float(player["h_inches"])
    bmi = (float(player["weight"]) / (total_h_inches ** 2)) * 703
    return bmi >= 30

and at the end:

print("Percentage of obese players: {}%".format((count / float(count_total)) * 100))
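
If updating a counter from inside the function feels awkward, one possible variant is to let the function report missing data itself and do all the counting in the caller. This is a sketch, not the answer's method: the name obesity_status is hypothetical, the inline player list is illustrative (the "Example" row is made up), and returning None for incomplete rows is a design assumption.

```python
def obesity_status(player):
    """Return True or False for players with full data, None when data is missing."""
    values = [player.get(key) for key in ("h_feet", "h_inches", "weight")]
    if any(value is None or value.strip() in ("", "NULL") for value in values):
        return None
    feet, inches, weight = map(float, values)
    bmi = weight / (feet * 12 + inches) ** 2 * 703
    return bmi >= 30


# Illustrative rows shaped like csv.DictReader output (all values are strings).
players = [
    {"lastname": "Daniels", "h_feet": "6", "h_inches": "6", "weight": "200"},
    {"lastname": "Dantoni", "h_feet": "NULL", "h_inches": "NULL", "weight": "NULL"},
    {"lastname": "Example", "h_feet": "6", "h_inches": "0", "weight": "250"},
    {"lastname": "Dare", "h_feet": "7", "h_inches": "0", "weight": "265"},
]

statuses = [obesity_status(player) for player in players]

# Only players with complete data count towards the total.
known = [status for status in statuses if status is not None]
percentage = 100 * sum(known) / len(known) if known else 0.0
print("Percentage of obese players: {:.1f}%".format(percentage))
```

Keeping the function free of side effects means it can be called more than once on the same player without inflating the total.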