2017-06-12 69 views
0

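Data management and graphing with Python

I need to go through a CSV file containing information on certain video games and create a new variable based on each game's user score. Here is my code: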

#Imports 
import pandas 
import numpy as np 
import matplotlib.pyplot as plt 

data = pandas.read_csv("Data Collections/metacritic_games_2016_11.csv",  encoding='latin-1') 
data['year'] = pandas.DatetimeIndex(data['release']).year 
data = data[data["year"] >= 2000] 

rating = [] 
for index, row in data.iterrows():
    if row['user_score'] >= 7.5:
        rating.append("Good")
    elif row['user_score'] >= 6.5:
        rating.append("Average")
    elif row['user_score'] >= 0:
        rating.append("Bad")

data["new_rating"] = pandas.Series(rating) 

year = 2000 
index = 0 
while year != 2016:
    vals = data[data["year"] == year]["new_rating"].value_counts()
    plt.bar(index, vals["Bad"], color='#494953')
    plt.bar(index, vals["Average"], color='#6A7EFC', bottom=vals["Bad"])
    plt.bar(index, vals["Good"], color='#FF5656', bottom=vals["Average"] + vals["Bad"])
    index += 1
    year += 1

plt.show() 

However, I keep getting an error saying:

if row['user_score'] >= 7.5: 
TypeError: '>=' not supported between instances of 'str' and 'float' 

I don't know what to do. Any help is appreciated.

+0

Try type-casting row['user_score'] to float.

+0

If my answer solved your problem, please accept it by clicking the check mark to the left of my answer.

Answers

2

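At least one of the values in the user_score column is being read as a string for some reason. Assuming it isn't a value like "seventeen", you can fix this with: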

data['user_score'] = data['user_score'].astype(float) 
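
If the column does contain text that cannot be parsed as a number, astype(float) will raise a ValueError. A minimal sketch of a more forgiving conversion, assuming a made-up sample in which one score is the placeholder string "tbd" (the real file may contain something different), uses pandas.to_numeric with errors="coerce":

```python
import pandas as pd

# Hypothetical sample: one score is a non-numeric placeholder string.
scores = pd.Series(["8.1", "6.9", "tbd", "4.0"])

# errors="coerce" turns anything unparseable into NaN instead of raising.
numeric_scores = pd.to_numeric(scores, errors="coerce")

# Look at the raw values that failed to parse before deciding how to handle them.
unparseable = scores[numeric_scores.isna()]
print(unparseable)
```

Rows that end up as NaN can then be dropped or kept, depending on whether unrated games should appear in the chart.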

I'd also suggest replacing the code that creates your rating column. Instead of this:

rating = [] 
for index, row in data.iterrows():
    if row['user_score'] >= 7.5:
        rating.append("Good")
    elif row['user_score'] >= 6.5:
        rating.append("Average")
    elif row['user_score'] >= 0:
        rating.append("Bad")

data["new_rating"] = pandas.Series(rating) 

you should do something like this:

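group_boundaries = [0, 6.5, 7.5, np.inf]
group_labels = ['bad', 'average', 'good']

# right=False makes each bin include its lower edge, matching the >= checks above
data['rating'] = pandas.cut(data['user_score'],
         bins=group_boundaries,
         labels=group_labels,
         right=False)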
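
One detail worth checking is how scores that sit exactly on a boundary are binned. By default pandas.cut closes each bin on the right, so a score of exactly 7.5 would land in the middle bin and a score of 0 would fall outside every bin; passing right=False reproduces the >= comparisons from the loop. The sketch below uses made-up scores and a deliberately non-default index, and capitalises the labels only so they match the keys used in the question's plotting code. Because the result is computed from the column itself, it stays aligned with the DataFrame's index, which a freshly built pandas.Series(rating) would not do after rows have been filtered out.

```python
import numpy as np
import pandas as pd

# Hypothetical scores chosen to sit on and around the thresholds.
df = pd.DataFrame(
    {"user_score": [0.0, 6.4, 6.5, 7.4, 7.5, 9.0]},
    index=[10, 11, 12, 13, 14, 15],  # non-default index, as after filtering rows
)

df["rating"] = pd.cut(
    df["user_score"],
    bins=[0, 6.5, 7.5, np.inf],
    labels=["Bad", "Average", "Good"],
    right=False,  # bins are [0, 6.5), [6.5, 7.5), [7.5, inf)
)
print(df)
```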
+0

Thanks, I understand now!