2017-09-22

Jupyter Notebook - Python code

I'm working on a Jupyter notebook to analyze some data that looks like this:

[Image: the data I'm analyzing]

I have to find the following information:

[Image: the questions]

This is what I've tried, but it doesn't work, and I'm completely lost on how to do part b.

# Import relevant packages/modules 
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
from scipy import stats 

# Import relevant csv data file 
data = pd.read_csv("C:/Users/Hanna/Desktop/Sheridan College/Statistics for Data Science/Assignment 1/MATH37198_Assignment1_Individual/IGN_game_ratings.csv") 

# Part a: Determine the z-score of "Super Mario Kart" and print out result 
superMarioKart_zscore = data[data['Game']=='Super Mario Kart'] ['Score'].stats.zscore() 
print("Z-score of Super Mario Kart: ", superMarioKart_zscore) 

# Part b: The top 20 (most common) platforms 

# Part c: The average score of all the Shooter games 
averageShooterScore = data[data['Group']=='Game']['Score'].mean() 
# Print output 
print("The average score of all the Shooter games is: ", averageShooterScore) 

# Part d: The top two platforms with the most perfect scores (10) 

# Part e: The probability of a game randomly selected that is an RPG 
# First find the number of games in the list that are RPGs 
numOfRPGGames = 0 
for game in data['Game']: 
    if data['Genre'] == 'RPG': 
     numOfRPGGames += 1 
# Divide this by the total number of games to find the probability of selecting one 
print("The probability of selecting a game that is an RPG is: ", numOFRPGGames/totalNumGames) 

# Part f: The probability of a game randomly selected with a score less than 5 
# First find the number of games in the list with a score less than 5 using a for loop: 
numScoresLessThan5 = 0 
for game in data['Game']: 
    if data['Score'] < 5: 
     numScoresLessThan5 += 1 
# Divide this by the total number of games to find the probability of selecting one 
print("The probability of selecting a game with a score less than 5 is: ", numScoresLessThan5/totalNumGames) 
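The attempt above fails for a few identifiable reasons: a pandas Series has no `.stats` attribute, so `.stats.zscore()` raises an AttributeError; part c filters a `'Group'` column on the value `'Game'` instead of filtering the genre on `'Shooter'`; inside the loops, `data['Genre'] == 'RPG'` compares the whole column at once, so the `if` raises a ValueError (the truth value of a Series is ambiguous); and `numOFRPGGames` (capital F) and `totalNumGames` are never defined. Below is a minimal sketch of parts a, c, e and f, not a definitive solution. It assumes the CSV has `Game`, `Score` and `Genre` columns with genre values `'Shooter'` and `'RPG'` (names taken from the question's code), and it uses a few made-up rows, not the real IGN scores, so it runs on its own; in the notebook, replace the DataFrame with the `pd.read_csv(...)` call.

```python
import pandas as pd

# Made-up stand-in rows (NOT the real IGN data). In the notebook, use
# data = pd.read_csv(...) instead. Column names are assumed from the question.
data = pd.DataFrame({
    "Game": ["Super Mario Kart", "Game B", "Game C", "Game D", "Game E"],
    "Score": [9.0, 10.0, 8.0, 10.0, 3.0],
    "Genre": ["Racing", "Shooter", "Shooter", "RPG", "Shooter"],
})

# Part a: z-score of every score relative to the whole column, using the
# population standard deviation (ddof=0), then pick Super Mario Kart's row(s).
scores = data["Score"]
z_scores = (scores - scores.mean()) / scores.std(ddof=0)
super_mario_kart_z = z_scores[data["Game"] == "Super Mario Kart"]
print("Z-score of Super Mario Kart:", super_mario_kart_z.tolist())

# Part c: filter on the genre column before averaging the scores.
average_shooter_score = data.loc[data["Genre"] == "Shooter", "Score"].mean()
print("The average score of all the Shooter games is:", average_shooter_score)

# Part e: a boolean mask replaces the row-by-row loop; summing it counts the True values.
total_num_games = len(data)
num_rpg_games = (data["Genre"] == "RPG").sum()
print("P(RPG):", num_rpg_games / total_num_games)

# Part f: the same idea with a numeric comparison.
num_scores_less_than_5 = (data["Score"] < 5).sum()
print("P(score < 5):", num_scores_less_than_5 / total_num_games)
```

Taking `.mean()` of a boolean mask gives the same probability in one step. The population standard deviation (`ddof=0`) matches the default of `scipy.stats.zscore`; switch to `ddof=1` if the assignment expects the sample standard deviation. If a title appears more than once (for example on several platforms), part a returns one z-score per matching row.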

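You may want to break this down into separate questions; you'll probably get better answers that way. If you haven't already, take a look at [MCVE](https://stackoverflow.com/help/mcve) and really focus on one specific problem: what you tried, why it doesn't work, and what you expect the output to be. – johnchase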

Answer


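Pandas has excellent built-in functions for this type of problem. Below is a suggested solution for part b, using some test data imported from a CSV. The test.csv I used has only these fields; for your case, change the column names and import your own file.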

[Image: sample CSV structure]

# Import relevant packages/modules 
import numpy as np 
import pandas as pd 

# Import a dummy csv data file 
data = pd.read_csv("./test.csv") 
# Print the data to inspect it before processing 
print(data) 

# Extract the column you're interested in counting 
initial_column = data['Name'] 

# Count how often each value occurs; value_counts() returns the counts sorted in descending order 
count_object = initial_column.value_counts() 

# Create an empty list for receiving the sorted values 
sorted_grouped_column = [] 

# Choose the number of items to keep. In your exercise it is 20. 
number_of_items = 3 
counter = 0 

for i in count_object.keys(): 
    if counter == number_of_items: 
        break 
    else: 
        sorted_grouped_column.append(i) 
        counter = counter + 1 

print(sorted_grouped_column)
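Because `value_counts()` already returns counts sorted from most to least frequent, the counting loop above can be replaced by `head(n)`, and the same pattern handles part d once the rows are filtered to perfect scores. This is a minimal sketch, assuming the real file has `Platform` and `Score` columns (hypothetical names here) and using made-up rows rather than the actual data:

```python
import pandas as pd

# Made-up rows; the column names 'Platform' and 'Score' are assumptions.
data = pd.DataFrame({
    "Platform": ["PC", "PC", "PlayStation", "Xbox", "PC", "PlayStation", "Wii", "Xbox"],
    "Score": [10.0, 10.0, 10.0, 6.0, 10.0, 10.0, 8.0, 10.0],
})

# Part b: value_counts() is sorted by descending frequency, so head(20)
# keeps the 20 most common platforms (fewer if there are fewer distinct ones).
top_platforms = data["Platform"].value_counts().head(20).index.tolist()
print(top_platforms)

# Part d: keep only perfect scores (10), then count platforms the same way.
perfect = data[data["Score"] == 10]
top_two_perfect = perfect["Platform"].value_counts().head(2)
print(top_two_perfect)
```

The order among platforms with equal counts should not be relied on, so it is worth checking for a tie at the cut-off (20th place for part b, 2nd place for part d).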