2017-04-26 21 views
0

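Logistic regression on NBA shot data

I am using NBA shot data and trying to build shot prediction models with different regression techniques. However, when I try to fit a logistic regression model I get the following warning: Warning message: glm.fit: algorithm did not converge. In addition, the predictions do not seem to work at all (they are unchanged from the original Y variable, make or miss). My code is below. I got the data from here: Shot Data.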

nba_shots <- read.csv("shot_logs.csv") 
library(dplyr) 
library(ggplot2) 
library(data.table) 
library("caTools") 
library(glmnet) 
library(caret) 

nba_shots_clean <- data.frame(
  "game_id" = nba_shots$GAME_ID,
  "location" = nba_shots$LOCATION,
  "shot_number" = nba_shots$SHOT_NUMBER,
  "closest_defender" = nba_shots$CLOSEST_DEFENDER,
  "defender_distance" = nba_shots$CLOSE_DEF_DIST,
  "points" = nba_shots$PTS,
  "player_name" = nba_shots$player_name,
  "dribbles" = nba_shots$DRIBBLES,
  "shot_clock" = nba_shots$SHOT_CLOCK,
  "quarter" = nba_shots$PERIOD,
  "touch_time" = nba_shots$TOUCH_TIME,
  "game_result" = nba_shots$W,
  "FGM" = nba_shots$FGM
)

mean(nba_shots_clean$shot_clock) # NA 
# this gave NA return which means that there are NAs in this column that we 
# need to clean up 
# if the shot clock was NA I assume that this means it was the end of a 
# quarter and the shot clock was off. 
# For now I'm going to just set all of these NAs equal to zero, so all zeros 
# mean it is the end of a quarter 
# checking the amount of NAs 
last_shots <- nba_shots_clean[is.na(nba_shots_clean$shot_clock),] 
nrow(last_shots) # this tells me there is 5567 shots taken when the shot 
# clock was turned off at the end of a quarter 
# setting these NAs equal to zero 
nba_shots_clean[is.na(nba_shots_clean)] <- 0 
# checking to see if it worked 
nrow(nba_shots_clean[is.na(nba_shots_clean$shot_clock),]) # it worked 

# create a test and train set 
split = sample.split(nba_shots_clean, SplitRatio=0.75) 
nbaTrain = subset(nba_shots_clean, split==TRUE) 
nbaTest = subset(nba_shots_clean, split==FALSE) 
# logistic regression 
nbaLogitModel <- glm(FGM ~ location + shot_number + defender_distance + 
points + dribbles + shot_clock + quarter + touch_time, data=nbaTrain, 
family="binomial", na.action = na.omit) 

nbaPredict = predict(nbaLogitModel, newdata=nbaTest, type="response") 
cm = table(nbaTest$FGM, nbaPredict > 0.5) 
print(cm) 

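This gives me the following output, which tells me the predictions did not do anything, because they are the same as the original values.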

    FALSE  TRUE
  0 21428     0
  1     0 17977

I would really appreciate any guidance.

+1

Try reading this: https://stats.stackexchange.com/questions/5354/logistic-regression-model-does-not-converge – staove7

+1

Try to provide a minimal [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data. If we cannot run the code, it is hard to help you. – MrFlick

+0

@MrFlick Is the csv file provided through the link not good enough? – Chris95

Answer

2

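The confusion matrix (model predictions vs. nbaTest$FGM) tells you that your model has 100% accuracy!
This is because the points variable in the dataset is perfectly associated with the dependent variable: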

table(nba_shots_clean$points, nba_shots_clean$FGM) 
        0     1
  0 87278     0
  2     0 58692
  3     0 15133
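To see why a predictor like points breaks the fit, here is a small simulated sketch. The data frame `toy` and its columns are made up for illustration and are not taken from shot_logs.csv. When one predictor determines the outcome exactly, glm() typically warns and pushes the fitted probabilities to 0 or 1; a cross-tabulation before fitting catches the problem.

```r
set.seed(1)
n <- 200
made <- rbinom(n, 1, 0.45)

# hypothetical toy data: points is nonzero only when the shot was made
toy <- data.frame(
  made = made,
  points = ifelse(made == 1, sample(c(2, 3), n, replace = TRUE), 0),
  distance = runif(n, 0, 10)
)

# cross-tabulate the suspect predictor against the outcome
tab <- table(toy$points, toy$made)
print(tab)

# TRUE when every value of points occurs with only one outcome
leaks <- all(rowSums(tab > 0) == 1)
print(leaks)

# warnings are suppressed here only to keep the sketch quiet
fit <- suppressWarnings(
  glm(made ~ points + distance, data = toy, family = "binomial")
)

# distance of each fitted probability from the nearest of 0 and 1
p <- fitted(fit)
print(range(pmin(p, 1 - p)))
```

If `leaks` is TRUE for a candidate predictor, it encodes the outcome and should be left out of the formula.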

Try removing points from the model:

# create a test and train set 
set.seed(1234) 
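# NOTE: sample.split() expects a vector of labels such as nba_shots_clean$FGM;
# given the whole data frame it returns one flag per column (13 here), which
# subset() then recycles down the rows
split = sample.split(nba_shots_clean, SplitRatio=0.75) 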
nbaTrain = subset(nba_shots_clean, split==TRUE) 
nbaTest = subset(nba_shots_clean, split==FALSE) 

# logistic regression 
nbaLogitModel <- glm(FGM ~ location + shot_number + defender_distance + 
dribbles + shot_clock + quarter + touch_time, data=nbaTrain, 
family="binomial", na.action = na.omit) 
summary(nbaLogitModel) 

This produces no warning messages, and the estimated model is:

Call: 
glm(formula = FGM ~ location + shot_number + defender_distance + 
    dribbles + shot_clock + quarter + touch_time, family = "binomial", 
    data = nbaTrain, na.action = na.omit) 

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.8995  -1.1072  -0.9743   1.2284   1.6799  

Coefficients: 
                   Estimate Std. Error z value       Pr(>|z|)    
(Intercept)       -0.427688   0.025446 -16.808        < 2e-16 ***
locationH          0.037920   0.012091   3.136        0.00171 ** 
shot_number        0.007972   0.001722   4.630 0.000003656291 ***
defender_distance -0.006990   0.002242  -3.117        0.00182 ** 
dribbles           0.010582   0.004859   2.178        0.02941 *  
shot_clock         0.032759   0.001083  30.244        < 2e-16 ***
quarter           -0.043100   0.007045  -6.118 0.000000000946 ***
touch_time        -0.038006   0.005700  -6.668 0.000000000026 ***
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1) 

    Null deviance: 153850 on 111532 degrees of freedom 
Residual deviance: 152529 on 111525 degrees of freedom 
AIC: 152545 

Number of Fisher Scoring iterations: 4 
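The estimates above are on the log-odds scale. As a reading aid, the sketch below exponentiates a few of them to get odds ratios. The vector `est` is typed in by hand from the printed summary (an assumption of this sketch); with the fitted object at hand, `exp(coef(nbaLogitModel))` should give the same numbers.

```r
# estimates copied by hand from the summary output above
est <- c(
  locationH  = 0.037920,
  shot_clock = 0.032759,
  quarter    = -0.043100,
  touch_time = -0.038006
)

# odds ratio for a one-unit increase in each predictor
odds_ratio <- exp(est)
print(round(odds_ratio, 3))

# odds ratio for 10 extra seconds on the shot clock, other predictors fixed
print(exp(10 * est["shot_clock"]))
```

For example, each additional second on the shot clock multiplies the odds of a made shot by about 1.03, while each additional second of touch time multiplies them by about 0.96.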

The confusion matrix is:

nbaPredict = predict(nbaLogitModel, newdata=nbaTest, type="response") 
cm = table(nbaTest$FGM, nbaPredict > 0.5) 
print(cm) 

    FALSE  TRUE
  0 21554  5335
  1 16726  5955
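To put this matrix in context, the following sketch computes the accuracy and compares it with a baseline that always predicts the more common class. The counts are copied by hand from the table above; the object names are only illustrative.

```r
# counts copied from the confusion matrix above (rows: actual, columns: predicted)
cm <- matrix(
  c(21554, 16726, 5335, 5955),
  nrow = 2,
  dimnames = list(actual = c("0", "1"), predicted = c("FALSE", "TRUE"))
)

accuracy    <- sum(diag(cm)) / sum(cm)        # correct predictions / all shots
baseline    <- max(rowSums(cm)) / sum(cm)     # always predict the majority class
sensitivity <- cm["1", "TRUE"] / sum(cm["1", ])   # made shots predicted as made
specificity <- cm["0", "FALSE"] / sum(cm["0", ])  # misses predicted as misses

print(round(c(accuracy = accuracy, baseline = baseline,
              sensitivity = sensitivity, specificity = specificity), 3))
```

The accuracy is about 0.555 against a baseline of about 0.542, so without points the model is only slightly better than always predicting a miss, and at the 0.5 cutoff it identifies only about a quarter of the made shots.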
+0

Thank you so much, I can't believe I missed that! Upvoted! – Chris95
