我一直遇到頑固的錯誤,試圖使用glmnet stats package如下所示。glmnet錯誤 - 不符合參數
我試過列出的有限建議here(包括將數據設置爲data.matrix)。我也試圖使用?glmnet中描述的「penalty.box」設置,但沒有任何積極的結果。
df = structure(list(term = c(0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), int_rate = c(10.65, 15.27, 15.96, 13.49, 12.69, 7.9, 15.96, 18.64, 21.28, 12.69, 14.65, 12.69, 13.49, 9.91, 10.65, 16.29, 15.27, 6.03, 11.71, 6.03, 15.27, 12.42, 11.71, 11.71, 11.71, 9.91, 16.77, 11.71, 11.71, 7.51, 7.9, 15.96, 8.9, 15.96, 10.65, 9.91, 7.9, 12.42, 12.69, 7.51, 7.9, 18.25, 16.77, 6.03, 9.91, 8.9, 10.65, 6.03, 6.62, 9.91), emp_length = c(NA, 1, NA, NA, 1, 3, 8, 9, 4, 1, 5, NA, 1, 3, 3, 1, 4, NA, 1, 6, 3, NA, NA, 5, 1, 2, 2, NA, 1, 7, 5, 2, 2, 7, NA, 2, 1, 1, 1, 4, NA, 9, NA, NA, 6, NA, 6, NA, 5, 8), annual_inc = c(24000, 30000, 12252, 49200, 80000, 36000, 47004, 48000, 40000, 15000, 72000, 75000, 30000, 15000, 1e+05, 28000, 42000, 110000, 84000, 77385.19, 43370, 105000, 50000, 50000, 76000, 92000, 50004, 106000, 25000, 17108, 75000, 29120, 24044, 34000, 41000, 55596, 45000, 36852, 27000, 68004, 62300, 65000, 55000, 45600, 0000, 1e+05, 27000, 60000, 70000, 80000), delinq_2yrs = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 3L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), inq_last_6mths = c(1L, 5L, 2L, 1L, 0L, 3L, 1L, 2L, 2L, 0L, 2L, 0L, 1L, 2L, 2L, 1L, 2L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 2L, 0L, 0L, 1L, 3L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 2L), outcome = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("term", "int_rate", "emp_length", "annual_inc", "delinq_2yrs", "inq_last_6mths", "outcome"), row.names = c(NA, 50L), class = "data.frame")
X = select(df, -outcome)
Y = df$outcome
X_train = as.matrix(X[1:50,])
Y_train = as.matrix(Y[1:50])
library(glmnet)
model = glmnet(X_train, Y_train, family = "binomial")
summary(model)
這裏的錯誤:
Error in drop(y %*% rep(1, nc)) :
error in evaluating the argument 'x' in selecting a method for
function 'drop': Error in y %*% rep(1, nc) : non-conformable arguments
實際數據集110個變量和〜1毫米觀察,但部分數據集以上是生產同樣的問題。
對此調試方法的任何建議?
懷疑你不應該做'dep_var = data.matrix(train $ outcome)'。更可能的是使用'dep_var = train $ outcome'。還提供了具有'select'的包的庫調用。可能有一個函數允許在未加引號的列名前加上負號,但看起來非常「不標準」。 –
我早些時候嘗試過這個解決方案,但是這並沒有解決問題。現在我已經添加了可重現答案的任何線索? – AME
還添加了選擇功能的dplyr軟件包調用 – AME