How to automate variable selection and cross-validation in glmnet
I am learning to use the glmnet and brnn packages. Consider the following code:
library(RODBC)
library(brnn)
library(glmnet)
memory.limit(size = 4000)
z <- odbcConnect("mydb") # database with Access queries and tables
# import the data
f5 <- sqlFetch(z,"my_qry")
# head(f5)
# check for 'NA'
sum(is.na(f5))
# choose a 'locn', up to 16 of variable 'locn' are present
f6 <- subset(f5, locn == "mm")
# dim(f6)
# use glmnet to identify possible iv's
training_xnm <- f6[,1:52] # training data
xnm <- as.matrix(training_xnm)
y <- f6[,54] # response
fit.nm <- glmnet(xnm,y, family="binomial", alpha=0.6, nlambda=1000,standardize=TRUE,maxit=100000)
# print(fit.nm)
# cross validation for glmnet to determine a good lambda value
cv.fit.nm <- cv.glmnet(xnm, y)
# have a look at the 'min' and '1se' lambda values
cv.fit.nm$lambda.min
cv.fit.nm$lambda.1se
# returned $lambda.min of 0.002906279, $lambda.1se of 2.587214
# for testing purposes I choose a value between 'min' and '1se'
mid.lambda.nm = (cv.fit.nm$lambda.min + cv.fit.nm$lambda.1se)/2
print(coef(fit.nm, s = mid.lambda.nm)) # 8 iv's retained
# I then manually inspect the data frame and enter the column index for each of the iv's
# these iv's will be the input to my 'brnn' neural nets
cols <- c(1, 3, 6, 8, 11, 20, 25, 38) # column indices of useful iv's
# brnn creation: only one shown but this step will be repeated
# take a 85% sample from data frame
ridxs <- sample(1:nrow(f6), floor(0.85*nrow(f6))) # row id's
f6train <- f6[ridxs,] # the resultant data frame of 85%
f6train <- f6train[, cols] # 'cols' as chosen above
# For the 'brnn' phase response is a binary value, 'fin'
# and predictors are the 8 iv's found earlier
out = brnn(fin ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8, data=f6train, neurons=3,normalize=TRUE, epochs=500, verbose=FALSE)
#summary(out)
# see how well the net predicts the training cases
pred <- predict(out)
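The above script runs fine.
My question is: how can I automate the script so that it runs for different values of locn? Essentially, how can I generalize the step cols <- c(1, 3, 6, 8, 11, 20, 25, 38) # column indices of useful iv's? At present I can do this manually, but I cannot see how to do it in a general way for different values of locn, for example: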
locn.list <- c("am", "bm", "cm", "dm", "em")
for(j in 1:5) {
this.locn <- locn.list[j]
# run the above script
}
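One possible way to replace the hand-typed cols line, as a minimal sketch: coef() on a binomial glmnet fit returns a one-column sparse matrix whose row names are "(Intercept)" followed by the predictor names, so the non-zero rows can be matched back to column positions. The helper name selected_cols and the toy coefficient matrix are made up for illustration, and the predictor names x1 to x5 are assumptions, not the real column names of f6.

```r
# Hypothetical helper: return the positions (within the predictor columns)
# of the variables that have a non-zero coefficient.
selected_cols <- function(coefs, x_names) {
  cf <- as.matrix(coefs)                 # densify the one-column coefficient matrix
  keep <- rownames(cf)[cf[, 1] != 0]     # names of the non-zero coefficients
  keep <- setdiff(keep, "(Intercept)")   # the intercept is not a predictor
  match(keep, x_names)                   # positions among the predictor columns
}

# Toy stand-in with the same shape as coef(fit, s = lambda) (made-up values)
toy <- matrix(c(-0.4, 0.9, 0, 0, 1.3, 0),
              dimnames = list(c("(Intercept)", "x1", "x2", "x3", "x4", "x5"), "s1"))
x_names <- paste0("x", 1:5)

cols <- selected_cols(toy, x_names)
print(cols)

# The brnn formula can be built from the same selection instead of being typed out
fml <- reformulate(x_names[cols], response = "fin")
print(fml)
```

With the objects from the script above, the call would presumably be cols <- selected_cols(coef(fit.nm, s = mid.lambda.nm), colnames(xnm)). Because xnm is built from columns 1:52 of f6, the returned positions are also column positions in f6.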
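It looks like nothing here can be tested without data, but you should know right away that using "(" after a token makes R look for a function of that name. You probably want 'locn.list[j]'. The line 'j <- 1' looks completely superfluous.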
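Thanks for the comment, DWin: my bad, a typo, and yes, I agree that j <- 1 is redundant! As I mentioned, the code runs without problems; my question is how to generalize collecting the useful variables from glmnet after cross-validation. At present I run the code many times a day on live financial data for a single value of 'locn'. I could create a separate script for each of the 17 values of 'locn' and chain them together, but I would like to capture the line beginning cols <- c(1, ... programmatically rather than typing that line by hand for each 'locn'.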
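A rough sketch of how the per-locn loop could collect the selected columns without a separate script for each value. The data frame sim is simulated and stands in for the real query result; select_for_locn and cols.by.locn are hypothetical names. Unlike the script in the question, this sketch passes the same family and alpha to cv.glmnet and reads the coefficients from the cross-validated fit. That is a choice made for this sketch, not part of the original code.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in for the query result (hypothetical data, 10 predictors)
n <- 300
sim <- data.frame(matrix(rnorm(n * 10), n, 10))   # columns X1..X10
sim$locn <- sample(c("am", "bm", "cm"), n, replace = TRUE)
sim$fin  <- rbinom(n, 1, plogis(1.5 * sim$X1 - 1.5 * sim$X4))

# Hypothetical wrapper: subset one locn, cross-validate, and return the
# column indices of the predictors with non-zero coefficients.
select_for_locn <- function(dat, this.locn, x_cols, y_col, alpha = 0.6) {
  d <- dat[dat$locn == this.locn, ]
  x <- as.matrix(d[, x_cols])
  y <- d[[y_col]]
  cv <- cv.glmnet(x, y, family = "binomial", alpha = alpha)
  s  <- (cv$lambda.min + cv$lambda.1se) / 2        # midpoint, as in the question
  cf <- as.matrix(coef(cv, s = s))
  keep <- setdiff(rownames(cf)[cf[, 1] != 0], "(Intercept)")
  x_cols[match(keep, colnames(x))]                 # indices in the original data frame
}

locn.list <- c("am", "bm", "cm")
cols.by.locn <- lapply(setNames(locn.list, locn.list), function(l)
  select_for_locn(sim, l, x_cols = 1:10, y_col = "fin"))
print(cols.by.locn)
```

Each element of cols.by.locn then plays the role of the manual cols vector for that location, for example cols.by.locn[["am"]]. It may be empty if no predictor survives at the chosen lambda.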