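In this kind of application, the parameters whose values are to be optimized (in your case cost, gamma and epsilon) are passed as the arguments of the fitness function, which then runs a model fitting + evaluation function and uses a measure of the model's performance as the fitness. The explicit form of the objective function is therefore not directly relevant.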
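As a minimal sketch of that idea on a toy problem (not the SVM case): the fitness function below fits a smoothing spline to the built-in cars data and returns minus its validation RMSE, so the GA only ever sees a black box. The names toyFitness, train and valid, the choice of smooth.spline and the bounds on spar are all assumptions made for illustration. Recent releases of GA name the bounds lower and upper; older ones use min and max.

```r
library(GA)

# Hypothetical toy setup: split the built-in 'cars' data into train/validation
set.seed(1)
train_idx <- sample(nrow(cars), 35)
train <- cars[train_idx, ]
valid <- cars[-train_idx, ]

# Fitness = minus the validation RMSE of a smoothing spline with smoothing
# parameter x[1]; no closed-form objective is needed, only fit + evaluate
toyFitness <- function(x) {
  fit <- smooth.spline(train$speed, train$dist, spar = x[1])
  pred <- predict(fit, x = valid$speed)$y
  -sqrt(mean((pred - valid$dist)^2))
}

# The GA maximizes toyFitness, i.e. it minimizes the validation RMSE
toy <- ga(type = "real-valued", fitness = toyFitness,
          lower = 0.1, upper = 1.2,
          popSize = 20, maxiter = 10, monitor = FALSE, seed = 1)

toy@solution      # best 'spar' found
toy@fitnessValue  # minus the validation RMSE at that value
```

Swapping the spline for any other model only changes the body of the fitness function; the call to ga() stays the same.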
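In the implementation below, I use 5-fold cross-validation to estimate the RMSE for a given set of parameters. In particular, since the GA package maximizes the fitness function, I have written the fitness value for a given set of parameter values as minus the average RMSE over the cross-validation folds. The maximum attainable fitness is therefore zero.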
Here it is:
library(e1071)
library(GA)
data(Ozone, package="mlbench")
Data <- na.omit(Ozone)
# Set up the data for cross-validation
K <- 5 # 5-fold cross-validation
fold_inds <- sample(1:K, nrow(Data), replace = TRUE)
lst_CV_data <- lapply(1:K, function(i) list(
  train_data = Data[fold_inds != i, , drop = FALSE],
  test_data = Data[fold_inds == i, , drop = FALSE]))
# Given the values of parameters 'cost', 'gamma' and 'epsilon', return the RMSE of the model over the test data
evalParams <- function(train_data, test_data, cost, gamma, epsilon) {
  # Train
  model <- svm(V4 ~ ., data = train_data, cost = cost, gamma = gamma, epsilon = epsilon,
               type = "eps-regression", kernel = "radial")
  # Test: take the square root of the mean squared error so the value is an RMSE
  rmse <- sqrt(mean((predict(model, newdata = test_data) - test_data$V4)^2))
  return(rmse)
}
# Fitness function (to be maximized)
# Parameter vector x is: (cost, gamma, epsilon)
fitnessFunc <- function(x, Lst_CV_Data) {
  # Retrieve the SVM parameters
  cost_val <- x[1]
  gamma_val <- x[2]
  epsilon_val <- x[3]
  # Use cross-validation to estimate the RMSE for each split of the dataset
  rmse_vals <- sapply(Lst_CV_Data, function(in_data) with(in_data,
    evalParams(train_data, test_data, cost_val, gamma_val, epsilon_val)))
  # As fitness measure, return minus the average RMSE (over the cross-validation folds),
  # so that by maximizing fitness we are minimizing the RMSE
  return(-mean(rmse_vals))
}
# Range of the parameter values to be tested
# Parameters are: (cost, gamma, epsilon)
theta_min <- c(cost = 1e-4, gamma = 1e-3, epsilon = 1e-2)
theta_max <- c(cost = 10, gamma = 2, epsilon = 2)
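# Run the genetic algorithm
# Lst_CV_Data is not an argument of ga(); it is forwarded to fitnessFunc through '...'
results <- ga(type = "real-valued", fitness = fitnessFunc, Lst_CV_Data = lst_CV_data,
              names = names(theta_min),
              min = theta_min, max = theta_max,
              popSize = 50, maxiter = 10)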
summary(results)
which produces the following result (for the parameter ranges I specified, which may need fine-tuning depending on the data):
GA results:
Iterations = 100
Fitness function value = -14.66315
Solution =
         cost      gamma    epsilon
[1,] 2.643109 0.07910103 0.09864132
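Once a solution like the one above is available, a natural next step is to refit the SVM on the full dataset with those values. The sketch below does that using the numbers copied from the output above; the names best, final_model and train_rmse are introduced here for illustration only. With a ga object at hand, the same values can be read from results@solution instead of being typed in.

```r
library(e1071)

data(Ozone, package = "mlbench")
Data <- na.omit(Ozone)

# Parameter values copied from the GA output shown above
best <- c(cost = 2.643109, gamma = 0.07910103, epsilon = 0.09864132)

# Refit on all available rows with the selected parameters
final_model <- svm(V4 ~ ., data = Data,
                   cost = best[["cost"]],
                   gamma = best[["gamma"]],
                   epsilon = best[["epsilon"]],
                   type = "eps-regression", kernel = "radial")

# In-sample RMSE: optimistic, because it is computed on the training data itself
train_rmse <- sqrt(mean((predict(final_model, newdata = Data) - Data$V4)^2))
train_rmse
```

The in-sample figure is only a sanity check; the cross-validated value reported by the GA remains the better estimate of out-of-sample error.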
I can't tell you how much this means to me... thank you so much~! – jihoon
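Thanks a lot! The code works for the Ozone data. However, if I remove some rows from the Ozone data, or change the numbers in a particular column, it stops working and gives the error "Error in predict.svm(ret, xhold, decision.values = TRUE): Model is empty!". How can I fix this? –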
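One possible cause, offered as an assumption and not verified against the modified data: an eps-regression fit can end up with no support vectors, for example when the GA tries an epsilon that is large relative to the scaled response, and e1071 then stops with "Model is empty!". A hedged workaround is to wrap the fitness function so that any failed evaluation receives a large finite penalty and the GA simply moves away from that region. The helper makeSafeFitness and the penalty value are hypothetical names and choices, and demoFitness is only a stand-in that mimics the failure.

```r
# Hypothetical helper: turn errors and non-finite values into a finite penalty,
# so that one bad parameter set does not abort the whole GA run
makeSafeFitness <- function(fitness, penalty = -1e6) {
  function(x, ...) {
    value <- tryCatch(fitness(x, ...), error = function(e) penalty)
    if (is.finite(value)) value else penalty
  }
}

# Stand-in fitness that fails for large epsilon, mimicking the reported error
demoFitness <- function(x) {
  if (x[3] > 1) stop("Model is empty!")
  -sum(x^2)
}

safeDemo <- makeSafeFitness(demoFitness)
safeDemo(c(1, 0.1, 0.5))  # -1.26: evaluated normally
safeDemo(c(1, 0.1, 1.5))  # -1e+06: the error was replaced by the penalty
```

With the answer's code, this would mean passing fitness = makeSafeFitness(fitnessFunc) to ga(). Lowering the upper bound for epsilon in theta_max is another option worth trying.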