2013-08-21

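I am trying to run a random forest regression in R. I have hit several problems and fixed most of them, but I cannot get past this last one. I have a list of files I want to read in, which is no problem (I use a for loop). Inside the for loop I want to add the variable name from targets to the formula: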

library(randomForest)
set.seed(51)

file <- c("file1", "file2", "file3")
targets <- c("X1.ts", "ts2", "ts3")

for (i in seq_along(file)) {
  # Backslashes must be doubled inside R strings
  d_names <- paste0("C:\\location\\folder\\", file[i], ".txt")
  dataset <- read.table(d_names, header = TRUE, row.names = 1)
  # Randomly assign each row to group 1 (~70%) or group 2 (~30%)
  ind <- sample(2, nrow(dataset), replace = TRUE, prob = c(0.7, 0.3))

  #TRAINING DATASET1 PREDICTING DATASET2
  # Problem: targets[i] stays literal in the formula; it is not replaced by the column name
  train_one.rf <- randomForest(dataset[ind==1,][[1]] ~ . - targets[i], data = dataset[ind==1,])
  dset2.pred <- predict(train_one.rf, newdata = dataset[ind==2,])

  #TRAINING DATASET2 PREDICTING DATASET1
  train_two.rf <- randomForest(dataset[ind==2,][[1]] ~ . - targets[i], data = dataset[ind==2,])
  dset1.pred <- predict(train_two.rf, newdata = dataset[ind==1,])
}
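Backslashes inside R strings start escape sequences, so a Windows path needs either doubled backslashes or forward slashes. A minimal sketch of the file-reading part of the loop, assuming a throwaway folder from tempdir() in place of the real C:\location\folder and hypothetical file contents (the first rows of mtcars): file.path() joins the pieces with "/", and the same read.table() call reads each file back.

```r
# Hypothetical stand-in for the asker's folder
folder <- tempdir()
file <- c("file1", "file2", "file3")

# Write small files (with row names) so the loop has something to read
for (f in file) {
  write.table(head(mtcars), file.path(folder, paste0(f, ".txt")))
}

datasets <- list()
for (i in seq_along(file)) {
  # file.path() joins with "/", which R on Windows also accepts
  d_names <- file.path(folder, paste0(file[i], ".txt"))
  datasets[[file[i]]] <- read.table(d_names, header = TRUE, row.names = 1)
}
```

Collecting the data frames in a named list keeps each file's data addressable by its name after the loop finishes.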

Random forest requires me to model the data while excluding the column I want to predict. To do that, I have to use:

dataset[ind==1,][[1]] ~ . - targets[i]

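It is targets[i], the name of the column (taken from targets), that I want to insert on each run of the random forest. I have tried assigning it to a variable and also substituting the loop variable, but to no avail. I suspect formulas in R need more know-how than I have.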
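A minimal sketch of why the loop variable is not picked up, assuming a toy targets vector of mtcars column names (hypothetical, not the asker's files): a formula is not evaluated when it is created, so targets[i] stays inside it as an unevaluated expression. Pasting the column name into a string first and converting it with as.formula() puts the actual name into the formula.

```r
# Hypothetical target names; mtcars columns stand in for the asker's files
targets <- c("mpg", "cyl")
i <- 1

# The formula keeps `targets[i]` as an unevaluated call, not "mpg"
f_literal <- y ~ . - targets[i]
print(all.vars(f_literal))

# Build the formula text first, then convert it to a formula
f_built <- as.formula(paste(targets[i], "~ ."))
print(f_built)
```

When data = is supplied, the "." in a formula like mpg ~ . expands to every column except the response, so the target is left out of the predictors without an explicit "- targets[i]".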

Thanks in advance,

Jcrow

What is targets? Do all the data sets have the same columns? – Metrics

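targets is a column defined in each file that is read in. R seems to read the column names, but if I want to automate the process, it needs to read each file and associate the column name with that file. – Jcrow06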

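As far as I understand, you are reading each file and using the first element of targets in each file, then the second element of targets, then the third, right? My code below handles each file with the first element of targets. If you want every element for each file, you can easily modify the code below. But before I do that, please let me know: is this what you are looking for? – Metrics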

Answer

Here is a solution using the mtcars data split into two data sets, data1 and data2. (It uses Map here instead of a for loop.)

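library(randomForest)

# Split mtcars into two data sets
data1 <- mtcars[1:15, ]
data2 <- mtcars[16:nrow(mtcars), ]
mydata <- list(data1, data2)

# One formula string per target; "." means all remaining columns
targets <- list("mpg~.", "cyl~.")

# Outer Map: each data set; inner Map: each target formula
Map(function(x) Map(function(y) randomForest(as.formula(y), data = x, importance = TRUE, proximity = TRUE), targets), mydata)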

[[1]] 
[[1]][[1]] 

Call: 
randomForest(formula = as.formula(y), data = x, importance = TRUE,  proximity = TRUE) 
       Type of random forest: regression 
        Number of trees: 500 
No. of variables tried at each split: 3 

      Mean of squared residuals: 4.637522 
        % Var explained: 63.98 

[[1]][[2]] 

Call: 
randomForest(formula = as.formula(y), data = x, importance = TRUE,  proximity = TRUE) 
       Type of random forest: regression 
        Number of trees: 500 
No. of variables tried at each split: 3 

      Mean of squared residuals: 0.2455641 
        % Var explained: 89.04 


[[2]] 
[[2]][[1]] 

Call: 
randomForest(formula = as.formula(y), data = x, importance = TRUE,  proximity = TRUE) 
       Type of random forest: regression 
        Number of trees: 500 
No. of variables tried at each split: 3 

      Mean of squared residuals: 10.90303 
        % Var explained: 78.93 

[[2]][[2]] 

Call: 
randomForest(formula = as.formula(y), data = x, importance = TRUE,  proximity = TRUE) 
       Type of random forest: regression 
        Number of trees: 500 
No. of variables tried at each split: 3 

      Mean of squared residuals: 0.1623937 
        % Var explained: 95.69 


Warning messages: 
1: In randomForest.default(m, y, ...) : 
    The response has five or fewer unique values. Are you sure you want to do regression? 
2: In randomForest.default(m, y, ...) : 
    The response has five or fewer unique values. Are you sure you want to do regression? 

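The inner Map call repeats the regression over the different elements of targets, while the outer Map call repeats it over the different elements of mydata.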
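The comments suggest the asker wants each file paired with its own target column rather than every target tried on every data set. A minimal sketch of that variant, assuming in-memory data frames in place of the files (iris and na.omit(airquality) here), hypothetical target choices, and a hypothetical helper two_fold(): Map() with two arguments walks mydata and targets in parallel, and each call repeats the question's train-on-one-group, predict-the-other step with a formula built from the target name.

```r
library(randomForest)
set.seed(51)

# Hypothetical stand-ins for the asker's files: one data frame per "file"
mydata <- list(iris, na.omit(airquality))
# Hypothetical: one target column per data frame, in the same order
targets <- c("Sepal.Length", "Ozone")

two_fold <- function(dataset, target) {
  # "target ~ ." uses every other column as a predictor
  f <- as.formula(paste(target, "~ ."))
  # Randomly assign rows to group 1 (~70%) or group 2 (~30%)
  ind <- sample(2, nrow(dataset), replace = TRUE, prob = c(0.7, 0.3))
  # Train on group 1 and predict group 2, then the reverse
  rf1 <- randomForest(f, data = dataset[ind == 1, ])
  rf2 <- randomForest(f, data = dataset[ind == 2, ])
  list(pred2 = predict(rf1, newdata = dataset[ind == 2, ]),
       pred1 = predict(rf2, newdata = dataset[ind == 1, ]))
}

# File 1 is paired with target 1, file 2 with target 2
results <- Map(two_fold, mydata, targets)
```

Pairing by position keeps the file-to-target association explicit; the nested Map in the answer is the right shape when every target should be tried on every data set.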