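R error in rfe.defau: "there should be the same number of samples in x and y"
I cannot get caret to work. Starting from known examples, the one at http://machinelearningmastery.com/feature-selection-with-the-caret-r-package/ works flawlessly.
As soon as I substitute my own data set, however, it fails: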
> results <- rfe(x, y, sizes=c(1:5), rfeControl=control)
Error in rfe.default(x, y, sizes = c(1:5), rfeControl = control) :
there should be the same number of samples in x and y
As far as I can tell, x and y have the same number of rows:
> nrow(x)
[1] 691231
> nrow(y)
[1] 691231
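One thing the row counts do not show is what length() reports for the same objects. The sketch below uses a small made-up data frame (the name `toy` and its values are hypothetical, not taken from faultclass.feather) and assumes that the check behind the error message compares nrow(x) with length(y); I have not confirmed that against the caret source.

```r
# Toy stand-in for the real data (hypothetical values)
toy <- data.frame(fault = c(0, 1, 0, 1), B = c(6, 6, 11, 11))

y_df  <- toy["fault"] * 1     # single-bracket indexing: still a data.frame
y_vec <- toy[["fault"]] * 1   # double-bracket indexing: a plain numeric vector

nrow(y_df)      # counts rows
length(y_df)    # counts columns, because a data.frame is a list of columns
length(y_vec)   # counts elements
```

If that assumption holds, matching nrow() values would not be enough; length(y) would have to equal nrow(x) as well.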
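Details are below.
I have looked at similar questions, such as "R rfe function "caret" Package error: there should be the same number of samples in x and y" and "R trying to get caret/rfe to work". The latter is relevant but did not seem to help. I tried converting my y to a vector, like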
> y <- as.vector(y)
or
> y <- as.vector(as.list(y))
but the error persists. Surely I am doing something silly; I just cannot see where I am going wrong. Any help is appreciated.
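To see what those two conversions actually produce, here is a minimal check on made-up data (`toy_y` is a hypothetical one-column data frame, not my real y). It only inspects length(), since that may be what rfe looks at; treat that as an assumption rather than a confirmed fact about caret.

```r
# Hypothetical one-column data frame standing in for y
toy_y <- data.frame(fault = c(0, 1, 0, 1))

a <- as.vector(toy_y)             # first attempt from above
b <- as.vector(as.list(toy_y))    # second attempt from above
flat <- unlist(toy_y, use.names = FALSE)  # flattens the column into an atomic vector

length(a)      # still one element per column, not per row
length(b)      # a list holding one column
length(flat)   # one element per row
```

In this toy case neither as.vector() call changes the length, while unlist() (or toy_y[["fault"]]) yields one element per row.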
:-)
YARC
----------------------Details--------------
------Script--------
library(feather)
library(mlbench)
library(caret)
path <- "faultclass.feather"
df <- read_feather(path)
set.seed(7)
control <- rfeControl(functions=rfFuncs, method="cv", number=10)
x <- subset(df,select=-c(fault))
y <- df["fault"]*1
results <- rfe(x, y, sizes=c(1:5), rfeControl=control)
print(results)
predictors(results)
plot(results, type=c("g", "o"))
------Characteristics------
> str(x)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 691231 obs. of 31 variables:
$ A : chr "2011-12-06 00:00:00" "2011-03-11 00:00:00" "2014-11-17 00:00:00" "2013-01-07 15:19:02" ...
$ B : num 6 6 11 11 6 6 6 6 6 6 ...
$ C : num NA NA NA NA NA NA NA NA NA NA ...
$ D : chr "2016-01-01 00:00:00" "2016-01-01 00:00:00" "2016-01-01 00:00:00" "2016-01-01 00:00:00" ...
$ E : chr NA NA NA NA ...
$ F : num 0 230 230 230 230 230 230 230 230 0 ...
$ G : num 13 35 38 128 12 6 10 4 2 6 ...
$ H : chr NA NA NA NA ...
$ J : chr "35" "35" "28" "34" ...
$ K : num 0 63 32 63 40 40 35 40 35 25 ...
$ L : num 3 3 3 3 3 3 3 2 2 2 ...
$ M : num 301 301 301 301 301 301 301 301 301 301 ...
$ N : chr "613.0" "9630.0" "9114.0" "600.0" ...
$ O : chr "000356039" "000664676" "000770082" "000617804" ...
$ P : chr "11610000" "0000003001" "1161000" "43850" ...
$ Q : num 10089 10089 10972 27629 27630 ...
$ R : num 7.07e+17 7.07e+17 7.07e+17 7.07e+17 7.07e+17 ...
$ S : num 1 1 1 1 1 1 1 1 1 1 ...
$ T : chr "XX" "XX" "809" "96" ...
$ U : chr "cac" "edr" "ssr" "nsk" ...
$ V : chr "1954-05-17 00:00:00" "1973-05-17 00:00:00" "1997-06-24 00:00:00" "1976-12-24 00:00:00" ...
$ W : num 287 287 287 665 664 664 664 664 664 664 ...
$ X : num 1 1 1 1 1 1 1 1 1 1 ...
$ Y : num NA NA NA NA NA NA NA NA NA NA ...
$ Z : num 24828 39591 8932 35162 28540 ...
$ AA : chr "0001" "0001" "0001" "0002" ...
$ AB : chr "0001-TRA" "0001-TRB" "0001-TRC" "0002-TRD" ...
$ AC : chr "0,230" "0,230" "0,230" "0,230" ...
$ AD : chr "K03" "K03" "K03" "K05" ...
$ AE : num 3 3 3 3 3 3 3 3 3 3 ...
$ AF : chr "IT" "IT" "IT" "IT" ...
> str(y)
'data.frame': 691231 obs. of 1 variable:
$ fault: num 0 0 0 0 0 0 0 0 0 0 ...