How to eliminate "NA/NaN/Inf in foreign function call (arg 7)" when running predict with randomForest

I have researched this extensively without finding a solution. I have cleaned my data set as follows:

library("raster") 
impute.mean <- function(x) replace(x, is.na(x) | is.nan(x) | is.infinite(x) , 
mean(x, na.rm = TRUE)) 
losses <- apply(losses, 2, impute.mean) 
colSums(is.na(losses)) 
isinf <- function(x) (NA <- is.infinite(x)) 
infout <- apply(losses, 2, is.infinite) 
colSums(infout) 
isnan <- function(x) (NA <- is.nan(x)) 
nanout <- apply(losses, 2, is.nan) 
colSums(nanout)

The problem arises when running the predict step:

options(warn=2) 
p <- predict(default.rf, losses, type="prob", inf.rm = TRUE, na.rm=TRUE, nan.rm=TRUE)

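Everything I have read says the cause should be NAs, Infs, or NaNs in the data, but I can't find any. I am making the data and the randomForest summary available for sleuthing at [deleted]. The traceback doesn't reveal much (to me, anyway):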

4: .C("classForest", mdim = as.integer(mdim), ntest = as.integer(ntest), 
     nclass = as.integer(object$forest$nclass), maxcat = as.integer(maxcat), 
     nrnodes = as.integer(nrnodes), jbt = as.integer(ntree), xts = as.double(x), 
     xbestsplit = as.double(object$forest$xbestsplit), pid = object$forest$pid, 
     cutoff = as.double(cutoff), countts = as.double(countts), 
     treemap = as.integer(aperm(object$forest$treemap, c(2, 1, 
      3))), nodestatus = as.integer(object$forest$nodestatus), 
     cat = as.integer(object$forest$ncat), nodepred = as.integer(object$forest$nodepred), 
     treepred = as.integer(treepred), jet = as.integer(numeric(ntest)), 
     bestvar = as.integer(object$forest$bestvar), nodexts = as.integer(nodexts), 
     ndbigtree = as.integer(object$forest$ndbigtree), predict.all = as.integer(predict.all), 
     prox = as.integer(proximity), proxmatrix = as.double(proxmatrix), 
     nodes = as.integer(nodes), DUP = FALSE, PACKAGE = "randomForest") 
3: predict.randomForest(default.rf, losses, type = "prob", inf.rm = TRUE, 
     na.rm = TRUE, nan.rm = TRUE) 
2: predict(default.rf, losses, type = "prob", inf.rm = TRUE, na.rm = TRUE, 
     nan.rm = TRUE) 
1: predict(default.rf, losses, type = "prob", inf.rm = TRUE, na.rm = TRUE, 
     nan.rm = TRUE)

Source

2014-02-23 Elliott

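It's hard to say without more information about the forest itself (your file contains only the data). But I do wonder whether 'inf.rm', 'na.rm', or 'nan.rm' are arguments of 'predict.randomForest'. They certainly aren't in the documentation. – joran

The zip file contained the RF summary. It is no longer available. NA, Inf, and NaN are forms of missing or non-computable data that can prevent RF from running. Nate's answer works. – Elliott

I am fully aware of what NA, Inf, and NaN are. I was pointing out that those arguments simply do not exist for the predict function. They are ignored entirely. – joran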
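To illustrate the point made in the comments, here is a minimal sketch of how one might check whether a function actually declares the arguments passed to it. The helper `undeclared_args` and the stand-in `toy_predict` are hypothetical names made up for this example; the last part only runs if the randomForest package happens to be installed.

```r
# Return the supplied argument names that a function does not declare.
# If the function has `...`, such arguments are accepted without complaint.
undeclared_args <- function(f, supplied) {
  setdiff(supplied, names(formals(f)))
}

# Toy stand-in (hypothetical) shaped like a predict method with `...`.
toy_predict <- function(object, newdata, type = "response", ...) NULL

# "type" is declared, the three *.rm names are not.
print(undeclared_args(toy_predict, c("type", "na.rm", "inf.rm", "nan.rm")))

# The same check against the real method, only if the package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf_predict <- getS3method("predict", "randomForest")
  print(undeclared_args(rf_predict, c("type", "na.rm", "inf.rm", "nan.rm")))
}
```

Any name this returns is not a formal argument of the function, so passing it cannot be relied on to clean the data.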

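Your code is not entirely reproducible (it never actually runs the randomForest algorithm), but you are not replacing the Inf values with the column means. That is because the na.rm = TRUE argument in the call to mean() inside your impute.mean function does exactly what it says: it removes NA values (and not Inf ones).

You can see this, for example, with: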

impute.mean <- function(x) replace(x, is.na(x) | is.nan(x) | is.infinite(x), mean(x, na.rm = TRUE)) 
losses <- apply(losses, 2, impute.mean) 
sum(apply(losses, 2, function(.) sum(is.infinite(.)))) 
# [1] 696

To get rid of the infinite values, use:


impute.mean <- function(x) replace(x, is.na(x) | is.nan(x) | is.infinite(x), mean(x[!is.na(x) & !is.nan(x) & !is.infinite(x)])) 
losses <- apply(losses, 2, impute.mean) 
sum(apply(losses, 2, function(.) sum(is.infinite(.)))) 
# [1] 0
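The same idea can be written more compactly with is.finite(), which is FALSE for NA, NaN, Inf and -Inf alike. What follows is only a sketch on a made-up toy vector; the name `impute_finite_mean` is hypothetical and not part of the code above.

```r
# Toy column with several kinds of non-finite values (made-up data).
x <- c(1, 2, NA, NaN, Inf, 3)

# na.rm = TRUE drops NA and NaN but keeps Inf, so this mean is Inf.
naive_mean <- mean(x, na.rm = TRUE)

# Replace every non-finite entry with the mean of the finite entries only.
impute_finite_mean <- function(x) {
  ok <- is.finite(x)
  replace(x, !ok, mean(x[ok]))
}

x_clean <- impute_finite_mean(x)
print(naive_mean)
print(x_clean)
```

For a numeric matrix such as `losses` this would be applied per column, as in `apply(losses, 2, impute_finite_mean)`. A column with no finite values at all would still come out as NaN, so that case needs separate handling.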

Source

2014-02-23 04:50:54

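One cause of the error message:

NA/NaN/Inf in foreign function call (arg X)

when training a randomForest is having character-class variables in your data.frame. If it comes with the warning:

NAs introduced by coercion

check to make sure that all of your character variables have been converted to factors.

Example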

set.seed(1) 
dat <- data.frame(
    a = runif(100), 
    b = rpois(100, 10), 
    c = rep(c("a","b"), 100), 
    stringsAsFactors = FALSE 
) 

library(randomForest) 
randomForest(a ~ ., data = dat)

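Yields:

Error in randomForest.default(m, y, ...) : 
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In data.matrix(x) : NAs introduced by coercion

But remove the stringsAsFactors = FALSE argument and it runs. Note that this relies on the old data.frame() default: since R 4.0.0 the default is stringsAsFactors = FALSE, so there the character column has to be converted explicitly, for example with factor().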
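As a rough sketch of that check, the following converts every character column of a data.frame to a factor in base R, independent of the stringsAsFactors default. The toy data here is made up for illustration.

```r
# Small made-up data.frame with one character column.
dat <- data.frame(
  a = c(0.1, 0.5, 0.9, 0.3),
  c = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# Find the character columns and convert each one to a factor.
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], factor)

# Inspect the resulting column classes.
print(vapply(dat, function(col) class(col)[1], character(1)))
```

After this step, `a` stays numeric and `c` is a factor with levels "a" and "b".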

Source

2016-01-08 02:30:33
