R randomForest classification

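I want to use randomForest for classification, but I keep getting an error message that doesn't seem to have an obvious solution (randomForest has worked well for me for regression in the past). My code is pasted below. 'success' is a factor, and all the predictor (independent) variables are numeric. Any suggestions on how to run this classification properly?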

> rf_model<-randomForest(success~.,data=data.train,xtest=data.test[,2:9],ytest=data.test[,1],importance=TRUE,proximity=TRUE) 

Error in randomForest.default(m, y, ...) : 
    NA/NaN/Inf in foreign function call (arg 1)

Also, here is a sample of the dataset:

head(data)

success duration  goal reward_count updates_count comments_count backers_count min_reward_level max_reward_level
   True 20.00000  1500           10            14              2            68                1             1000
   True 30.00000  3000           10             4              3            48                5             1000
   True 24.40323 14000           23             6             10           540                5             1250
   True 31.95833 30000            9            17              7           173                1            10000
   True 28.13211  4000           10            23             97          2936               10              550
   True 30.00000  6000           16            16            130          2043               25              500
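A reproducible version of the six sample rows above can be built directly with data.frame(). This is a sketch: the values are copied from the printed head(data) output, and success is assumed to be a factor whose only printed label is True. Since every sample row has success = True, these rows alone could not train a classifier; they are only useful for checking column classes and whether any predictor holds NA/NaN/Inf.

```r
# Rebuild the six rows printed by head(data) above as a data frame
data <- data.frame(
  success          = factor(rep("True", 6)),
  duration         = c(20.00000, 30.00000, 24.40323, 31.95833, 28.13211, 30.00000),
  goal             = c(1500, 3000, 14000, 30000, 4000, 6000),
  reward_count     = c(10, 10, 23, 9, 10, 16),
  updates_count    = c(14, 4, 6, 17, 23, 16),
  comments_count   = c(2, 3, 10, 7, 97, 130),
  backers_count    = c(68, 48, 540, 173, 2936, 2043),
  min_reward_level = c(1, 5, 5, 1, 10, 25),
  max_reward_level = c(1000, 1000, 1250, 10000, 550, 500)
)

str(data)  # success should be a factor, all other columns numeric

# Count non-finite values (NA, NaN, Inf, -Inf) in each numeric column
num_cols <- sapply(data, is.numeric)
bad_per_col <- colSums(!sapply(data[num_cols], is.finite))
bad_per_col
```

Running the same two checks on the full data.train and data.test is a quick way to narrow down the "NA/NaN/Inf in foreign function call" error.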


2013-01-03 user1799242

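Without a fully reproducible example, no. At a minimum, I would (1) check that there are no NA values in the data, and (2) run 'traceback()' to see whether you can get more detailed information about where the error occurs. – joran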
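The two checks in the comment above can be sketched in base R as follows. The toy data frame df is hypothetical. As far as the randomForest documentation goes, its formula interface also accepts an na.action argument (for example na.omit), which is an alternative to dropping incomplete rows by hand.

```r
# Hypothetical toy data with one missing value
df <- data.frame(a = c(1, 2, NA), b = c("x", "y", "z"))

# (1) Count missing values in each column
na_per_col <- colSums(is.na(df))
print(na_per_col)   # a: 1, b: 0

# Keep only the rows that have no missing values
df_complete <- df[complete.cases(df), ]

# (2) Right after an error at the console, traceback() prints the call
# stack of the most recent uncaught error, e.g.:
# randomForest(a ~ ., data = df)
# traceback()
```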

Try changing the 'success' values to species names instead of 'True'. Can you show us the output of str(data)? –

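It looks like you've already accepted an answer; I ran into this problem too and found that, for classification, it was because my response variable was of class 'chr'. Either doing 'data$var <- as.factor(data$var)' or predicting with 'randomForest(as.factor(var) ~ ., data = data, ...)' solved it for me. – Hendy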
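The randomForest documentation states that classification is assumed only when the response is a factor, which is why a 'chr' response causes trouble. A minimal sketch of the fix from the comment above, using a hypothetical data frame with a response column named var:

```r
# Hypothetical data: the response was read in as character
data <- data.frame(
  var = c("True", "False", "True", "False"),
  x1  = c(1.5, 2.0, 3.2, 0.7),
  stringsAsFactors = FALSE
)
class(data$var)   # "character"

# Option 1: convert the response column in place
data$var <- as.factor(data$var)
levels(data$var)  # "False" "True"

# Option 2 (equivalent): convert inside the model formula instead
# randomForest(as.factor(var) ~ ., data = data)
```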

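Have you tried regression on the same data? If not, check your data for 'Inf' values and try removing them, after removing NAs and NaNs if there are any. You can find useful information on removing Inf values here: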

R is there a way to find Inf/-Inf values?

For example,

Class V1     V2 V3  V4 V5 V6 V7 V8           V9
    1 11    Inf  4 232 23  2  2 34  0.205567767
    1 11    123  4 232 23  1  2 34  0.162357601
    1 13    123  4 232 23  1  2 34 -0.002739357
    1 13    123  4 232 23  1  2 34  0.186989878
    2 67     14  4 232 67  1  2 34  0.109398677
    2 67     14  4 232 67  2  2 34   0.18491187
    2 67     14  4 232 34  2  2 34  0.098728256
    2 44 769.03  4  21 34  2  2 34  0.204405869
    2 44     34  4  11 34  1  2 34  0.218426408

# Classification: the following error occurs
rf_model<-randomForest(as.factor(Class)~.,data=data,importance=TRUE,proximity=TRUE) 
Error in randomForest.default(m, y, ...) : 
NA/NaN/Inf in foreign function call (arg 1) 

# Regression: the same error occurs
rf_model<-randomForest(Class~.,data=data,importance=TRUE,proximity=TRUE) 
Error in randomForest.default(m, y, ...) : 
NA/NaN/Inf in foreign function call (arg 1)

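So please double-check your data. R also reports:

In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?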
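A sketch of the clean-up for the example table above: rebuild it, find rows that contain a non-finite value, drop them, and turn the numeric Class into a factor so randomForest() treats the problem as classification rather than regression. The values are copied from the table; the names ok and clean are only illustrative.

```r
# The example data above; note the Inf in V2 of the first row
data <- data.frame(
  Class = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  V1 = c(11, 11, 13, 13, 67, 67, 67, 44, 44),
  V2 = c(Inf, 123, 123, 123, 14, 14, 14, 769.03, 34),
  V3 = rep(4, 9),
  V4 = c(232, 232, 232, 232, 232, 232, 232, 21, 11),
  V5 = c(23, 23, 23, 23, 67, 67, 34, 34, 34),
  V6 = c(2, 1, 1, 1, 1, 2, 2, 2, 1),
  V7 = rep(2, 9),
  V8 = rep(34, 9),
  V9 = c(0.205567767, 0.162357601, -0.002739357, 0.186989878, 0.109398677,
         0.18491187, 0.098728256, 0.204405869, 0.218426408)
)

# Only 2 distinct response values: as numeric this triggers the
# "five or fewer unique values" warning in regression mode
n_unique <- length(unique(data$Class))

# TRUE for rows where every column is finite
ok <- Reduce(`&`, lapply(data, is.finite))
which(!ok)   # row 1

# Drop bad rows and make Class a factor for classification
clean <- data[ok, ]
clean$Class <- as.factor(clean$Class)
```

Using a logical index (ok) instead of negative row numbers avoids the classic pitfall where data[-integer(0), ] returns zero rows when nothing needs to be dropped.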


2013-01-04 06:49:52

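This happens because one of your variables has more than 32 levels. The levels are the distinct values of a variable. Remove that variable and try again.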
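To find which variables this answer is talking about, the levels of every factor column can be counted. This is a sketch with a hypothetical data frame; the threshold max_levels is set to the 32 quoted in the answer and should be adjusted to whatever limit your installed version of randomForest enforces.

```r
# Hypothetical data frame with one high-cardinality factor
df <- data.frame(
  many = factor(paste0("id", 1:40)),   # 40 levels
  few  = factor(rep(c("a", "b"), 20)), # 2 levels
  num  = seq_len(40)
)

max_levels <- 32  # limit quoted in the answer above

# Number of levels in each factor column
n_levels <- sapply(Filter(is.factor, df), nlevels)

# Factor columns over the limit
too_many <- names(n_levels)[n_levels > max_levels]
too_many   # "many"

# Drop them before fitting, as the answer suggests
df_small <- df[, setdiff(names(df), too_many), drop = FALSE]
```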


2013-02-11 15:07:16 user2061730

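Apart from the obvious causes such as NAs being present, this error is almost always caused by character-type features in the dataset. The way to understand this is to consider what a random forest actually does: it partitions the dataset feature by feature. So if one of the features is a character vector, how would you split the dataset on it? You need categories to partition the data, for example how many "male" versus "female"...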

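For numeric features such as age or price, you can create categories by splitting at thresholds: older than a certain age, cheaper than a certain price, and so on. You cannot do this with raw character features. Therefore, you need them to be factors in your dataset.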
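Following the reasoning above, a minimal sketch that converts every character column to a factor while leaving numeric columns alone. The data frame and its sex/age columns are hypothetical, echoing the male/female and age examples in the answer.

```r
# Hypothetical data with one character feature and one numeric feature
df <- data.frame(
  sex = c("male", "female", "female", "male"),
  age = c(23, 35, 41, 19),
  stringsAsFactors = FALSE
)

# Convert each character column to a factor; keep other columns unchanged.
# Assigning to df[] keeps the result a data frame.
df[] <- lapply(df, function(col) if (is.character(col)) factor(col) else col)

sapply(df, class)   # sex: "factor", age: "numeric"
```

Numeric columns such as age are left numeric on purpose, so the forest can still split them at thresholds.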


2015-06-26 03:24:28 Kingz

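In general, there are two main reasons you get this error message:

1. The data frame contains a character vector column instead of a factor. Simply convert your character columns to factors.

2. The data contains bad values; applying random forest to such data also produces this error. head() will not necessarily show these outliers. For example: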

library(randomForest)

x = rep(x = sample(c(0,1)), times = 24)          # 48 values
y = c(sample.int(n = 50, size = 22), Inf, Inf)   # 24 values, recycled to fill 48 rows

df = data.frame(col1 = x, col2 = y)

head(df)
  col1 col2
1    1   26
2    0   33
3    1   23
4    0   21
5    1   45
6    0   27

Now applying random forest to df leads to the same error:

model = randomForest(data = df , col2 ~ col1 , ntree = 10)

Error in randomForest.default(m, y, ...) : NA/NaN/Inf in foreign function call (arg 2)

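Solution: let's find the bad values in df. As mentioned above, is.finite() checks whether each element of the input vector is a proper finite value. For example: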

is.finite(c(5,6,1000000,NaN,Inf))
[1] TRUE TRUE TRUE FALSE FALSE
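For comparison, the related base R predicates flag different subsets of bad values. !is.finite() is the broadest of them: it catches NA, NaN, Inf and -Inf at once, which is exactly the set named in the error message.

```r
x <- c(5, NA, NaN, Inf, -Inf)

is.na(x)        # FALSE  TRUE  TRUE FALSE FALSE  (NaN counts as NA)
is.nan(x)       # FALSE FALSE  TRUE FALSE FALSE
is.infinite(x)  # FALSE FALSE FALSE  TRUE  TRUE
is.finite(x)    #  TRUE FALSE FALSE FALSE FALSE
```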

Now let's identify which columns of our data frame contain bad values, and count them.

sum(!is.finite(df$col2))
[1] 4
sum(!is.finite(df$col1))
[1] 0

Let's remove those records and keep only the good ones:

df1 = df[is.finite(df$col2) & is.finite(df$col1), ]

And run random forest again:

model1 = randomForest(data = df1 , col2 ~ col1 , ntree = 10)
Call:
randomForest(formula = col2 ~ col1, data = df1, ntree = 10)
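The filtering above lists each column by hand. A sketch of the same idea for any number of numeric columns follows; keep_finite_rows is a hypothetical helper name, and non-numeric columns are left out of the check.

```r
# Hypothetical helper: keep only rows whose numeric columns are all finite
keep_finite_rows <- function(df) {
  num_cols <- names(df)[sapply(df, is.numeric)]
  if (length(num_cols) == 0) return(df)
  ok <- Reduce(`&`, lapply(df[num_cols], is.finite))
  df[ok, , drop = FALSE]
}

# Usage on a small example with an Inf and a NaN in col2
df <- data.frame(
  col1  = c(1, 0, 1, 0),
  col2  = c(26, Inf, 23, NaN),
  label = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
df1 <- keep_finite_rows(df)
nrow(df1)   # 2
```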


2016-02-24 12:40:01

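You can avoid this error simply by converting all the columns to factors. I was facing this error as well: one particular column had not been converted to a factor. I wrote a factor conversion specifically for that column, and in the end my code worked.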
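To spot the one column that was missed, as described above, listing every column's class is usually enough. A sketch with a hypothetical data frame; keep in mind that turning numeric columns into factors makes the forest treat them as unordered categories, so in practice you may only want to convert the character ones.

```r
# Hypothetical data: column b was never converted to a factor
df <- data.frame(
  a = factor(c("x", "y")),
  b = c("p", "q"),
  stringsAsFactors = FALSE
)

sapply(df, class)   # a: "factor", b: "character"

# Columns that are not factors yet
not_factor <- names(df)[!sapply(df, is.factor)]
not_factor          # "b"

# Convert just those columns
df[not_factor] <- lapply(df[not_factor], factor)
```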


2016-07-04 17:48:18
