Using caret in SparkR?

Perhaps somewhat similar to this question: it seems that SparkR data frames are not compatible with the caret package.

When I try to train my model, I get the following error:

Error in as.data.frame.default(data) : 
    cannot coerce class "structure("SparkDataFrame", package = "SparkR")" to a data.frame

Is there any way around this? Below is a reproducible example using iris:

#load libraries 
library(caret) 
library(randomForest) 
set.seed(42) 

#point R session to Spark 
Sys.setenv(SPARK_HOME = "your/spark/installation/here") 
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths())) 

#load SparkR 
library(SparkR) 

#initialize Spark context 
sc <- sparkR.init(master = "local",sparkEnvir = list(spark.driver.memory="2g")) 

#initialize SQL context 
sqlContext <- sparkRSQL.init(sc) 

train2 <- createDataFrame(sqlContext, iris) 

#train the model 
model <- train(Species ~ Sepal_Length + Petal_Length,
               data = train2,
               method = "rf",
               trControl = trainControl(method = "cv", number = 5))

Again, is there any workaround? If not, what is the most straightforward route to machine learning with SparkR?


2017-04-24 skathan

That's not possible. – mtoto

@moto I think I had already figured that out, but what are the alternatives for doing machine learning with SparkR? Are there any? – skathan

Yes: http://spark.apache.org/docs/latest/sparkr.html#machine-learning – mtoto

As you found, you can't use caret's train method on SparkDataFrames. You can, however, use Spark ML's algorithms. For example, to train a random forest classifier, use SparkR::spark.randomForest:

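#train the model with Spark ML's random forest (spark.randomForest is available since Spark 2.1.0)
model <- spark.randomForest(train2, Species ~ Sepal_Length + Petal_Length,
                            type = "classification",
                            maxDepth = 5,
                            numTrees = 100)

#print the fitted forest's parameters and feature importances
summary(model)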
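The model above is never scored, and summary(model) only describes the fitted forest. Since caret's trainControl(method = "cv") is not available for SparkDataFrames, the following is a minimal sketch of a simple holdout check instead, assuming Spark 2.1 or later (for sparkR.session(), randomSplit() and spark.randomForest()) and a SparkR installation that can locate Spark, for example via SPARK_HOME as in the question. The 70/30 split, the seed, and the variable names (df, training, holdout, accuracy) are illustrative choices, not part of the original answer.

```r
# Sketch: holdout evaluation of a SparkR random forest.
# Assumes Spark >= 2.1 and that library(SparkR) can locate Spark (SPARK_HOME).
library(SparkR)
sparkR.session(master = "local[*]",
               sparkConfig = list(spark.driver.memory = "2g"))

# createDataFrame() renames iris columns, e.g. Sepal.Length -> Sepal_Length
df <- createDataFrame(iris)

# Illustrative 70/30 split; the seed makes the split repeatable
splits <- randomSplit(df, c(0.7, 0.3), seed = 42)
training <- splits[[1]]
holdout <- splits[[2]]

model <- spark.randomForest(training, Species ~ Sepal_Length + Petal_Length,
                            type = "classification",
                            maxDepth = 5, numTrees = 100)

# predict() returns a SparkDataFrame with an extra "prediction" column
predictions <- predict(model, holdout)
head(select(predictions, "Species", "prediction"))

# Accuracy: share of held-out rows whose predicted label equals Species
n_correct <- count(filter(predictions,
                          predictions$Species == predictions$prediction))
accuracy <- n_correct / count(predictions)
print(accuracy)
```

A single holdout split is cruder than caret's 5-fold cross-validation, but it keeps all computation inside Spark.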
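If the data is small enough to fit in the driver's memory, another option is to bring it back into local R before calling caret. The error in the question shows that caret's train() cannot coerce a SparkDataFrame into a data.frame, whereas SparkR::collect() returns an ordinary R data.frame. This is a minimal sketch, assuming Spark 2.0 or later, a SparkR installation that can locate Spark, and the caret and randomForest packages. The variable names train_local and model_local are hypothetical. Note that caret then runs entirely in local R, so none of the training is distributed by Spark.

```r
# Sketch: small-data workaround, running caret on a local copy of the data.
# Assumes Spark >= 2.0, SparkR locatable via SPARK_HOME, and caret + randomForest.
library(SparkR)
library(caret)
library(randomForest)

sparkR.session(master = "local[*]")
df <- createDataFrame(iris)

# collect() copies every row to the driver as an ordinary R data.frame
train_local <- collect(df)

# String columns come back as character; caret needs a factor outcome
# to treat the task as classification
train_local$Species <- as.factor(train_local$Species)

set.seed(42)
model_local <- train(Species ~ Sepal_Length + Petal_Length,
                     data = train_local,
                     method = "rf",
                     trControl = trainControl(method = "cv", number = 5))
print(model_local)
```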


2017-04-26 08:51:32
