How do I load a package from Spark to manipulate data with R? (SparkR and packages)
As an example, I am trying to access my HDFS file test.csv as follows:
Sys.setenv(SPARK_HOME="/opt/spark14")
library(SparkR)
sc <- sparkR.init(master="local")
sqlContext <- sparkRSQL.init(sc)
flights <- read.df(sqlContext,"hdfs://sandbox.hortonWorks.com:8020/user/root/test.csv","com.databricks.spark.csv", header="true")
but I get the following error:
Caused by: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
I then tried to load the CSV package by passing the following option:
Sys.setenv('SPARKR_SUBMIT_ARGS'='--packages com.databricks:spark-csv_2.10:1.0.3')
but then I get the following error when initializing the sqlContext:
Launching java with spark-submit command /opt/spark14/bin/spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 /tmp/RtmpuvwOky/backend_port95332e5267b
Error: Cannot load main class from JAR file:/tmp/RtmpuvwOky/backend_port95332e5267b
Any help would be greatly appreciated.
Thanks Holden. I am trying to use RStudio. I'm trying to work out the syntax for the two options, i.e. adding spark-shell to SPARKR_SUBMIT_ARGS and running the --packages command. What is the syntax? – san71
The correct way to set SPARKR_SUBMIT_ARGS is to do something like 'Sys.setenv(SPARKR_SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.10:1.0.3 sparkr-shell")'. I have a PR open to add this to the official docs https://github.com/apache/spark/pull/6916 –
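Putting the fix from the comment together with the question's original code, a minimal end-to-end sketch might look like the following (the hostname, paths, and package version are taken from the question above and should be adjusted for your own cluster; this assumes a local Spark 1.4-era installation where sparkRSQL.init is still the API):

```r
# Point SPARKR_SUBMIT_ARGS at the spark-csv package; "sparkr-shell" must
# come last so spark-submit still launches the SparkR backend instead of
# treating the temp backend-port file as an application JAR.
Sys.setenv(SPARK_HOME = "/opt/spark14")
Sys.setenv(SPARKR_SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.10:1.0.3 sparkr-shell")

library(SparkR)

# Initialize Spark *after* setting the environment variables, so the
# --packages flag is picked up when the JVM backend is launched.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Read the CSV from HDFS using the spark-csv data source.
flights <- read.df(sqlContext,
                   "hdfs://sandbox.hortonWorks.com:8020/user/root/test.csv",
                   "com.databricks.spark.csv",
                   header = "true")
```

The key detail is that SPARKR_SUBMIT_ARGS must be set before sparkR.init() is called, and the value must end with sparkr-shell; omitting it produces the "Cannot load main class from JAR" error shown above.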
Thanks Shivaram. This works! – san71