2016-09-25 45 views
4

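SparkR 2.0 read.df throws "Path does not exist" error

My SparkR 1.6 code does not work on Spark 2.0. I made the necessary changes, such as creating the session with sparkR.session() instead of sparkR.init() and no longer passing the sqlContext argument.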

In the code below I load data from several folders into a single data frame.

This read.df call works in Spark 1.6:

sales <- read.df(sqlContext, path = "gs://dev.appspot.com/myData/2014/20*,gs://dev.appspot.com/myData/2015/20*", source = "com.databricks.spark.csv", delimiter = "\t")

The equivalent read.df call does not work in Spark 2.0:

sales <- read.df("gs://dev.appspot.com/myData/2014/20*,gs://dev.appspot.com/myData/2015/20*", source = "com.databricks.spark.csv", delimiter = "\t")

The line above throws the following error:

16/09/25 19:28:52 ERROR org.apache.spark.api.r.RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Path does not exist: gs://dev.appspot.com/myData/2014/20*,gs://dev.appspot.com/myData/2015/20*;
     at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:357) 
     at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:350) 
     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) 
     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) 
     at scala.collection.immutable.List.foreach(List.scala:381) 
     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) 
     at scala.collection.immutable.List.flatMap(List.scala:344) 
     at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
Calls: read.df -> dispatchFunc -> f -> callJStatic -> invokeJava
Execution halted
16/09/25 19:28:53 INFO org.spark_project.jetty.server.ServerConnector: Stopped [email protected]{HTTP/1.1}{0.0.0.0:4040}

Answer

1

In Spark 2.0, read.df fails to read files that have a "," (comma) in the file name.

My data files had been generated with commas in their names, like this: 201448-0,004 201448-0,005 201448-0,006

After hours of painful debugging, the data finally started loading once I removed the "," from the file names.
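For anyone who needs to do the same rename in bulk, here is a minimal sketch in base R. It is only an illustration under assumptions: strip_commas_from_names is a hypothetical helper (not part of SparkR), the "_" replacement is an arbitrary choice, and it assumes the files sit on a local or mounted filesystem. For objects in a gs:// bucket you would have to rename them with a storage tool such as gsutil mv instead.

```r
# Hypothetical helper (not part of SparkR): rename every entry in `dir`
# whose name contains a comma, replacing each comma with `replacement`.
# Assumes a local or mounted filesystem, not a gs:// URL.
strip_commas_from_names <- function(dir, replacement = "_") {
  old_paths <- list.files(dir, full.names = TRUE)

  # Only touch entries whose base name actually contains a comma.
  has_comma <- grepl(",", basename(old_paths), fixed = TRUE)
  old_paths <- old_paths[has_comma]

  new_paths <- file.path(
    dirname(old_paths),
    gsub(",", replacement, basename(old_paths), fixed = TRUE)
  )

  # Refuse to overwrite something that already exists under the new name.
  clash <- file.exists(new_paths)
  if (any(clash)) {
    stop("target already exists: ", paste(new_paths[clash], collapse = ", "))
  }

  file.rename(old_paths, new_paths)
  invisible(new_paths)
}
```

Called on a folder that contains 201448-0,004, this would rename it to 201448-0_004 and leave files without a comma untouched. After that, the read.df call from the question can be pointed at the renamed files.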