我有一個火花應用程序,它在集羣AWS EMR運行。星火SQL不看HDFS文件
我添加文件HDFS:HDFS上
javaSparkContext.addFile(filePath, recursive);
文件存在(日誌可用:文件是可讀/ executeble /寫),但使用火花SQL API,我不能從該文件讀取信息:
LOGGER.info("Spark working directory: " + path);
File file = new File(path + "/test.avro");
LOGGER.info("SPARK PATH:" + file);
LOGGER.info("read:" + file.canRead());
LOGGER.info("execute:" + file.canExecute());
LOGGER.info("write:" + file.canWrite());
Dataset<Row> load = getSparkSession()
.read()
.format(AVRO_DATA_BRICKS_LIBRARY)
.load(file.getAbsolutePath());
有日誌:
17/08/07 15:03:25 INFO SparkContext: Added file /mnt/yarn/usercache/hadoop/appcache/application_1502118042722_0001/container_1502118042722_0001_01_000001/test.avro at spark://HOST:PORT/files/test.avro with timestamp 1502118205059
17/08/07 15:03:25 INFO Utils: Copying /mnt/yarn/usercache/hadoop/appcache/application_1502118042722_0001/container_1502118042722_0001_01_000001/test.avro to /mnt/yarn/usercache/hadoop/appcache/application_1502118042722_0001/spark-d5b494fc-2613-426f-80fc-ca66279c2194/userFiles-44aad2e8-04f4-420b-9b5e-a1ccde5db9ec/test.avro
17/08/07 15:03:25 INFO AbstractS3Calculator: Spark working directory: /mnt/yarn/usercache/hadoop/appcache/application_1502118042722_0001/spark-d5b494fc-2613-426f-80fc-ca66279c2194/userFiles-44aad2e8-04f4-420b-9b5e-a1ccde5db9ec
17/08/07 15:03:25 INFO AbstractS3Calculator: SPARK PATH:/mnt/yarn/usercache/hadoop/appcache/application_1502118042722_0001/spark-d5b494fc-2613-426f-80fc-ca66279c2194/userFiles-44aad2e8-04f4-420b-9b5e-a1ccde5db9ec/test.avro
17/08/07 15:03:25 INFO AbstractS3Calculator: read:true
17/08/07 15:03:25 INFO AbstractS3Calculator: execute:true
17/08/07 15:03:25 INFO AbstractS3Calculator: write:true
org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://HOST:PORT/mnt/yarn/usercache/hadoop/appcache/application_1502118042722_0001/spark-d5b494fc-2613-426f-80fc-ca66279c2194/userFiles-44aad2e8-04f4-420b-9b5e-a1ccde5db9ec/test.avro;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at odh.spark.services.algorithms.calculators.RiskEngineS3Calculator.getInputMembers(RiskEngineS3Calculator.java:76)
at odh.spark.services.algorithms.calculators.RiskEngineS3Calculator.getMembersDataSets(RiskEngineS3Calculator.java:124)
at odh.spark.services.algorithms.calculators.AbstractS3Calculator.calculate(AbstractS3Calculator.java:50)
at odh.spark.services.ProgressSupport.start(ProgressSupport.java:47)
at odh.spark.services.Engine.startCalculations(Engine.java:102)
at odh.spark.services.Engine.startCalculations(Engine.java:135)
at odh.spark.SparkApplication.main(SparkApplication.java:19)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
你能顯示路徑的值嗎? –
Path是星火的日誌工作目錄 – yazabara
嘗試運行的應用程序的根目錄。 – Mahdi