Spark cannot read local file
I have a local file that is available on every Spark node of an EMR cluster, with the following permissions:
-rw-rw---- 1 test_user test_group 30 Jun 21 14:20 /tmp/foo_test
I run the cluster as ec2-user, with YARN as the scheduler. To let Spark/YARN access the file, I added test_group as a secondary group of the yarn user on all nodes.
$ sudo -u yarn groups
yarn hadoop test_group
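One thing that may be worth checking alongside the output above, offered as a sketch rather than a confirmed diagnosis: `sudo -u yarn groups` starts a fresh process and reads the current group database, while a daemon that was already running keeps the group list it had when it started. The helper functions below (the names `proc_group_ids` and `proc_has_file_group` are made up for illustration) read the groups a running process actually holds from `/proc/<pid>/status` on Linux and compare them with the group that owns a file. The `pgrep` pattern in the usage comment assumes the NodeManager runs as the yarn user with "NodeManager" in its command line, and that `/proc/<pid>/status` is readable by the calling user.

```shell
#!/bin/sh
# Print the numeric group IDs a running process holds (Linux only):
# the effective GID from the "Gid:" line plus every supplementary
# GID from the "Groups:" line of /proc/<pid>/status.
proc_group_ids() {
    # $1: process ID
    awk '/^Gid:/    { print $3 }
         /^Groups:/ { for (i = 2; i <= NF; i++) print $i }' "/proc/$1/status"
}

# Succeed (exit 0) if process $1 holds the group that owns file $2.
# Returns 2 if the file cannot be stat'ed.
proc_has_file_group() {
    file_gid=$(stat -c '%g' "$2") || return 2
    proc_group_ids "$1" | grep -qx "$file_gid"
}

# Hypothetical usage on a worker node (process name is an assumption):
#   pid=$(pgrep -u yarn -f NodeManager | head -n 1)
#   proc_has_file_group "$pid" /tmp/foo_test \
#       && echo "running process holds the file's group" \
#       || echo "running process does NOT hold the file's group"
```

If the running process turns out not to hold test_group even though `groups` lists it, the process was probably started before the group was added and would need a restart to pick it up.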
In the Spark shell, I get the following error when reading the file:
scala> val rdd = sc.textFile("file:///tmp/foo_test")
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0:
java.io.FileNotFoundException: /tmp/foo_test (Permission denied)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:111)
at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:207)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:141)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:771)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
How can I read a file that relies on group-level permissions with Spark on EMR?
I get the same error when the file's permissions are set to 770 (on all nodes). – shj
Please try changing the permissions of the 'tmp' directory as well - see the updated command –
It looks like /tmp already has full rwx access for user, group, and other – shj