2017-08-04

Unable to run a program in pyspark

I entered the following commands in the pyspark shell:

In [1]: myrdd = sc.textFile("Cloudera-cdh5.repo") 
In [2]: myrdd.map(lambda x:x.upper()).collect() 

When I execute 'myrdd.map(lambda x:x.upper()).collect()', I get an error.

The error message is as follows:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. 
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, tiger): java.io.IOException: Cannot run program "/usr/local/bin/python3": error=2, No such file or directory 
     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047) 
     at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:160) 
     at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86) 
     at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62) 
     at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135) 
     at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:73) 
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) 
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
     at org.apache.spark.scheduler.Task.run(Task.scala:88) 
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
     at java.lang.Thread.run(Thread.java:745) 
    Caused by: java.io.IOException: error=2, No such file or directory 
     at java.lang.UNIXProcess.forkAndExec(Native Method) 
     at java.lang.UNIXProcess.<init>(UNIXProcess.java:186) 
     at java.lang.ProcessImpl.start(ProcessImpl.java:130) 
     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) 
     ... 13 more 

The file /usr/local/bin/python3 exists on disk.

How can I resolve the above error?


Check whether execute permission is set on "/usr/local/bin/python3" for all users, i.e. check the permissions of /usr/local/bin/python3. – shanmuga


The permissions are lrwxrwxrwx. It is a link to /usr/local/bin/python3.5. The permissions of python3.5 are -rwxr-xr-x. –

Answers

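You need to grant access permissions on the path /usr/local/bin/python3. You can use the command sudo chmod 777 /usr/local/bin/python3/*

I think this problem is caused by the PYSPARK_PYTHON variable, which is used to point to the Python location on each node. You can use the following command: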

export PYSPARK_PYTHON=/usr/local/bin/python3 
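
The same setting can also be made from Python. This is a minimal sketch, assuming a standalone driver script rather than the interactive shell: the variable has to be in the driver's environment before the SparkContext is created, and the path is assumed to exist on every node. The helper name `set_pyspark_python` is hypothetical.

```python
import os


def set_pyspark_python(interpreter_path):
    """Point PYSPARK_PYTHON at interpreter_path (hypothetical helper).

    Call this before the SparkContext is created; setting the
    variable afterwards does not affect an existing context.
    """
    os.environ["PYSPARK_PYTHON"] = interpreter_path
    return os.environ["PYSPARK_PYTHON"]


if __name__ == "__main__":
    # Path taken from the question; assumed to exist on every node.
    print(set_pyspark_python("/usr/local/bin/python3"))
```

In the pyspark shell itself `sc` already exists at the prompt, so there the export has to happen in the shell environment before pyspark is launched.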

I have already set the PYSPARK_PYTHON variable in my ~/.bashrc file to point to /usr/local/bin/python3 –


Then you can give '777' permissions to '/usr/local/bin/python3/*' and try. – Sharma


It is a link file to python3.5 in the same directory. The permissions are 777, but it still does not work for me. –
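
The stack trace reports the failed task on host `tiger`, so it may be worth confirming that the path resolves on that machine and not only where the shell was started. The following is a rough sketch of such a check, meant to be run directly with any available Python on each node (not through Spark, since the Python workers cannot start). The function name `describe_interpreter` is hypothetical.

```python
import os


def describe_interpreter(path):
    """Report how an interpreter path resolves on this machine."""
    # Follow symlinks, e.g. python3 -> python3.5
    resolved = os.path.realpath(path)
    return {
        "path": path,
        "resolved": resolved,
        "is_symlink": os.path.islink(path),
        # False when the path is missing or is a broken symlink
        "exists": os.path.exists(path),
        # True only if the final target is a file this user may execute
        "executable": os.path.isfile(resolved) and os.access(resolved, os.X_OK),
    }


if __name__ == "__main__":
    # Path taken from the question.
    for key, value in describe_interpreter("/usr/local/bin/python3").items():
        print(f"{key}: {value}")
```

If `exists` is False on a node, the symlink or its target is missing there, which would match the "No such file or directory" message regardless of the permission bits.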