Pyspark action submitted through Oozie fails: '[Errno 2] No such file or directory'

I am trying to run a basic Spark action on YARN on a Hadoop cluster through an Oozie workflow, and I am getting the following error (from the YARN application logs):
>>> Invoking Spark class now >>>
python: can't open file '/absolute/local/path/to/script.py': [Errno 2] No such file or directory
Hadoop Job IDs executed by Spark:
Intercepting System.exit(2)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [2]
But I am sure the file is there. In fact, when I run the following command:
spark-submit --master yarn --deploy-mode client /absolute/local/path/to/script.py arg1 arg2
it works, and I get the output I want.
Note: I followed everything in this article to get it set up (I am using Spark2): https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-spark-action.html
Any ideas?
workflow.xml (simplified for clarity):
<action name="action1">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>${sparkMaster}</master>
        <mode>${sparkMode}</mode>
        <name>action1</name>
        <jar>${integrate_script}</jar>
        <arg>arg1</arg>
        <arg>arg2</arg>
    </spark>
    <ok to="end"/>
    <error to="kill_job"/>
</action>
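For reference, as far as I understand the Spark action, its elements are meant to map onto the flags of the spark-submit command that works for me from the shell. An annotated copy of the relevant part (the comments are mine, added for illustration; they are not in the real workflow):

<spark xmlns="uri:oozie:spark-action:0.1">
    <master>${sparkMaster}</master>    <!-- equivalent to: --master yarn -->
    <mode>${sparkMode}</mode>          <!-- equivalent to: --deploy-mode client -->
    <name>action1</name>               <!-- application name shown in YARN -->
    <jar>${integrate_script}</jar>     <!-- the application file; for PySpark this is the .py script -->
    <arg>arg1</arg>                    <!-- positional arguments, in order -->
    <arg>arg2</arg>
</spark>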
job.properties (simplified for clarity), when running in CLUSTER mode:
oozie.wf.application.path=${nameNode}/user/${user.name}/${user.name}/${zone}
oozie.use.system.libpath=true
nameNode=hdfs://myNameNode:8020
jobTracker=myJobTracker:8050
oozie.action.sharelib.for.spark=spark2
sparkMaster=yarn
sparkMode=client
integrate_script=/absolute/local/path/to/script.py
zone=somethingUsefulForMe
Exception:
diagnostics: Application application_1502381591395_1000 failed 2 times due to AM Container for appattempt_1502381591395_1000_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://hostname:port/cluster/app/application_1502381591395_1000 Then click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://hostname:port/user/oozie/.sparkStaging/application_1502381591395_1000/__spark_conf__.zip
java.io.FileNotFoundException: File does not exist: hdfs://hostname:port/user/oozie/.sparkStaging/application_1502381591395_1000/__spark_conf__.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1427)
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
EDIT 2:
I just tried it from the shell, and it fails because of an import. The file layout is:
/scripts/functions/tools.py
/scripts/functions/__init__.py
/scripts/myScript.py
and myScript.py contains:

from functions.tools import *
And that is what fails. I assume the script is first copied to the cluster and run there, so how do I get all of the required modules shipped along with it? Modify the PYTHONPATH on HDFS? I understand why it is not working; I just do not know how to fix it.
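One approach I am considering (a sketch only; the HDFS path and zip name below are placeholders, not my real setup) is to zip the package with zip -r functions.zip functions/, upload it with hdfs dfs -put functions.zip /user/myUser/apps/, and hand it to Spark through the action's <spark-opts> element, since --py-files places the listed archives on the PYTHONPATH of the driver and executors:

<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>${sparkMaster}</master>
    <mode>${sparkMode}</mode>
    <name>action1</name>
    <jar>${integrate_script}</jar>
    <!-- ship the zipped package so "from functions.tools import *" resolves on the cluster;
         path is a placeholder for wherever the zip was uploaded -->
    <spark-opts>--py-files ${nameNode}/user/myUser/apps/functions.zip</spark-opts>
    <arg>arg1</arg>
    <arg>arg2</arg>
</spark>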
EDIT 3:
See the stack trace below. Most comments online say the problem is the Python code setting the master to "local". That is not the case here. Moreover, I even removed everything Spark-related from the Python script and I still get the same problem.
Diagnostics: File does not exist: hdfs://hdfs/path/user/myUser/.sparkStaging/application_1502381591395_1783/pyspark.zip
java.io.FileNotFoundException: File does not exist: hdfs://hdfs/path/user/myUser/.sparkStaging/application_1502381591395_1783/pyspark.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1427)
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Is '/absolute/path/to/script.py' a local filesystem path or an HDFS path? – Mariusz
Good point. It is local. Initially I tried an HDFS path and got a very explicit error that the script must be local. Edited to avoid confusion. – Tiberiu