2016-12-13

My knowledge of Spark is limited, and you will sense it after reading this question. I have just one node, with Spark, Hadoop, and YARN installed on it. How do I deal with the property spark.yarn.jars?

I am able to run a simple word-count job with the command below:

spark-submit --class com.sanjeevd.sparksimple.wordcount.JobRunner \
       --master yarn \
       --deploy-mode cluster \
       --driver-memory=2g \
       --executor-memory 2g \
       --executor-cores 1 \
       --num-executors 1 \
       SparkSimple-0.0.1-SNAPSHOT.jar \
       hdfs://sanjeevd.br:9000/user/spark-test/word-count/input \
       hdfs://sanjeevd.br:9000/user/spark-test/word-count/output

The cluster-mode word-count job runs just fine.

Now I understand that 'Spark on YARN' needs the Spark jar files available on the cluster, and that if I do nothing, hundreds of jar files are copied from $SPARK_HOME to every node (in my case, just the one node) each time I run my program. I can see the execution pause for a while until the copying finishes. See below -

16/12/12 17:24:03 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. 
16/12/12 17:24:06 INFO yarn.Client: Uploading resource file:/tmp/spark-a6cc0d6e-45f9-4712-8bac-fb363d6992f2/__spark_libs__11112433502351931.zip -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0001/__spark_libs__11112433502351931.zip 
16/12/12 17:24:08 INFO yarn.Client: Uploading resource file:/home/sanjeevd/personal/Spark-Simple/target/SparkSimple-0.0.1-SNAPSHOT.jar -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0001/SparkSimple-0.0.1-SNAPSHOT.jar 
16/12/12 17:24:08 INFO yarn.Client: Uploading resource file:/tmp/spark-a6cc0d6e-45f9-4712-8bac-fb363d6992f2/__spark_conf__6716604236006329155.zip -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0001/__spark_conf__.zip 

Spark's documentation suggests setting the spark.yarn.jars property to avoid this copying. So I set the following property in the spark-defaults.conf file:

spark.yarn.jars hdfs://sanjeevd.br:9000//user/spark/share/lib 

http://spark.apache.org/docs/latest/running-on-yarn.html#preparations To make Spark runtime jars accessible from the YARN side, you can specify spark.yarn.archive or spark.yarn.jars. For details please refer to Spark Properties. If neither spark.yarn.archive nor spark.yarn.jars is specified, Spark will create a zip file with all jars under $SPARK_HOME/jars and upload it to the distributed cache.

By the way, I copied all the jar files from the local /opt/spark/jars to HDFS /user/spark/share/lib. There are 206 of them.
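For completeness, the copy can be done with commands along these lines (a sketch, not the exact commands from the question; it assumes a running HDFS and the paths mentioned above):

```shell
# Hypothetical staging commands; /opt/spark/jars and the HDFS target are from the question
hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put /opt/spark/jars/*.jar /user/spark/share/lib/
hdfs dfs -count /user/spark/share/lib    # sanity-check the file count (206 here)
```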

This makes my job fail. Below is the error -

spark-submit --class com.sanjeevd.sparksimple.wordcount.JobRunner --master yarn --deploy-mode cluster --driver-memory=2g --executor-memory 2g --executor-cores 1 --num-executors 1 SparkSimple-0.0.1-SNAPSHOT.jar hdfs://sanjeevd.br:9000/user/spark-test/word-count/input hdfs://sanjeevd.br:9000/user/spark-test/word-count/output 
16/12/12 17:43:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/12/12 17:43:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 
16/12/12 17:43:07 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers 
16/12/12 17:43:07 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (5120 MB per container) 
16/12/12 17:43:07 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead 
16/12/12 17:43:07 INFO yarn.Client: Setting up container launch context for our AM 
16/12/12 17:43:07 INFO yarn.Client: Setting up the launch environment for our AM container 
16/12/12 17:43:07 INFO yarn.Client: Preparing resources for our AM container 
16/12/12 17:43:07 INFO yarn.Client: Uploading resource file:/home/sanjeevd/personal/Spark-Simple/target/SparkSimple-0.0.1-SNAPSHOT.jar -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0005/SparkSimple-0.0.1-SNAPSHOT.jar 
16/12/12 17:43:07 INFO yarn.Client: Uploading resource file:/tmp/spark-fae6a5ad-65d9-4b64-9ba6-65da1310ae9f/__spark_conf__7881471844385719101.zip -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0005/__spark_conf__.zip 
16/12/12 17:43:08 INFO spark.SecurityManager: Changing view acls to: sanjeevd 
16/12/12 17:43:08 INFO spark.SecurityManager: Changing modify acls to: sanjeevd 
16/12/12 17:43:08 INFO spark.SecurityManager: Changing view acls groups to: 
16/12/12 17:43:08 INFO spark.SecurityManager: Changing modify acls groups to: 
16/12/12 17:43:08 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(sanjeevd); groups with view permissions: Set(); users with modify permissions: Set(sanjeevd); groups with modify permissions: Set() 
16/12/12 17:43:08 INFO yarn.Client: Submitting application application_1481592214176_0005 to ResourceManager 
16/12/12 17:43:08 INFO impl.YarnClientImpl: Submitted application application_1481592214176_0005 
16/12/12 17:43:09 INFO yarn.Client: Application report for application_1481592214176_0005 (state: ACCEPTED) 
16/12/12 17:43:09 INFO yarn.Client: 
client token: N/A 
diagnostics: N/A 
ApplicationMaster host: N/A 
ApplicationMaster RPC port: -1 
queue: default 
start time: 1481593388442 
final status: UNDEFINED 
tracking URL: http://sanjeevd.br:8088/proxy/application_1481592214176_0005/ 
user: sanjeevd 
16/12/12 17:43:10 INFO yarn.Client: Application report for application_1481592214176_0005 (state: FAILED) 
16/12/12 17:43:10 INFO yarn.Client: 
client token: N/A 
diagnostics: Application application_1481592214176_0005 failed 1 times due to AM Container for appattempt_1481592214176_0005_000001 exited with exitCode: 1 
For more detailed output, check application tracking page:http://sanjeevd.br:8088/cluster/app/application_1481592214176_0005Then, click on links to logs of each attempt. 
Diagnostics: Exception from container-launch. 
Container id: container_1481592214176_0005_01_000001 
Exit code: 1 
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) 
    at org.apache.hadoop.util.Shell.run(Shell.java:456) 
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) 
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 


Container exited with a non-zero exit code 1 
Failing this attempt. Failing the application. 
    ApplicationMaster host: N/A 
    ApplicationMaster RPC port: -1 
    queue: default 
    start time: 1481593388442 
    final status: FAILED 
    tracking URL: http://sanjeevd.br:8088/cluster/app/application_1481592214176_0005 
    user: sanjeevd 
16/12/12 17:43:10 INFO yarn.Client: Deleting staging directory hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0005 
Exception in thread "main" org.apache.spark.SparkException: Application application_1481592214176_0005 finished with failed status 
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132) 
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175) 
    at org.apache.spark.deploy.yarn.Client.main(Client.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:497) 
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736) 
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185) 
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
16/12/12 17:43:10 INFO util.ShutdownHookManager: Shutdown hook called 
16/12/12 17:43:10 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-fae6a5ad-65d9-4b64-9ba6-65da1310ae9f 

Do you know what I'm doing wrong? The task's log says the following -

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster 

I understand the error that the ApplicationMaster class was not found, but my question is why it was not found - where is this class supposed to be? I don't have an assembly jar, since I'm using Spark 2.0.1, which no longer bundles an assembly.

What does this have to do with the spark.yarn.jars property? This property is meant to help with running Spark on YARN, and it should. Is there something extra that needs to be done when using spark.yarn.jars?

Thanks for reading this question, and for your help in advance.


Hi Sanjeev, in my case only the jars under $SPARK_HOME/jars get copied. How did you make your own jar, i.e. 'SparkSimple-0.0.1-SNAPSHOT.jar', get copied to HDFS as well? – ascetic652

Answers


I was finally able to make sense of this property. I found by hit-and-trial that the correct syntax of this property is

spark.yarn.jars=hdfs://xx:9000/user/spark/share/lib/*.jar

I was not putting *.jar at the end; my path just ended with /lib. I also tried pointing at the actual yarn jar like this - spark.yarn.jars=hdfs://sanjeevd.brickred:9000/user/spark/share/lib/spark-yarn_2.11-2.0.1.jar - but no luck. All it said was that it was unable to load the ApplicationMaster.
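With the hostname from the question substituted in, the working spark-defaults.conf entry would therefore look like this (a sketch of the syntax, not a verified config):

```
spark.yarn.jars    hdfs://sanjeevd.br:9000/user/spark/share/lib/*.jar
```

The essential part is the trailing /*.jar glob, which makes the property match the individual jar files rather than the directory itself.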

I posted my answer to a similar question at https://stackoverflow.com/a/41179608/2332121


If you look at the spark.yarn.jars documentation, it says the following:

List of libraries containing Spark code to distribute to YARN containers. By default, Spark on YARN will use Spark jars installed locally, but the Spark jars can also be in a world-readable location on HDFS. This allows YARN to cache it on nodes so that it doesn't need to be distributed each time an application runs. To point to jars on HDFS, for example, set this configuration to hdfs:///some/path. Globs are allowed.

This means you are effectively overriding SPARK_HOME/jars and telling YARN to pick up all the jars required for the application run from your path. If you set the spark.yarn.jars property, all the dependency jars Spark needs to run must be present in that path. If you look inside spark-assembly.jar under SPARK_HOME/lib, the org.apache.spark.deploy.yarn.ApplicationMaster class is there, so make sure that all the Spark dependencies are present in the HDFS path you specify as spark.yarn.jars.


Thanks! I edited my question at the end. Since I'm using Spark 2.0.1, no assembly jar is bundled with it, so I cannot find the ApplicationMaster java class. Why doesn't Spark complain when I unset the spark.yarn.jars property? When I upload all of /spark/jars to HDFS and point the spark.yarn.jars property to this HDFS location, Spark goes crazy and asks for the ApplicationMaster. By the way, I don't have a /spark/lib folder. I guess they changed that too in the 2.x releases. Any help please. –


Starting with Spark 2.x they stopped creating the assembly jar. If you look inside the /jars folder you should see spark-yarn_-.jar; it should contain the ApplicationMaster class. Verify that you have this jar in your /jars folder. If you do, and you have copied it to the HDFS location, then I'm not sure why you are getting this error. :) –


Thanks for your help. I upvoted your comment; it looks like I had a syntax issue in specifying this property. –


As asked, you can also use the spark.yarn.archive option and set it to the location of an archive (which you create) containing all the JARs from the $SPARK_HOME/jars/ folder, at the root level of the archive. For example:

  1. Create the archive: jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
  2. Upload to HDFS: hdfs dfs -put spark-libs.jar /some/path/
  3. Set spark.yarn.archive to hdfs:///some/path/spark-libs.jar
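Putting step 3 into spark-defaults.conf gives an entry like this (the /some/path location is the placeholder from the steps above, not a real path):

```
spark.yarn.archive    hdfs:///some/path/spark-libs.jar
```

Because the archive is a single file, YARN can cache it once per node instead of distributing 200+ individual jars on every application run.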