我有一個簡單的spark工作,它用給定的輸入文件中的逗號替換空格。如何以編程方式在yarn-client模式下提交spark應用程序?
當本地提交此作業時(使用IDE並執行內置的jar),它會成功完成,並且當主設備設置爲「yarn-client」時,作業會掛起很長時間,並引發以下異常。
我們有一個用例,我們希望以編程方式提交工作,而不是通過構建jar並通過spark-submit提交它。
星火版本:1.6.1 的Hadoop版本:2.7.1
和我在我的POM所有的火花,紗和Hadoop的依賴。
工作失敗,原因是以下異常
java.net.ConnectException: Call From spark.node123.com/192.168.2.1 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor13.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1480)
at org.apache.hadoop.ipc.Client.call(Client.java:1407)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy10.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:152)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy11.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:246)
at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:129)
at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:129)
at org.apache.spark.Logging$class.logInfo(Logging.scala:58)
at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:62)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:128)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at tardis.platform.TardisContext$.apply(TardisContext.scala:20)
at tardis.common.plugins.Heartbeat.isAbleTocreateContext(Heartbeat.scala:45)
at tardis.common.plugins.Heartbeat.performAction(Heartbeat.scala:33)
at tardis.core.scheduler.jobs.PluginExecutorJob.execute(PluginExecutorJob.scala:40)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
at org.apache.hadoop.ipc.Client.call(Client.java:1446)
... 25 more
我的用例是以編程方式提交工作(即不使用spark-submit命令).. bdw,問題不在防火牆 – sandyyyy
@sandyyyy您可能需要檢查https://github.com/kakao/cuesheet – iboss