我在EC2上運行獨立的Spark集羣,並且正在使用Spark-Cassandra連接器驅動程序編寫應用程序,並嘗試將作業提交到Spark集羣編程。 作業本身很簡單:無法以編程方式提交Spark應用程序(使用Cassandra連接器)從遠程客戶端集羣
public static void main(String[] args) {
SparkConf conf;
JavaSparkContext sc;
conf = new SparkConf()
.set("spark.cassandra.connection.host", host);
conf.set("spark.driver.host", "[my_public_ip]");
conf.set("spark.driver.port", "15000");
sc = new JavaSparkContext("spark://[spark_master_host]","test",conf);
CassandraJavaRDD<CassandraRow> rdd = javaFunctions(sc).cassandraTable(
"keyspace", "table");
System.out.println(rdd.first().toString());
sc.stop();
}
當我運行的是哪一個運行良好,我EC2集羣的星火主節點。 我正試圖在遠程Windows客戶端中運行此操作。 問題是從以下兩行:
conf.set("spark.driver.host", "[my_public_ip]");
conf.set("spark.driver.port", "15000");
首先,如果我註釋掉這兩條線,應用程序將不會拋出一個異常,但執行程序沒有運行,有以下日誌:
14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/3 is now LOADING
14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/0 is now EXITED (Command exited with code 1)
14/12/06 22:40:03 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141207033931-0021/0 removed: Command exited with code 1
這永遠不會結束,當我檢查工作節點的日誌,我發現:
14/12/06 22:40:21 ERROR security.UserGroupInformation: PriviledgedActionException as:[username] cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
... 7 more
我不知道那是什麼約,我的猜測是,可能是工作節點無法連接到驅動程序,它可能最初設置爲:
14/12/06 22:39:30 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected][some_host_name]:52660]
14/12/06 22:39:30 INFO Remoting: Remoting now listens on addresses: [akka.tcp://[email protected][some_host_name]:52660]
顯然,沒有DNS是要解決我的主機名...
既然不能設置部署模式"client"
或"cluster"
,如果不通過./spark-submit
腳本(我認爲這很荒謬......)。我嘗試在所有Spark Master Worker節點的/etc/hosts
中添加主機分辨率"XX.XXX.XXX.XX [host-name]"
。
當然沒有運氣... 這導致我到第二,un-comment兩行;
這給了我:
14/12/06 22:59:41 INFO Remoting: Starting remoting
14/12/06 22:59:41 ERROR Remoting: Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
at akka.remote.Remoting.start(Remoting.scala:194)
...
原因:
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: /[my_public_ip]:15000
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388)
我雙重檢查我的防火牆設置和路由器設置,確認我的防火牆diabled;和netstat -an
確認端口15000沒有被使用(實際上我試圖改變到幾個可用的端口,沒有運氣);和我ping
我的公共ip從其他機器和機器從我的集羣,沒問題。
現在我完全搞砸了,我只是想盡辦法解決這個問題。有什麼建議麼?任何幫助表示讚賞!