2014-12-07 63 views
3

我在EC2上運行獨立的Spark集羣,並且正在使用Spark-Cassandra連接器驅動程序編寫應用程序,並嘗試將作業提交到Spark集羣編程。 作業本身很簡單:無法以編程方式提交Spark應用程序(使用Cassandra連接器)從遠程客戶端集羣

public static void main(String[] args) { 
    SparkConf conf; 
    JavaSparkContext sc; 
    conf = new SparkConf() 
      .set("spark.cassandra.connection.host", host); 
    conf.set("spark.driver.host", "[my_public_ip]"); 
    conf.set("spark.driver.port", "15000"); 
    sc = new JavaSparkContext("spark://[spark_master_host]","test",conf); 
    CassandraJavaRDD<CassandraRow> rdd = javaFunctions(sc).cassandraTable(
      "keyspace", "table"); 
    System.out.println(rdd.first().toString()); 
    sc.stop(); 
} 

當我運行的是哪一個運行良好,我EC2集羣的星火主節點。 我正試圖在遠程Windows客戶端中運行此操作。 問題是從以下兩行:

conf.set("spark.driver.host", "[my_public_ip]"); 
    conf.set("spark.driver.port", "15000"); 

首先,如果我註釋掉這兩條線,應用程序將不會拋出一個異常,但執行程序沒有運行,有以下日誌:

14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/3 is now LOADING 

14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/0 is now EXITED (Command exited with code 1) 

14/12/06 22:40:03 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141207033931-0021/0 removed: Command exited with code 1 

這永遠不會結束,當我檢查工作節點的日誌,我發現:

14/12/06 22:40:21 ERROR security.UserGroupInformation: PriviledgedActionException as:[username] cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] 
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs  
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134) 
     at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52) 
     at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113) 
     at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156) 
     at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) 
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:415) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) 
... 4 more 
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] 
     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) 
     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) 
     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) 
     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) 
     at scala.concurrent.Await$.result(package.scala:107) 
     at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125) 
     at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53) 
     at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52) 
     ... 7 more 

我不知道那是什麼約,我的猜測是,可能是工作節點無法連接到驅動程序,它可能最初設置爲:

14/12/06 22:39:30 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected][some_host_name]:52660] 
14/12/06 22:39:30 INFO Remoting: Remoting now listens on addresses: [akka.tcp://[email protected][some_host_name]:52660] 

顯然,沒有DNS是要解決我的主機名...

既然不能設置部署模式"client""cluster",如果不通過./spark-submit腳本(我認爲這很荒謬......)。我嘗試在所有Spark Master Worker節點的/etc/hosts中添加主機分辨率"XX.XXX.XXX.XX [host-name]"

當然沒有運氣... 這導致我到第二,un-comment兩行;

這給了我:

14/12/06 22:59:41 INFO Remoting: Starting remoting 
14/12/06 22:59:41 ERROR Remoting: Remoting error: [Startup failed] [ 
akka.remote.RemoteTransportException: Startup failed 
     at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129) 
     at akka.remote.Remoting.start(Remoting.scala:194) 
     ... 

原因:

Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: /[my_public_ip]:15000 
     at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) 
     at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391) 
     at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388) 

我雙重檢查我的防火牆設置和路由器設置,確認我的防火牆diabled;和netstat -an確認端口15000沒有被使用(實際上我試圖改變到幾個可用的端口,沒有運氣);和我ping我的公共ip從其他機器和機器從我的集羣,沒問題。

現在我完全搞砸了,我只是想盡辦法解決這個問題。有什麼建議麼?任何幫助表示讚賞!

回答

-2

請檢查15000是否在您的安全組中。

相關問題