2015-04-02

Background: I have two machines with the same hostname, and I need to set up a local Spark cluster for testing. Setting up the master and the worker works fine, but trying to run an application through the driver causes a problem: netty does not seem to pick the correct host (no matter what I put there, it just picks the first one). How do I pass the right hostname to netty?

Both machines resolve to the same hostname:

$ dig +short corehost 
192.168.0.100 
192.168.0.101 

Spark configuration (used by the master and the local workers):

export SPARK_LOCAL_DIRS=/some/dir 
export SPARK_LOCAL_IP=corehost   # tried various values, e.g. 192.168.0.x, for
export SPARK_MASTER_IP=corehost  # the local, master and driver addresses
export SPARK_MASTER_PORT=7077 
export SPARK_WORKER_CORES=2 
export SPARK_WORKER_MEMORY=2g 
export SPARK_WORKER_INSTANCES=2 
export SPARK_WORKER_DIR=/some/dir 

Spark starts up and I can see the workers in the web UI. When I run the Spark "job":

val conf = new SparkConf().setAppName("AaA") 
          // tried 192.168.0.x and localhost 
          .setMaster("spark://corehost:7077") 
val sc = new SparkContext(conf) 

I get this exception:

15/04/02 12:34:04 INFO SparkContext: Running Spark version 1.3.0 
15/04/02 12:34:04 WARN Utils: Your hostname, corehost resolves to a loopback address: 127.0.0.1; using 192.168.0.100 instead (on interface en1) 
15/04/02 12:34:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
15/04/02 12:34:05 ERROR NettyTransport: failed to bind to corehost.home/192.168.0.101:0, shutting down Netty transport 
... 
Exception in thread "main" java.net.BindException: Failed to bind to: corehost.home/192.168.0.101:0: Service 'sparkDriver' failed after 16 retries! 
    at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) 
    at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393) 
    at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389) 
    at scala.util.Success$$anonfun$map$1.apply(Try.scala:206) 
    at scala.util.Try$.apply(Try.scala:161) 
    at scala.util.Success.map(Try.scala:206) 
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) 
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) 
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) 
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67) 
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82) 
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) 
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) 
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) 
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58) 
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) 
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) 
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 
15/04/02 12:34:05 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
15/04/02 12:34:05 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 
15/04/02 12:34:05 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down. 

Process finished with exit code 1 

I am not sure how to proceed with this whole jungle of IP addresses, and not even sure whether it is a networking problem at all.


`dig` only queries DNS, whereas `hostname` resolution goes through glibc's internal name-resolution path, which also consults files present on the system, such as `/etc/hosts`. That file may well contain the mapping for `corehost` that netty is complaining about. You can inspect the mapping with `getent hosts corehost`. – Petesh 2015-04-02 12:03:19


@Petesh It is not a case of a wrong mapping: there really are two physical machines with the same hostname on the same network. The problem was that without the extra settings Lyuben mentions in the answer, the driver cannot tell which host to pick. Thanks for the idea, though, it is a useful debugging tool! – user2003470 2015-04-02 14:23:53
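As a quick sketch of the check Petesh suggests: `getent` resolves names through the same NSS path the JVM uses (`/etc/hosts` first, then DNS), unlike `dig`, which asks DNS only. Here `localhost` stands in for your own hostname, e.g. `corehost`:

```shell
# Show every address the system resolver (and hence the Spark driver)
# may pick for a given hostname; /etc/hosts is consulted before DNS.
getent hosts localhost
```

If this prints an address different from what `dig +short` returns, a stale `/etc/hosts` entry is likely the culprit.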

Answer


My experience with the same problem is that it comes down to the local setup. Try being more explicit with SPARK_LOCAL_IP, and add the driver host's IP to the configuration in your Spark driver code:

val conf = new SparkConf().setAppName("AaA") 
          .setMaster("spark://localhost:7077") 
          .set("spark.local.ip","192.168.1.100") 
          .set("spark.driver.host","192.168.1.100") 

This should tell netty which of the two identical hosts to use.
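If you would rather not hard-code the addresses in the application, the same properties can be supplied at launch time through `spark-submit` (a sketch; the IPs, the class name `AaA`, and the jar path are placeholders for your own values):

```shell
# Equivalent launch-time configuration; substitute your own addresses,
# main class and application jar.
spark-submit \
  --master spark://corehost:7077 \
  --conf spark.local.ip=192.168.1.100 \
  --conf spark.driver.host=192.168.1.100 \
  --class AaA \
  app.jar
```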
