2017-02-24 50 views
2

我從DCOS羣集運行簡單的Spark應用程序。我想將輸出保存到羣集外的HDFS,但共享相同的域名。在安裝spark時,我在DCOS上做了一些最小配置 - 在高級配置中提到了hdfs-site.xml和core-site.xml路徑。 服務已啓動,當我運行火花作業時,出現未知的主機異常。我需要做其他配置嗎?我的防火牆被禁用。使用外部HDFS在DCOS上運行Spark

Registered docker executor on 
Starting task driver-20170224072457-0002 
Traceback (most recent call last): 
    File "/mnt/mesos/sandbox/test1.py", line 11, in <module> 
    out.saveAsTextFile("hdfs:///root/test/out.txt") 
    File "/opt/spark/dist/python/lib/pyspark.zip/pyspark/rdd.py", line 1519, in saveAsTextFile 
    File "/opt/spark/dist/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__ 
    File "/opt/spark/dist/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling o49.saveAsTextFile. 
: java.lang.IllegalArgumentException: java.net.UnknownHostException: host-xyz.domain.com 
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374) 
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310) 
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176) 
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:668) 
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:604) 
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148) 
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596) 
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) 
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) 
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) 
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) 
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:361) 
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) 
    at org.apache.spark.SparkHadoopWriter$.createPathFromString(SparkHadoopWriter.scala:175) 
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1063) 
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1030) 
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1030) 
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) 
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:358) 
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1030) 
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:956) 
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:956) 
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:956) 
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) 
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:358) 
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:955) 
    at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1459) 
    at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1438) 
    at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1438) 
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) 
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:358) 
    at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1438) 
    at org.apache.spark.api.java.JavaRDDLike$class.saveAsTextFile(JavaRDDLike.scala:549) 
    at org.apache.spark.api.java.AbstractJavaRDDLike.saveAsTextFile(JavaRDDLike.scala:45) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
    at py4j.Gateway.invoke(Gateway.java:280) 
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 
    at py4j.commands.CallCommand.execute(CallCommand.java:79) 
    at py4j.GatewayConnection.run(GatewayConnection.java:214) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.net.UnknownHostException: host-xyz.domain.com 
    ... 48 more 

回答

0

我能解決這個問題。

問題在於羣集中的計算機無法識別安裝HDFS的其他計算機。

在羣集的所有機器中編輯/etc/hosts文件並添加HDFS機器詳細信息。如果您需要兩者互相交流,請與HDFS機器一樣。如果我的解釋看起來不太正確,請糾正我。

相關問題