Colons in an Apache Spark application path: I am submitting an Apache Spark application to YARN programmatically:

package application.RestApplication; 

import org.apache.hadoop.conf.Configuration; 
import org.apache.spark.SparkConf; 
import org.apache.spark.deploy.yarn.Client; 
import org.apache.spark.deploy.yarn.ClientArguments; 

public class App { 
    public static void main(String[] args1) { 
     String[] args = new String[] { 
       "--class", "org.apache.spark.examples.JavaWordCount", 
       "--jar", "/opt/spark/examples/jars/spark-examples_2.11-2.0.0.jar", 
       "--arg", "hdfs://hadoop-master:9000/input/file.txt" 
     }; 
     Configuration config = new Configuration(); 
     System.setProperty("SPARK_YARN_MODE", "true"); 
     SparkConf sparkConf = new SparkConf(); 
     ClientArguments cArgs = new ClientArguments(args); 
     Client client = new Client(cArgs, config, sparkConf); 
     client.run(); 
    } 
} 

My problem is with the line "--arg", "hdfs://hadoop-master:9000/input/file.txt" – more specifically, with the colons:

16/08/29 09:54:16 ERROR yarn.ApplicationMaster: Uncaught exception: 
java.lang.NumberFormatException: For input string: "9000/input/plik2.txt" 
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
    at java.lang.Integer.parseInt(Integer.java:580) 
    at java.lang.Integer.parseInt(Integer.java:615) 
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272) 
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29) 
    at org.apache.spark.util.Utils$.parseHostPort(Utils.scala:935) 
    at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:547) 
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:405) 
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:247) 
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:749) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) 
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70) 
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:747) 
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:774) 
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala) 

How do I write (as an argument) a file path that contains colons? I have tried various combinations: with slashes, backslashes, %3A, and so on...

Have you tried removing 'hdfs://hadoop-master:9000'? Most Spark installations use the default HDFS. – DJElbow

That doesn't work: http://wklej.org/id/2802613/ :( –

Since it looks like you are running this from the command line, you could try wrapping your argument in single quotes: 'hdfs://hadoop-master:9000/input/file.txt' – DJElbow

Answers

I rewrote my program following this example: https://github.com/mahmoudparsian/data-algorithms-book/blob/master/src/main/java/org/dataalgorithms/chapB13/client/SubmitSparkPiToYARNFromJavaCode.java

import org.apache.hadoop.conf.Configuration; 
import org.apache.spark.SparkConf; 
import org.apache.spark.deploy.yarn.Client; 
import org.apache.spark.deploy.yarn.ClientArguments; 

public class SubmitSparkAppToYARNFromJavaCode { 

    // adjust to your Spark installation directory 
    private static final String SPARK_HOME = "/opt/spark"; 

    public static void main(String[] args) throws Exception { 
        run(); 
    } 

    static void run() throws Exception { 
        String sparkExamplesJar = "/opt/spark/examples/jars/spark-examples_2.11-2.0.0.jar"; 
        final String[] args = new String[]{ 
            "--jar", 
            sparkExamplesJar, 
            "--class", 
            "org.apache.spark.examples.JavaWordCount", 
            "--arg", 
            "hdfs://hadoop-master:9000/input/file.txt" 
        }; 
        Configuration config = new Configuration(); // Hadoop configuration from the classpath 
        System.setProperty("SPARK_YARN_MODE", "true"); 
        SparkConf sparkConf = new SparkConf(); 
        sparkConf.setSparkHome(SPARK_HOME); 
        sparkConf.setMaster("yarn"); 
        sparkConf.setAppName("spark-yarn"); 
        // cluster mode: the ApplicationMaster runs the user class and passes 
        // "--arg" values through as plain program arguments 
        sparkConf.set("spark.submit.deployMode", "cluster"); 
        ClientArguments clientArguments = new ClientArguments(args); 
        Client client = new Client(clientArguments, config, sparkConf); 
        client.run(); 
    } 
} 

And now it works!
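
For what it's worth, the decisive change appears to be the deploy mode rather than any escaping of the colons; a minimal sketch of just that difference (the surrounding launcher code is assumed unchanged):

// In client deploy mode, the YARN-side ExecutorLauncher expects the first 
// "--arg" to be the driver's host:port, so the HDFS path from the question 
// ended up in Utils.parseHostPort (see the stack trace above). In cluster 
// mode, the ApplicationMaster runs the user class and forwards "--arg" 
// values as ordinary program arguments. 
SparkConf sparkConf = new SparkConf(); 
sparkConf.setMaster("yarn"); 
sparkConf.set("spark.submit.deployMode", "cluster"); 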

Judging by Utils#parseHostPort, which gets called here, Spark treats everything after the last ':' as the port:

def parseHostPort(hostPort: String): (String, Int) = { 
    // Check cache first. 
    val cached = hostPortParseResults.get(hostPort) 
    if (cached != null) { 
     return cached 
    } 

    val indx: Int = hostPort.lastIndexOf(':') 
    // This is potentially broken - when dealing with ipv6 addresses for example, sigh ... 
    // but then hadoop does not support ipv6 right now. 
    // For now, we assume that if port exists, then it is valid - not check if it is an int > 0 
    if (-1 == indx) { 
     val retval = (hostPort, 0) 
     hostPortParseResults.put(hostPort, retval) 
     return retval 
    } 

    val retval = (hostPort.substring(0, indx).trim(), hostPort.substring(indx + 1).trim().toInt) 
    hostPortParseResults.putIfAbsent(hostPort, retval) 
    hostPortParseResults.get(hostPort) 
} 

So the entire string 9000/input/file.txt is expected to be a single port number. This suggests that you are not supposed to reference your input file via an HDFS URI at this point. I suspect someone more proficient with Apache Spark can give you better advice.
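
To make that failure mode concrete, here is a small self-contained sketch (the class name is made up; the path comes from the question) that replays the same split and fails with the same exception:

public class ParseHostPortDemo { 
    public static void main(String[] args) { 
        String hostPort = "hdfs://hadoop-master:9000/input/file.txt"; 
        // same logic as Utils.parseHostPort: split at the LAST colon 
        int idx = hostPort.lastIndexOf(':'); 
        String host = hostPort.substring(0, idx).trim();  // "hdfs://hadoop-master" 
        String port = hostPort.substring(idx + 1).trim(); // "9000/input/file.txt" 
        // throws java.lang.NumberFormatException: For input string: "9000/input/file.txt" 
        int portNumber = Integer.parseInt(port); 
    } 
} 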