
I am writing an application in Scala that uses Spark. I am packaging the application with Maven and am running into problems building an "uber" or "fat" jar: Packaging and running a Scala Spark project with Maven.

The problem I am facing is that the application runs fine inside the IDE, or if I provide a non-uber-jar'd version of the dependencies on the Java classpath, but it does not work if I give it the uber jar as the classpath, i.e.

java -Xmx2G -cp target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar debug.spark_example.Example data.txt 

does not work. I get the following error message:

ERROR SparkContext: Error initializing SparkContext. 
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version' 
    at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124) 
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145) 
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151) 
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159) 
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164) 
    at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206) 
    at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:168) 
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504) 
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:141) 
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:118) 
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122) 
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54) 
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) 
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991) 
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) 
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982) 
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56) 
    at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:245) 
    at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:52) 
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:247) 
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188) 
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:424) 
    at debug.spark_example.Example$.main(Example.scala:9) 
    at debug.spark_example.Example.main(Example.scala) 

I would really appreciate help understanding what I need to add to the pom.xml file, and why I need to add it, to get this working.

I searched the web and found the following resources, which I tried (see the POM below) but could not get to work:

1) Spark user mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Packaging-a-spark-job-using-maven-td5615.html

2) how to package spark scala application

I have a simple example that demonstrates the problem, a simple one-class project (src/main/scala/debug/spark_example/Example.scala):

package debug.spark_example

import org.apache.spark.{SparkConf, SparkContext}

object Example {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("Test").setMaster("local[2]"))
    val lines = sc.textFile(args(0))
    val lineLengths = lines.map(s => s.length)
    val totalLength = lineLengths.reduce((a, b) => a + b)
    lineLengths.foreach(println)
    println(totalLength)
  }
}

Here is the pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> 
    <modelVersion>4.0.0</modelVersion> 
    <groupId>debug.spark-example</groupId> 
    <artifactId>spark-example</artifactId> 
    <version>0.1-SNAPSHOT</version> 
    <inceptionYear>2015</inceptionYear> 
    <properties> 
    <scala.majorVersion>2.11</scala.majorVersion> 
    <scala.minorVersion>.2</scala.minorVersion> 
    <spark.version>1.4.1</spark.version> 
    </properties> 


    <repositories> 
    <repository> 
     <id>scala-tools.org</id> 
     <name>Scala-Tools Maven2 Repository</name> 
     <url>http://scala-tools.org/repo-releases</url> 
    </repository> 
    </repositories> 

    <pluginRepositories> 
    <pluginRepository> 
     <id>scala-tools.org</id> 
     <name>Scala-Tools Maven2 Repository</name> 
     <url>http://scala-tools.org/repo-releases</url> 
    </pluginRepository> 
    </pluginRepositories> 

    <dependencies> 
    <dependency> 
     <groupId>org.scala-lang</groupId> 
     <artifactId>scala-library</artifactId> 
     <version>${scala.majorVersion}${scala.minorVersion}</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-core_${scala.majorVersion}</artifactId> 
     <version>${spark.version}</version> 
    </dependency> 
    </dependencies> 


    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-eclipse-plugin</artifactId>
                <configuration>
                    <downloadSources>true</downloadSources>
                    <buildcommands>
                        <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
                    </buildcommands>
                    <additionalProjectnatures>
                        <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
                    </additionalProjectnatures>
                    <classpathContainers>
                        <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
                        <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
                    </classpathContainers>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4</version>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>attached</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <tarLongFileMode>gnu</tarLongFileMode>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.2</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <minimizeJar>false</minimizeJar>
                            <createDependencyReducedPom>false</createDependencyReducedPom>
                            <artifactSet>
                                <includes>
                                    <!-- Include here the dependencies you want to be packed in your fat jar -->
                                    <include>*:*</include>
                                </includes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.7</version>
                <configuration>
                    <skipTests>true</skipTests>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <reporting>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
            </plugin>
        </plugins>
    </reporting>
</project>
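
For reference, the fat jar in the java command above is produced by a plain Maven build; a minimal sketch, assuming nothing beyond the POM above:

# runs the package phase, which both the assembly and shade plugins are bound to
mvn clean package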

Any help is very much appreciated.

Can you elaborate on what does not work? – Holden

@Holden I added the error message; it is what I am asking about. Thanks for looking at this! – br19

Have you looked at Akka's notes on shading: http://doc.akka.io/docs/akka/snapshot/general/configuration.html – Edmon

Answers


This is probably related to the order of your Maven plugins. You are using both the maven-assembly-plugin and the maven-shade-plugin in your project, and both are bound to the same phase of the Maven lifecycle. When that happens, Maven executes the plugins in the order they appear in the plugins section, so in your case it runs the assembly plugin and then the shade plugin.

Based on the output jar you are trying to run and the shade transformations you have, you probably want the opposite order. However, you may not even need the assembly plugin for your use case; you can probably just use the target/spark-example-0.1-SNAPSHOT-shaded.jar file (see the run sketch after the snippet below).

<plugins> 
    <plugin> 
    <groupId>org.apache.maven.plugins</groupId> 
    <artifactId>maven-shade-plugin</artifactId> 
    <!-- SNIP --> 
    </plugin> 
    <plugin> 
    <artifactId>maven-assembly-plugin</artifactId> 
    <!-- SNIP --> 
    </plugin> 
</plugins> 
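
If the order is reversed and the shade plugin's output is used, the run command would look like this; a minimal sketch, assuming the shaded jar keeps the name mentioned above:

# hypothetical invocation against the shaded jar; adjust the file name to whatever your shade configuration actually produces
java -Xmx2G -cp target/spark-example-0.1-SNAPSHOT-shaded.jar debug.spark_example.Example data.txt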
Thanks for your answer Jeff. I still cannot get it to work. I tried reversing the order of the plugins and using the shaded jar. Reversing the order did not change the uber jar. When using the shaded jar, the error message is: 'akka.ConfigurationException: Type [akka.dispatch.BoundedControlAwareMessageQueueSemantics] specified as akka.actor.mailbox.requirement [akka.actor.mailbox.bounded-control-aware-queue-based] in config could not be loaded due to [akka.dispatch.BoundedControlAwareMessageQueueSemantics]' – br19


It appears that the spark-submit script must be used to run the program.

Instead of:

java -Xmx2G -cp target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar debug.spark_example.Example data.txt 

Do something like this:

<path-to>/spark-1.4.1/bin/spark-submit --class debug.spark_example.Example --master local[2] target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar data.txt 

This also seems to work without the shaded jar, using only the jar-with-dependencies. The following pom.xml file worked for me:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> 
    <modelVersion>4.0.0</modelVersion> 
    <groupId>debug.spark-example</groupId> 
    <artifactId>spark-example</artifactId> 
    <version>0.1-SNAPSHOT</version> 
    <inceptionYear>2015</inceptionYear> 
    <properties> 
     <scala.majorVersion>2.11</scala.majorVersion> 
     <scala.minorVersion>.2</scala.minorVersion> 
     <spark.version>1.4.1</spark.version> 
    </properties> 
    <repositories> 
     <repository> 
      <id>scala-tools.org</id> 
      <name>Scala-Tools Maven2 Repository</name> 
      <url>http://scala-tools.org/repo-releases</url> 
     </repository> 
    </repositories> 
    <pluginRepositories> 
     <pluginRepository> 
      <id>scala-tools.org</id> 
      <name>Scala-Tools Maven2 Repository</name> 
      <url>http://scala-tools.org/repo-releases</url> 
     </pluginRepository> 
    </pluginRepositories> 
    <dependencies> 
     <dependency> 
      <groupId>org.scala-lang</groupId> 
      <artifactId>scala-library</artifactId> 
      <version>${scala.majorVersion}${scala.minorVersion}</version> 
     </dependency> 
     <dependency> 
      <groupId>org.apache.spark</groupId> 
      <artifactId>spark-core_${scala.majorVersion}</artifactId> 
      <version>${spark.version}</version> 
     </dependency> 
    </dependencies> 
    <build> 
     <sourceDirectory>src/main/scala</sourceDirectory> 
     <plugins> 
      <plugin> 
       <groupId>org.scala-tools</groupId> 
       <artifactId>maven-scala-plugin</artifactId> 
       <executions> 
        <execution> 
         <goals> 
          <goal>compile</goal> 
          <goal>testCompile</goal> 
         </goals> 
        </execution> 
       </executions> 
      </plugin> 
      <plugin> 
       <groupId>org.apache.maven.plugins</groupId> 
       <artifactId>maven-eclipse-plugin</artifactId> 
       <configuration> 
        <downloadSources>true</downloadSources> 
        <buildcommands> 
         <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand> 
        </buildcommands> 
        <additionalProjectnatures> 
         <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature> 
        </additionalProjectnatures> 
       <classpathContainers> 
        <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer> 
        <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer> 
       </classpathContainers> 
       </configuration> 
      </plugin> 
      <plugin> 
       <artifactId>maven-assembly-plugin</artifactId> 
       <version>2.4</version> 
       <executions> 
       <execution> 
        <id>make-assembly</id> 
        <phase>package</phase> 
        <goals> 
         <goal>attached</goal> 
        </goals> 
       </execution> 
      </executions> 
      <configuration> 
       <tarLongFileMode>gnu</tarLongFileMode> 
       <descriptorRefs> 
        <descriptorRef>jar-with-dependencies</descriptorRef> 
       </descriptorRefs> 
      </configuration> 
     </plugin> 
     <plugin> 
      <groupId>org.apache.maven.plugins</groupId> 
      <artifactId>maven-surefire-plugin</artifactId> 
      <version>2.7</version> 
      <configuration> 
       <skipTests>true</skipTests> 
      </configuration> 
     </plugin> 
    </plugins> 
</build> 
<reporting> 
    <plugins> 
     <plugin> 
      <groupId>org.scala-tools</groupId> 
      <artifactId>maven-scala-plugin</artifactId> 
     </plugin> 
    </plugins> 
</reporting> 
</project> 

The Akka docs helped me solve this problem. If you use shading, you have to specify the transformers:

<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
    <resource>reference.conf</resource>
</transformer>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
    <manifestEntries>
        <Main-Class>akka.Main</Main-Class>
    </manifestEntries>
</transformer>
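
With the ManifestResourceTransformer writing Main-Class into the manifest, the shaded jar can in principle be launched with java -jar; a minimal sketch, assuming Main-Class is set to your own application class rather than akka.Main, and reusing the shaded jar name from the earlier answer:

# hypothetical invocation; requires Main-Class to point at debug.spark_example.Example in your build
java -Xmx2G -jar target/spark-example-0.1-SNAPSHOT-shaded.jar data.txt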
Thanks a lot! I will try this. – br19