2016-09-12

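"Failed to find data source: parquet" when building a fat jar with Maven

I am assembling a fat jar with the Maven Assembly plugin and run into the following problem when I run it: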

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: parquet. Please find packages at http://spark-packages.org 
    at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:145) 
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:78) 
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:78) 
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:310) 
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149) 
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:427) 
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:411) 
    at org.apache.spark.mllib.classification.impl.GLMClassificationModel$SaveLoadV1_0$.loadData(GLMClassificationModel.scala:77) 
    at org.apache.spark.mllib.classification.LogisticRegressionModel$.load(LogisticRegression.scala:183) 
    at org.apache.spark.mllib.classification.LogisticRegressionModel.load(LogisticRegression.scala) 
    at my.test.spark.assembling.TopicClassifier.load(TopicClassifier.java:35) 
    at my.test.spark.assembling.Main.main(Main.java:23) 
Caused by: java.lang.ClassNotFoundException: parquet.DefaultSource 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:130) 
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:130) 
    at scala.util.Try$.apply(Try.scala:192) 
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:130) 
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:130) 
    at scala.util.Try.orElse(Try.scala:84) 
    at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:130) 
    ... 11 more 

Here is the pom.xml:

<groupId>my.test.spark</groupId> 
<artifactId>assembling</artifactId> 
<version>1.0-SNAPSHOT</version> 

<properties> 
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> 
</properties> 


<dependencies> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-core_2.11</artifactId> 
     <version>2.0.0</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-mllib_2.11</artifactId> 
     <version>2.0.0</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-sql_2.11</artifactId> 
     <version>2.0.0</version> 
    </dependency> 
</dependencies> 

<build> 
    <plugins> 
     <plugin> 
      <groupId>org.apache.maven.plugins</groupId> 
      <artifactId>maven-compiler-plugin</artifactId> 
      <configuration> 
       <source>1.8</source> 
       <target>1.8</target> 
      </configuration> 
     </plugin> 

     <plugin> 
      <artifactId>maven-assembly-plugin</artifactId> 
      <configuration> 
       <descriptorRefs> 
        <descriptorRef>jar-with-dependencies</descriptorRef> 
       </descriptorRefs> 
      </configuration> 
     </plugin> 
    </plugins> 
</build> 

The problem does not occur if I run the program from IntelliJ IDEA.

What else should I include in the jar so that the class can be found?

Is the class in the jar file?

No, it looks like 'parquet.DefaultSource' is not there. But many other classes from the 'org.apache.spark.sql.execution.datasources.parquet' package are in the jar.

Answer

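I found a solution to the problem. I tried building the package with sbt assembly and ran into a different but related problem. The solution I found here: https://stackoverflow.com/a/27532248/5520896 also helped with my original problem.

So the fix is to move from the Maven Assembly plugin to the Maven Shade plugin and apply this transformer: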

<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/> 
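
Some background on why this transformer matters, as far as I understand it: Spark resolves short names such as 'parquet' through java.util.ServiceLoader, which reads provider lists from files under META-INF/services/. Several dependency jars can ship a file with the same name there, while a fat jar can hold only one file per path. The sketch below lists every copy of such a file that is visible on the classpath. The resource name is my assumption about Spark 2.0's registration file, and ServiceFileLister is a hypothetical class name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ServiceFileLister {
    // Assumption: the registration file Spark 2.0 uses for data sources (not stated in the question).
    static final String SPARK_REGISTER =
        "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister";

    /** Maps the location of each classpath copy of the resource to the provider names it lists. */
    static Map<String, List<String>> listProviders(ClassLoader loader, String resource)
            throws IOException {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (URL url : Collections.list(loader.getResources(resource))) {
            List<String> providers = new ArrayList<>();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // Service files allow '#' comments and blank lines; skip both.
                    int hash = line.indexOf('#');
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        providers.add(line);
                    }
                }
            }
            result.put(url.toString(), providers);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        listProviders(ServiceFileLister.class.getClassLoader(), SPARK_REGISTER)
            .forEach((location, providers) -> System.out.println(location + " -> " + providers));
    }
}
```

Run from the IDE, where each dependency jar sits on the classpath separately, this should print one line per jar that carries the file. Run from a fat jar it can print at most one line, so whatever that single copy omits cannot be found.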

So my final plugin configuration in pom.xml looks like this:

 <plugin> 
      <artifactId>maven-shade-plugin</artifactId> 
      <version>2.4.1</version> 
      <executions> 
       <execution> 
        <phase>package</phase> 
        <goals> 
         <goal>shade</goal> 
        </goals> 
        <configuration> 
         <createDependencyReducedPom>false</createDependencyReducedPom> 

         <filters> 
          <filter> 
           <artifact>*:*</artifact> 
           <excludes> 
            <exclude>META-INF/*.SF</exclude> 
            <exclude>META-INF/*.DSA</exclude> 
            <exclude>META-INF/*.RSA</exclude> 
           </excludes> 
          </filter> 
         </filters> 
         <transformers>         
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/> 
         </transformers> 
        </configuration> 
       </execution> 
      </executions> 
     </plugin> 
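
To check that the rebuilt jar really contains the merged registration file, something like the following can be used. It is only a sketch: FatJarServiceCheck is a hypothetical class name, the jar path is passed as an argument, and the default entry name is my assumption about where Spark 2.0 registers its data sources.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class FatJarServiceCheck {
    /** Returns the non-blank lines of one entry in a jar, or an empty list if the entry is absent. */
    static List<String> readEntryLines(String jarPath, String entryName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry(entryName);
            if (entry == null) {
                return lines; // the jar has no such file at all
            }
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    line = line.trim();
                    if (!line.isEmpty()) {
                        lines.add(line);
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: FatJarServiceCheck <jar> [entry]");
            return;
        }
        // Assumed default: the file Spark 2.0 reads to discover data sources.
        String entry = args.length > 1
            ? args[1]
            : "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister";
        readEntryLines(args[0], entry).forEach(System.out::println);
    }
}
```

If the fix worked, I would expect the output to include a class from the 'org.apache.spark.sql.execution.datasources.parquet' package. With the jar built by the Assembly plugin, I would expect that line to be missing.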

Apparently what goes wrong with Maven Assembly is explained here: https://stackoverflow.com/a/21118824/5520896
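
As I understand the problem, it comes down to keeping one copy of a same-named services file versus merging all copies. The following is a conceptual sketch of that difference only, not the Shade plugin's actual implementation. The class name and all provider names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ServiceFileMerge {
    /** Combines the provider lists of all copies, dropping duplicates but keeping first-seen order. */
    static List<String> merge(List<List<String>> copies) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> copy : copies) {
            merged.addAll(copy);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Hypothetical provider names standing in for the entries of two dependency jars.
        List<String> fromJarA = Arrays.asList("example.ParquetSource", "example.JsonSource");
        List<String> fromJarB = Arrays.asList("example.LibSvmSource");

        // If only one copy survives packaging, the other jar's providers become invisible.
        System.out.println("only jar B's copy kept: " + fromJarB);

        // Merging keeps every provider discoverable.
        System.out.println("merged: " + merge(Arrays.asList(fromJarA, fromJarB)));
    }
}
```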