2017-12-03

I tried out the simple Spark n-gram example from

https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaNGramExample.java

These are my POM dependencies:

<dependencies> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-core_2.11</artifactId> 
     <version>2.2.0</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-mllib_2.10</artifactId> 
     <version>2.2.0</version> 
    </dependency> 
</dependencies> 

And here is the sample code:

import java.util.Arrays; 
import java.util.List; 

import org.apache.spark.ml.feature.NGram; 
import org.apache.spark.sql.Dataset; 
import org.apache.spark.sql.Row; 
import org.apache.spark.sql.RowFactory; 
import org.apache.spark.sql.SparkSession; 
import org.apache.spark.sql.types.DataTypes; 
import org.apache.spark.sql.types.Metadata; 
import org.apache.spark.sql.types.StructField; 
import org.apache.spark.sql.types.StructType; 

public class App { 
    public static void main(String[] args) { 
     System.out.println("Hello World!"); 

     System.setProperty("hadoop.home.dir", "D:\\del"); 

     SparkSession spark = SparkSession 
        .builder() 
        .appName("JavaNGramExample") 
        .config("spark.master", "local") 
        .getOrCreate(); 

     List<Row> data = Arrays.asList( 
        RowFactory.create(0, Arrays.asList("car", "killed", "cat")), 
        RowFactory.create(1, Arrays.asList("train", "killed", "cat")), 
        RowFactory.create(2, Arrays.asList("john", "plays", "cricket")), 
        RowFactory.create(3, Arrays.asList("tom", "likes", "mangoes"))); 

     StructType schema = new StructType(new StructField[] { 
       new StructField("id", DataTypes.IntegerType, false, Metadata.empty()), 
       new StructField("words", DataTypes.createArrayType(DataTypes.StringType), false, Metadata.empty()) }); 

     Dataset<Row> wordDataFrame = spark.createDataFrame(data, schema); 

     NGram ngramTransformer = new NGram().setN(2).setInputCol("words").setOutputCol("ngrams"); 

     Dataset<Row> ngramDataFrame = ngramTransformer.transform(wordDataFrame); 
     System.out.println(" DISPLAY NGRAMS "); 
     ngramDataFrame.select("ngrams").show(false); 

     spark.stop(); 
    } 
} 
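As an aside, NGram with n = 2 simply joins each pair of adjacent tokens with a single space. A plain-Java sketch of the same transformation, with no Spark dependency (the class name BigramSketch is mine, for illustration only):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BigramSketch {
    // Mirrors what Spark's NGram transformer does for n = 2:
    // join each pair of adjacent tokens with a single space.
    static List<String> bigrams(List<String> words) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            out.add(words.get(i) + " " + words.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams(Arrays.asList("car", "killed", "cat")));
        // prints [car killed, killed cat]
    }
}
```

So for the first row above, the ngrams column should contain "car killed" and "killed cat" once the build problem is fixed.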

I get the following error when I run this code:

Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class 
    at org.apache.spark.sql.types.StructType.<init>(StructType.scala:98) 
    at com.mypackage.spark.learnspark.App.main(App.java:61) 
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
    ... 2 more 

I checked the Scala dependency; it is scala-library-2.11.8.

Is there an inconsistency between Spark 2.2.0 and my Scala jars?

Answer

TL;DR Change spark-mllib_2.10 to spark-mllib_2.11 so that Scala 2.11.8 is used for the Spark MLlib dependency (and optionally remove the spark-core_2.11 dependency, which spark-mllib pulls in anyway).


Look at your pom.xml:

<dependencies> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-core_2.11</artifactId> 
     <version>2.2.0</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-mllib_2.10</artifactId> 
     <version>2.2.0</version> 
    </dependency> 
</dependencies> 
  1. spark-core_2.11 from Spark 2.2.0 depends on Scala 2.11.8, which is fine.

  2. spark-mllib_2.10 from Spark 2.2.0 pulls in two different and incompatible Scala versions, 2.10.x and 2.11.8. This is the root cause of the problem.
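You can see which Scala artifacts your build actually resolves with Maven's standard dependency-tree goal (part of the maven-dependency-plugin):

```shell
mvn dependency:tree -Dincludes=org.scala-lang
```

Running it without the -Dincludes filter prints the full tree, where the mismatched _2.10/_2.11 suffixes on the Spark artifacts are visible directly.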

Make sure you use:

  1. The same Scala suffix in the artifactId of every Spark dependency, i.e. spark-core_2.11 and spark-mllib_2.11 (note that I changed it to 2.11).

  2. The same version for every Spark dependency.
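Put together, a consistent pom.xml could look like the following sketch (the scala.binary.version and spark.version property names are my own convention, not something Maven requires):

```xml
<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>2.2.0</spark.version>
</properties>

<dependencies>
    <!-- spark-mllib depends on spark-core, so one dependency is enough -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
```

Defining the suffix and version once as properties makes it much harder to mix _2.10 and _2.11 artifacts by accident.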
