
I have a basic Spark MLlib program, shown below, and the build fails with "class org.apache.spark.sql.types.SQLUserDefinedType not found - continuing with a stub".

import org.apache.spark.mllib.clustering.KMeans 

import org.apache.spark.SparkContext 
import org.apache.spark.SparkConf 
import org.apache.spark.mllib.linalg.Vectors 


class Sample { 
    val conf = new SparkConf().setAppName("helloApp").setMaster("local") 
    val sc = new SparkContext(conf) 
    val data = sc.textFile("data/mllib/kmeans_data.txt") 
    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache() 

    // Cluster the data into two classes using KMeans 
    val numClusters = 2 
    val numIterations = 20 
    val clusters = KMeans.train(parsedData, numClusters, numIterations) 

    // Export to PMML 
    println("PMML Model:\n" + clusters.toPMML) 
} 

I have manually added spark-core, spark-mllib and spark-sql, all at version 1.5.0, to the project classpath through IntelliJ.

I get the error below when running the program. Any idea what's wrong?

Error:scalac: error while loading Vector, Missing dependency 'bad symbolic reference. A signature in Vector.class refers to term types in package org.apache.spark.sql which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling Vector.class.', required by /home/fazlann/Downloads/spark-mllib_2.10-1.5.0.jar(org/apache/spark/mllib/linalg/Vector.class


What do you mean by "manually added"? –


I added the jars to the classpath using the Module Settings option – DesirePRG

Answer


DesirePRG, I ran into the same problem as you. The solution is to import the assembled Spark and Hadoop jar, such as spark-assembly-1.4.1-hadoop2.4.0.jar; with that on the classpath it works fine.
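
If the project is managed with a build tool rather than by attaching jars by hand, declaring matching versions of the three Spark modules addresses the same root cause: spark-sql, which provides org.apache.spark.sql.types, being missing from or mismatched on the compile classpath. Below is a minimal build.sbt sketch, assuming sbt and Scala 2.10; it is not taken from the original posts.

// build.sbt - hypothetical sketch: keep spark-core, spark-mllib and spark-sql at one version 
scalaVersion := "2.10.5" 

libraryDependencies ++= Seq( 
    "org.apache.spark" %% "spark-core"  % "1.5.0", 
    "org.apache.spark" %% "spark-mllib" % "1.5.0", 
    "org.apache.spark" %% "spark-sql"   % "1.5.0" 
) 

With consistent versions declared this way, scalac can resolve the org.apache.spark.sql references that spark-mllib's Vector class depends on, so the "bad symbolic reference" error should not occur.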