2013-06-06

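Cannot instantiate the type Cluster in Mahout, KMeans clustering example

Hello, I am trying to run the KMeans clustering example in Mahout and I am stuck on an error in the sample code. The error is on this line of the code below: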

Cluster cluster = new Cluster(vec,i,new EuclideanDistanceMeasure());

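It gives the error

Cannot instantiate the type Cluster

(Cluster is an interface, as far as I understand.) I want to run k-means on my own sample dataset, so could anyone guide me on that as well?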

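I have included the following JARs in my Eclipse IDE:

mahout-math-0.7-cdh4.3.0.jar

hadoop-common-2.0.0-cdh4.2.1.jar

hadoop-hdfs-2.0.0-cdh4.2.1.jar

hadoop-mapreduce-client-core-2.0.0-cdh4.2.1.jar

mahout-core-0.7-cdh4.3.0.jar

Please check whether I have missed any important JARs. I will be running this on Hadoop CDH4.2.1.

Here is my entire code, taken from GitHub: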

package tryout; 

import java.io.File; 
import java.io.IOException; 
import java.util.ArrayList; 
import java.util.List; 

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.FileSystem; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.io.LongWritable; 
import org.apache.hadoop.io.SequenceFile; 
import org.apache.hadoop.io.Text; 
import org.apache.mahout.math.RandomAccessSparseVector; 
import org.apache.mahout.math.Vector; 
import org.apache.mahout.math.VectorWritable; 
import org.apache.mahout.clustering.Cluster; 
import org.apache.mahout.clustering.classify.WeightedVectorWritable; 
import org.apache.mahout.clustering.kmeans.KMeansDriver; 
import org.apache.mahout.common.distance.EuclideanDistanceMeasure; 

public class SimpleKMeansClustering { 
    public static final double[][] points = { {1, 1}, {2, 1}, {1, 2}, 
               {2, 2}, {3, 3}, {8, 8}, 
               {9, 8}, {8, 9}, {9, 9}};  


    public static void writePointsToFile(List<Vector> points, 
      String fileName,FileSystem fs,Configuration conf) throws IOException {  
     Path path = new Path(fileName);  
     SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,path, LongWritable.class, VectorWritable.class);  

     long recNum = 0;  
     VectorWritable vec = new VectorWritable();  
     for (Vector point : points) {  
     vec.set(point);  
      writer.append(new LongWritable(recNum++), vec);  
     } writer.close(); 
    }  

    public static List<Vector> getPoints(double[][] raw) {  
     List<Vector> points = new ArrayList<Vector>();  
     for (int i = 0; i < raw.length; i++) {  
      double[] fr = raw[i];  
      Vector vec = new RandomAccessSparseVector(fr.length);  
      vec.assign(fr);  
      points.add(vec);  
     }  
     return points; 
    }  
    public static void main(String args[]) throws Exception {   
     int k = 2;   
     List<Vector> vectors = getPoints(points);   
     File testData = new File("testdata");  
     if (!testData.exists()) {  
      testData.mkdir();  
     }  
     testData = new File("testdata/points");  
     if (!testData.exists()) {  
      testData.mkdir();  
     }   
     Configuration conf = new Configuration();  
     FileSystem fs = FileSystem.get(conf);  
     writePointsToFile(vectors, "testdata/points/file1", fs, conf);   
     Path path = new Path("testdata/clusters/part-00000");  
     SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,path, Text.class, Cluster.class); 
     for (int i = 0; i < k; i++) {  
      Vector vec = vectors.get(i);  
      Cluster cluster = new Cluster(vec, i, new EuclideanDistanceMeasure());  
      writer.append(new Text(cluster.getIdentifier()), cluster);  
     }  
     writer.close();   


     KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),  
       new Path("output"), new EuclideanDistanceMeasure(), 0.001, 10, 
       true, false);   
     SequenceFile.Reader reader = new SequenceFile.Reader(fs,new Path("output/" + Cluster.CLUSTERED_POINTS_DIR+ "/part-m-00000"), conf);   
     IntWritable key = new IntWritable(); 
     WeightedVectorWritable value = new WeightedVectorWritable();  
     while (reader.next(key, value)) {  
      System.out.println(value.toString() + " belongs to cluster " + key.toString());  
     }  
     reader.close(); 
    } 
} 

Please also guide me on how to proceed if I have my own dataset.

(Link to updated code.) If I remember correctly, you need the class 'Kluster'.

Thanks, Thomas. I made the change, and now I can compile the code with the proper class files and JARs. – user2454360

Answer

I have also been trying to get this example from the Mahout in Action book to work, and I eventually managed it. This is what I did:

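// Kluster (org.apache.mahout.clustering.kmeans.Kluster) is the concrete class; Cluster is an interface.
SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, Text.class, Kluster.class);
for (int i = 0; i < k; i++) {
    Vector vec = vectors.get(i);
    // Seed cluster i with the i-th input vector as its initial center.
    Kluster cluster = new Kluster(vec, i, new EuclideanDistanceMeasure());
    // getIdentifier() is an instance method, so call it on cluster, not on the Kluster class.
    writer.append(new Text(cluster.getIdentifier()), cluster);
}
writer.close();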
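To illustrate why the original line fails and the Kluster version compiles, here is a minimal sketch that does not use Mahout at all. The names Group, KGroup, and InterfaceDemo are hypothetical stand-ins I made up; they only mirror the interface/implementation relationship between Cluster and Kluster, and the "CL-" prefix is just an illustrative label.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Mahout.
class InterfaceDemo {
    // Builds k seed groups from the first k points.
    static List<Group> seed(double[][] points, int k) {
        List<Group> groups = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            // Group g = new Group(points[i], i); // would not compile: Group is an interface
            Group g = new KGroup(points[i], i);   // a concrete implementation can be instantiated
            groups.add(g);
        }
        return groups;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {2, 1}, {1, 2}};
        for (Group g : seed(points, 2)) {
            System.out.println(g.getIdentifier());
        }
    }
}

// Stand-in for the Cluster interface (assumption: only one method is modelled here).
interface Group {
    String getIdentifier();
}

// Stand-in for the concrete Kluster class.
class KGroup implements Group {
    private final double[] center;
    private final int id;

    KGroup(double[] center, int id) {
        this.center = center.clone();
        this.id = id;
    }

    double[] getCenter() {
        return center.clone();
    }

    @Override
    public String getIdentifier() {
        return "CL-" + id;
    }
}
```

The variable can still be declared with the interface type; only the expression after `new` has to name a concrete class.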

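I can't believe the code in the book is incorrect. I also managed to get it working without using Maven. I describe that more fully here, but basically I did it with a user library: Using mahout in eclipse WITHOUT USING MAVEN

Update: OK, the book's content is not wrong, just very old. This page has the examples from the book: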

http://alexott.blogspot.co.uk/2012/07/getting-started-with-examples-from.html
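
On the second part of the question (running on your own dataset), one possible starting point is to turn your data into a double[][] and pass it to getPoints from the question's code in place of the hard-coded points array. The sketch below assumes the data is plain text with one comma-separated numeric row per line; the class name CsvPoints and that file layout are my assumptions, not something from the book.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: converts text rows to double[][].
class CsvPoints {
    // Parses lines such as "1.0, 2.5" into rows of doubles; blank lines are skipped.
    static double[][] parse(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split(",");
            double[] row = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                row[i] = Double.parseDouble(fields[i].trim());
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        // In practice the lines could come from java.nio.file.Files.readAllLines(path).
        List<String> lines = Arrays.asList("1,1", "2, 1", "", "8.5,9");
        double[][] points = parse(lines);
        System.out.println(points.length + " points, first = " + Arrays.toString(points[0]));
    }
}
```

All rows should have the same number of columns, since each row becomes one vector of that dimension.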