The Spark tutorial has a word count example, which I want to adapt to a JavaPairRDD built from an HBase table:
JavaRDD<String> textFile = spark.textFile("hdfs://...");
JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
    public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
});
JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
});
JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer a, Integer b) { return a + b; }
});
counts.saveAsTextFile("hdfs://...");
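However, the words I already have are in a JavaPairRDD rather than a JavaRDD, like this:
JavaPairRDD<String, WebPage> myRDD
This RDD is retrieved from an HBase database, and I want to count the words in it.
So how do I do the word count?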