Hadoop MapReduce: return a sorted list of the words in a text file

My task is to return a sorted list of all the words contained in a text file, keeping duplicates.

{To be or not to be} → {be be not or to to}

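My idea is to emit each word as both the key and the value. Since Hadoop sorts the keys, the words end up in alphabetical order automatically. In the Reduce phase I simply append all values that share a key (so basically all copies of the same word) into a single Text value.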
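The idea can be tried without a cluster. The following is a minimal plain-Java sketch of the map, sort/group, and reduce flow. It only imitates what Hadoop does: a TreeMap stands in for the sort phase, and the names SortedWordsSketch, mapAndGroup and reduce are made up for this illustration. It assumes lower-cased ASCII words, for which String ordering and Hadoop's Text ordering should agree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java imitation of map -> sort/group -> reduce (no Hadoop involved).
// All names here are hypothetical and exist only for this illustration.
class SortedWordsSketch {

    // "Map" + "shuffle": emit each lower-cased word as key and value.
    // The TreeMap keeps the keys sorted, standing in for Hadoop's sort phase.
    static Map<String, List<String>> mapAndGroup(String text) {
        Map<String, List<String>> grouped = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken().toLowerCase();
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(word);
        }
        return grouped;
    }

    // "Reduce": join all values that share a key into one output line.
    static List<String> reduce(Map<String, List<String>> grouped) {
        List<String> lines = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            lines.add(String.join(" ", values));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : reduce(mapAndGroup("To be or not to be"))) {
            System.out.println(line);
        }
    }
}
```

Because the grouping map is sorted by key, the joined lines come out in alphabetical order and every duplicate is kept.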

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordSort {

    public static class Map extends Mapper<LongWritable, Text, Text, Text> {

        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                // transform to lower case, then emit the word as both key and value
                word.set(tokenizer.nextToken().toLowerCase());
                context.write(word, word);
            }
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            // concatenate every value that shares this key
            StringBuilder result = new StringBuilder();
            for (Text value : values) {
                result.append(value.toString()).append(" ");
            }
            context.write(key, new Text(result.toString()));
        }
    }
}

My question is: how do I write only the values to my output file? At the moment I get this:

be be be 
not not 
or or 
to to to

So each line contains the key followed by the values, but I want only the values to be written:

be be 
not 
or 
to to

Is this even possible, or do I have to remove one entry from each word's values?

Asked by gaussd on 2012-11-03

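Disclaimer: I'm not a Hadoop user, but I have done a lot of Map/Reduce with CouchDB.

If you only need the keys, why not emit an empty value?

Also, it sounds like you don't want to reduce them at all, since you want one key for every occurrence.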

Answered on 2012-11-03 10:31:17

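Oh, I guess just emitting an empty value is the obvious solution :D! Yes, solving this task with MapReduce seems strange to me too... but I didn't come up with it, my teacher did. – gaussd

There are indeed many cases where you only use the "map" part of Map/Reduce... –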

I just tried this with the MaxTemperature example from Hadoop: The Definitive Guide, and the following code works:

context.write(null, new Text(result));

Answered on 2012-11-03 11:18:53

So what type would that be? NullWritable? – gaussd

I had job.setOutputKeyClass(Text.class); in the code. So it should work for any Writable type. –
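On why a null key leaves only the values: as far as I understand it, Hadoop's default TextOutputFormat skips the key and the tab separator when the key is null or a NullWritable. The sketch below is a simplified plain-Java model of that rule, not Hadoop's actual code. TextLineSketch and formatLine are made-up names, and the model treats a plain null as the only "empty" marker.

```java
// Simplified plain-Java model of how a text record writer might format one line.
// This is NOT Hadoop code; the class and method names are hypothetical.
class TextLineSketch {

    // Returns "key<TAB>value", or only the non-null side when the other is null.
    // Returns null when both are null, meaning no line is written at all.
    static String formatLine(Object key, Object value) {
        if (key == null && value == null) {
            return null;
        }
        StringBuilder line = new StringBuilder();
        if (key != null) {
            line.append(key);
        }
        if (key != null && value != null) {
            line.append('\t'); // separator only when both sides are present
        }
        if (value != null) {
            line.append(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatLine("be", "be be")); // key, tab, then the values
        System.out.println(formatLine(null, "be be")); // the values only
    }
}
```

In a real job, the more explicit variant would presumably be to declare the reducer's output key type as NullWritable, call job.setOutputKeyClass(NullWritable.class), and write NullWritable.get() instead of a bare null.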
