Hadoop results messed up

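Hadoop newbie here. I want to count word co-occurrences per line of a text, that is, how many times a word appears on the same line as each other word. To do this I created a custom word-pair class, so that MapReduce gives me each pair of words followed by its count. The problem is that the results are messed up, and I can't see where I went wrong.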

My word-pair class looks like this:

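import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// WritableComparable already extends Writable, so listing Writable separately is redundant.
public class Par implements WritableComparable<Par> {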

    public String palabra; 
    public String vecino; 

    public Par(String palabra, String vecino) { 
     this.palabra = palabra; 
     this.vecino = vecino; 
    } 

    // Hadoop needs a no-argument constructor to instantiate the key before readFields().
    public Par() {
     this.palabra = "";
     this.vecino = "";
    }

    @Override 
    public int compareTo(Par otra) { 
     int retorno = this.palabra.compareTo(otra.palabra); 
     if(retorno != 0){ 
      return retorno; 
     } 
     return this.vecino.compareTo(otra.vecino); 
    } 

    @Override 
    public void write(DataOutput out) throws IOException { 
     out.writeUTF(palabra); 
     out.writeUTF(vecino); 
    } 

    @Override 
    public void readFields(DataInput in) throws IOException { 
     palabra = in.readUTF(); 
     vecino = in.readUTF(); 
    } 

    @Override 
    public int hashCode() { 
     final int prime = 31; 
     int result = 1; 
     result = prime * result + ((palabra == null) ? 0 : palabra.hashCode()); 
     result = prime * result + ((vecino == null) ? 0 : vecino.hashCode()); 
     return result; 
    } 

    @Override 
    public boolean equals(Object obj) { 
     if (this == obj) 
      return true; 
     if (obj == null) 
      return false; 
     if (getClass() != obj.getClass()) 
      return false; 
     Par other = (Par) obj; 
     if (palabra == null) { 
      if (other.palabra != null) 
       return false; 
     } else if (!palabra.equals(other.palabra)) 
      return false; 
     if (vecino == null) { 
      if (other.vecino != null) 
       return false; 
     } else if (!vecino.equals(other.vecino)) 
      return false; 
     return true; 
    } 

    @Override 
    public String toString() { 
     return "Par [" + palabra + " , " + vecino + "]"; 
    } 


}

My mapper is:

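import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Co_OcurrenciaMapper extends Mapper<LongWritable, Text, Par, IntWritable> {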
    @Override 
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 
     IntWritable one = new IntWritable(1); 
     String[] palabras = value.toString().split("\\s+"); 
     if (palabras.length > 1) { 
      for (int i = 0; i < palabras.length - 1; i++) { 
       for (int j = i + 1; j < palabras.length; j++) { 
        context.write(new Par(palabras[i], palabras[j]), one); 
       } 
      } 
     } 
    } 
}

And this is the result my MapReduce job produces:

[[email protected] Desktop]$ hadoop fs -cat salidaO11/part-r-00000 |head -15 
Par [ , &c.] 35 
Par [ , &c.'] 2 
Par [ , &c.,] 4 
Par [ , &c]] 23 
Par [ , '] 6 
Par [ , ''Od's] 1 
Par [ , ''Tis] 2 
Par [ , ''tis] 1 
Par [ , ''twas] 1 
Par [ , '--O] 1 
Par [ , 'A] 17 
Par [ , 'ARTEMIDORUS.'] 1 
Par [ , 'Above] 1 
Par [ , 'Achilles] 2 
Par [ , 'Ad] 3 
cat: Unable to write to output stream.

Where did I go wrong? A friend suggested merging the two words into a single string, but I don't find that very elegant.

Source

2016-03-24 Gustavo Moreno

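Now that this question has been answered, I hope you are studying MR to understand how it works. Otherwise, you should take a look at Spark. This whole job can be written in 3 lines of code: `val input = sc.textFile("s3://...") val words = input.flatMap(x => x.split(" ")) val result = words.map(x => (x, 1)).reduceByKey((x, y) => x + y)` – Havnar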

Exactly, I am learning MapReduce, so this is more of an exercise than production work.

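I don't think there is anything wrong here. You don't appear to do any data cleaning, so I think it is fair for the job to produce dirty output like this.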
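One possible explanation for the empty first word in lines such as `Par [ , &c.]`, assuming some input lines begin with whitespace (the thread does not confirm this): Java's `String.split("\\s+")` returns an empty leading token when the string starts with a match of the delimiter, and the empty string sorts before every other key. The sketch below shows that behaviour next to a trimmed variant. `TokenizeSketch`, `rawTokens`, and `cleanTokens` are hypothetical names used for illustration, not part of the original job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenizeSketch {
    // Same tokenization as the mapper in the question: split on runs of whitespace.
    static List<String> rawTokens(String line) {
        return Arrays.asList(line.split("\\s+"));
    }

    // Hypothetical cleaned variant: trim the line first and drop empty tokens.
    static List<String> cleanTokens(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "   &c. and so forth";      // made-up indented line
        System.out.println(rawTokens(line));      // [, &c., and, so, forth]
        System.out.println(cleanTokens(line));    // [&c., and, so, forth]
    }
}
```

If this is what is happening, every indented line contributes pairs whose first word is empty, and those pairs would appear at the top of the sorted output, which is all that `head -15` shows.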

You might want to try writing some MRUnit tests, or feeding a smaller, clean dataset into the job, to confirm that it does what you expect.
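As a lighter-weight stand-in for the small-dataset check suggested here, the following plain-Java sketch reproduces the mapper's nested loop on a few in-memory lines and tallies the pairs, without Hadoop or MRUnit. It is only an approximation: `PairCountSketch` and `countPairs` are hypothetical names, a `word<TAB>neighbour` string stands in for the `Par` key, and Hadoop's serialization, shuffle, and sort are not exercised.

```java
import java.util.Map;
import java.util.TreeMap;

class PairCountSketch {
    // Emits every ordered pair (i < j) per line, as the mapper does, and sums the counts.
    static Map<String, Integer> countPairs(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();   // sorted keys, loosely like reducer output
        for (String line : lines) {
            String[] palabras = line.split("\\s+");       // same split as the mapper
            for (int i = 0; i < palabras.length - 1; i++) {
                for (int j = i + 1; j < palabras.length; j++) {
                    String key = palabras[i] + "\t" + palabras[j];  // stand-in for new Par(...)
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Tiny made-up dataset: two clean lines and one line with leading whitespace.
        Map<String, Integer> counts = countPairs("a b c", "a b", "  a b");
        counts.forEach((pair, n) -> System.out.println("[" + pair + "] " + n));
    }
}
```

Two things are easy to check with such a toy input: whether any key has an empty first word, and whether counting ordered pairs only (so that `a, b` and `b, a` are separate keys) matches the co-occurrence definition you have in mind.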

Source

2016-03-24 14:28:05

The text file is quite clean. I have used it for other MapReduce tasks without any problems. Still, I will write MRUnit tests. Thanks!
