How do I implement a grouping comparator in Hadoop?

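I have a class named KeyLabelDistance that I pass as both the key and the value in Hadoop, and I want to perform a secondary sort on it: first sort by key in increasing order, then by distance in DECREASING order.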

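To do this, I need to write my own GroupingComparator. My question is: since the setGroupingComparator() method only accepts a class that extends RawComparator as its argument, how do I compare the bytes in the grouping comparator? Do I need to serialize and deserialize the objects explicitly? Also, does having KeyLabelDistance implement WritableComparable, as below, make a SortComparator redundant?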

I found this answer on using a SortComparator and a GroupComparator: What are the differences between Sort Comparator and Group Comparator in Hadoop?

Here is the KeyLabelDistance implementation:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class KeyLabelDistance implements WritableComparable<KeyLabelDistance> {
    private int key;
    private int label;
    private double distance;

    // Hadoop creates key instances reflectively, so a no-argument constructor is kept.
    public KeyLabelDistance() {
        this(0, 0, 0.0);
    }

    public KeyLabelDistance(int key, int label, double distance) {
        this.key = key;
        this.label = label;
        this.distance = distance;
    }

    public int getKey() {
        return key;
    }

    public void setKey(int key) {
        this.key = key;
    }

    public int getLabel() {
        return label;
    }

    public void setLabel(int label) {
        this.label = label;
    }

    public double getDistance() {
        return distance;
    }

    public void setDistance(double distance) {
        this.distance = distance;
    }

    // Writable serialization; the field order must match readFields.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(key);
        out.writeInt(label);
        out.writeDouble(distance);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        key = in.readInt();
        label = in.readInt();
        distance = in.readDouble();
    }

    // Orders by key ascending, then by distance descending.
    @Override
    public int compareTo(KeyLabelDistance other) {
        if (this == other) {
            return 0;
        }
        int byKey = Integer.compare(key, other.key);
        if (byKey != 0) {
            return byKey;
        }
        // If the keys are equal, look at the distances. A larger "distance" means
        // more "similarity", so the comparison is deliberately reversed.
        return Double.compare(other.distance, distance);
    }
}

The code for the grouping comparator is as follows:

public class KeyLabelDistanceGroupingComparator extends WritableComparator {
    public int compare(KeyLabelDistance lhs, KeyLabelDistance rhs) {
        if (lhs == rhs) {
            return 0;
        } else {
            if (lhs.getKey() < rhs.getKey()) {
                return -1;
            } else if (lhs.getKey() > rhs.getKey()) {
                return 1;
            }
            return 0;
        }
    }
}

Any help is appreciated. Thanks in advance.

Source

2014-04-02 user3377770

You can extend WritableComparator, which in turn implements RawComparator. Both your sort comparator and your grouping comparator would extend WritableComparator.
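To illustrate what comparing "the bytes" amounts to, here is a minimal stand-alone sketch in plain Java with no Hadoop dependency. It assumes a wire layout of int key, int label, double distance; the class name RawKeySketch and both method names are hypothetical. The compareKeys method only borrows the parameter shape of RawComparator.compare(byte[], int, int, byte[], int, int).

```java
import java.nio.ByteBuffer;

// Stand-alone sketch: compares two serialized records by their leading int key
// without rebuilding the objects. Not Hadoop code; names are illustrative.
final class RawKeySketch {
    // Assumed layout: 4-byte key, 4-byte label, 8-byte distance, all big-endian
    // (the same byte order DataOutput.writeInt and writeDouble use).
    static byte[] serialize(int key, int label, double distance) {
        return ByteBuffer.allocate(16)
                .putInt(key)
                .putInt(label)
                .putDouble(distance)
                .array();
    }

    // b1/b2 are the buffers, s1/s2 the start offsets, l1/l2 the record lengths.
    static int compareKeys(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int key1 = ByteBuffer.wrap(b1, s1, l1).getInt(); // reads the first 4 bytes
        int key2 = ByteBuffer.wrap(b2, s2, l2).getInt();
        return Integer.compare(key1, key2);
    }
}
```

In Hadoop itself, the static helper WritableComparator.readInt(bytes, start) decodes an int in the same way. A WritableComparator constructed with createInstances set to true also deserializes both keys for you before calling the object-level compare, so explicit serialization in your own comparator should not normally be necessary.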

If you do not supply these comparators, Hadoop will end up internally using the compareTo of the Writable that serves as your key.
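The division of labor between the two comparators can be sketched in plain Java. This is only a simulation under stated assumptions, not Hadoop code: Rec is a hypothetical stand-in for KeyLabelDistance, SORT plays the role of the sort comparator (key ascending, distance descending), and GROUP plays the role of the grouping comparator (key only).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of a secondary sort; all names are illustrative.
final class SecondarySortSketch {
    // Hypothetical immutable stand-in for KeyLabelDistance.
    static final class Rec {
        final int key;
        final int label;
        final double distance;

        Rec(int key, int label, double distance) {
            this.key = key;
            this.label = label;
            this.distance = distance;
        }
    }

    // Sort order: key ascending, then distance descending (arguments swapped).
    static final Comparator<Rec> SORT = (a, b) -> {
        int byKey = Integer.compare(a.key, b.key);
        return byKey != 0 ? byKey : Double.compare(b.distance, a.distance);
    };

    // Grouping looks at the key only, so records that differ just in
    // distance land in the same group.
    static final Comparator<Rec> GROUP = (a, b) -> Integer.compare(a.key, b.key);

    // Sorts with SORT, then opens a new group whenever GROUP sees a change.
    static List<List<Rec>> sortAndGroup(List<Rec> input) {
        List<Rec> sorted = new ArrayList<>(input);
        sorted.sort(SORT);
        List<List<Rec>> groups = new ArrayList<>();
        Rec previous = null;
        for (Rec current : sorted) {
            if (previous == null || GROUP.compare(previous, current) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(current);
            previous = current;
        }
        return groups;
    }
}
```

In a real job, the two orderings would live in WritableComparator subclasses that override compare(WritableComparable, WritableComparable), registered with Job.setSortComparatorClass and Job.setGroupingComparatorClass. Each group then corresponds to one reduce call whose values arrive in the sorted order.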

Source

2014-04-02 20:36:16 Venkat

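Thanks, I tried that. I have now added the grouping comparator code to my question as well, but I get the following error: KeyLabelDistanceGroupingComparator.java:3: cannot find symbol, symbol: constructor WritableComparator(), location: class org.apache.hadoop.io.WritableComparator – user3377770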

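That is the error you get in Java when you extend a class and the superclass has no default constructor. Create a constructor in your code and call super(...) with the arguments the superclass requires. For example: XYZKeyValueComparator() { super(MyWritable.class, true); } – Venkat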
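The constructor rule can be shown without Hadoop. In this minimal sketch, BaseComparatorSketch is a hypothetical stand-in for a superclass that, like WritableComparator in the Hadoop version that produced the error above, offers no accessible no-argument constructor; the (Class, boolean) parameter shape mirrors the super(MyWritable.class, true) call from the comment.

```java
// Hypothetical base class with no no-argument constructor.
class BaseComparatorSketch {
    private final Class<?> keyClass;
    private final boolean createInstances;

    protected BaseComparatorSketch(Class<?> keyClass, boolean createInstances) {
        this.keyClass = keyClass;
        this.createInstances = createInstances;
    }

    Class<?> keyClass() {
        return keyClass;
    }

    boolean createsInstances() {
        return createInstances;
    }
}

// Hypothetical subclass. If this constructor were omitted, the compiler would
// generate one that calls super() with no arguments; no such constructor
// exists in the base class, so compilation would fail.
class GroupingComparatorSketch extends BaseComparatorSketch {
    GroupingComparatorSketch() {
        super(Integer.class, true); // Integer.class stands in for the key class
    }
}
```

For the grouping comparator in the question, the corresponding constructor would presumably pass KeyLabelDistance.class as the first argument.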
