Secondary sort in Hadoop

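I am working on a Hadoop project, and after visiting many blogs and reading the documentation, I realized that I need to use the secondary sort feature provided by the Hadoop framework.

My input has the form: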

DESC(String) Price(Integer) and some other Text

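I want the values to reach the reducer in descending order of price. To compare DESC I have a method that takes two strings and a percentage; if the similarity between the two strings is greater than or equal to the percentage, I should treat them as equal.

The problem is that after the reduce job finishes I can see some DESC values that are similar to other strings but end up in different groups.

Here is the compareTo method of my composite key: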

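public int compareTo(VendorKey o) {
    // Tokens at or above the similarity threshold count as equal (0), anything else as 1.
    int result = compare(token, o.token, ":") >= percentage ? 0 : 1;
    if (result == 0) {
        // Within similar tokens, order by pid descending.
        return pid > o.pid ? -1 : pid < o.pid ? 1 : 0;
    }
    return result;
}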

And here is the compare method of my grouping comparator:

public int compare(WritableComparable a, WritableComparable b) { 
    VendorKey one = (VendorKey) a; 
    VendorKey two = (VendorKey) b; 
    int result = ClusterUtil.compare(one.getToken(), two.getToken(), ":") >= one.getPercentage() ? 0 : 1; 
    // if (result != 0) 
    // return two.getToken().compareTo(one.getToken()); 
    return result; 
}

Source

2016-08-04 Abhishek Singh

Did you fix the compareTo method? – aventurin

It looks like your compareTo method violates the general contract, which requires sgn(x.compareTo(y)) to equal -sgn(y.compareTo(x)).
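To make the contract point concrete, here is a minimal sketch in plain Java. The class name ContractDemo and the toy similar() rule (same first character) are assumptions made for illustration only; they stand in for the asker's percentage-based comparison, which is not shown in the question.

```java
final class ContractDemo {

    /** Toy similarity, an assumption for this demo only: same first character. */
    static boolean similar(String a, String b) {
        return !a.isEmpty() && !b.isEmpty() && a.charAt(0) == b.charAt(0);
    }

    /** Same shape as the question's comparison: 0 when similar, otherwise always 1. */
    static int brokenCompare(String a, String b) {
        return similar(a, b) ? 0 : 1;
    }

    /** A consistent alternative: plain lexicographic order on the token. */
    static int consistentCompare(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        String x = "apple", y = "banana";
        // Broken: both directions say "greater", so sgn(x,y) is not -sgn(y,x).
        System.out.println(Integer.signum(brokenCompare(x, y)));     // 1
        System.out.println(Integer.signum(brokenCompare(y, x)));     // 1
        // Consistent: the two signs are opposite.
        System.out.println(Integer.signum(consistentCompare(x, y))); // -1
        System.out.println(Integer.signum(consistentCompare(y, x))); // 1
    }
}
```

A threshold-based similarity also tends not to be transitive (A similar to B and B similar to C does not imply A similar to C), which is another property a sort order relies on.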

Source

2016-08-06 17:32:13 aventurin

After your custom Writable, provide a basic partitioner that takes the composite key and a NullWritable value. For example:

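import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes every record with the same DEPT to the same reducer.
public class SecondarySortBasicPartitioner extends
    Partitioner<CompositeKeyWritable, NullWritable> {

    @Override
    public int getPartition(CompositeKeyWritable key, NullWritable value,
            int numReduceTasks) {
        // Mask the sign bit so a negative hashCode() cannot yield a negative partition.
        return (key.DEPT().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}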
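A hash-based getPartition has to return a value in the range [0, numReduceTasks). Because hashCode() can be negative and Java's % keeps the sign of the dividend, the usual idiom (the one Hadoop's own HashPartitioner uses) is to clear the sign bit first. The sketch below shows the difference in plain Java; the class PartitionDemo and its method names are hypothetical and take a raw hash value instead of a key.

```java
final class PartitionDemo {

    /** Naive form: returns a negative index when the hash is negative. */
    static int naivePartition(int hash, int numReduceTasks) {
        return hash % numReduceTasks;
    }

    /** Masked form: clears the sign bit first, so the result is in [0, numReduceTasks). */
    static int safePartition(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int hash = -7; // stand-in for a negative hashCode()
        System.out.println(naivePartition(hash, 4)); // -3
        System.out.println(safePartition(hash, 4));  // 1
    }
}
```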

After that, specify the key sort comparator, which compares two CompositeKeyWritable variables, and the grouping will be done.

Source

2017-04-22 15:54:47

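There are 3 steps in the shuffle process: partitioning, sorting and grouping. I guess you have multiple reducers, and your similar results are handled by different reducers because they fall into different partitions.

You can either set the number of reducers to 1, or set a custom partitioner for your job that extends org.apache.hadoop.mapreduce.Partitioner.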
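The three steps, and the effect of the reducer count, can be simulated without Hadoop. This is only a sketch in plain Java: ShuffleDemo, its methods, and the toy grouping rule (keys are "similar" when their first character matches) are assumptions for illustration, not Hadoop's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ShuffleDemo {

    /** Step 1, partitioning: pick a reducer index from the key's hash. */
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Step 3, grouping: merge adjacent sorted keys that the comparator reports as equal. */
    static List<List<String>> group(List<String> sorted, Comparator<String> grouping) {
        List<List<String>> groups = new ArrayList<>();
        for (String key : sorted) {
            List<String> last = groups.isEmpty() ? null : groups.get(groups.size() - 1);
            if (last != null && grouping.compare(last.get(last.size() - 1), key) == 0) {
                last.add(key);
            } else {
                List<String> fresh = new ArrayList<>();
                fresh.add(key);
                groups.add(fresh);
            }
        }
        return groups;
    }

    /** Runs all three steps and returns the groups each reducer would see. */
    static Map<Integer, List<List<String>>> shuffle(List<String> keys, int numReducers,
                                                    Comparator<String> grouping) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String key : keys) {
            partitions.computeIfAbsent(partition(key, numReducers), p -> new ArrayList<>()).add(key);
        }
        Map<Integer, List<List<String>>> result = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> e : partitions.entrySet()) {
            Collections.sort(e.getValue()); // step 2, sorting (natural order here)
            result.put(e.getKey(), group(e.getValue(), grouping));
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy grouping rule (an assumption): keys are "similar" when the first character matches.
        Comparator<String> sameFirstChar = Comparator.comparing((String s) -> s.charAt(0));
        List<String> keys = List.of("ab", "ac");
        // Two reducers: the similar keys hash to different partitions and are never grouped.
        System.out.println(shuffle(keys, 2, sameFirstChar)); // {0=[[ac]], 1=[[ab]]}
        // One reducer: both keys share a partition, so grouping can merge them.
        System.out.println(shuffle(keys, 1, sameFirstChar)); // {0=[[ab, ac]]}
    }
}
```

With more than one reducer, the alternative to a single partition is a partitioner that depends only on whatever the grouping comparator treats as equal, so that keys meant to be grouped always share a partition.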

Source

2018-01-10 09:51:53 Harper
