How does Hive implement count(distinct ...)?

In GenericUDAFCount.java, the function is described like this:

@Description(name = "count",
    value = "_FUNC_(*) - Returns the total number of retrieved rows, including "
          + "rows containing NULL values.\n"
          + "_FUNC_(expr) - Returns the number of rows for which the supplied "
          + "expression is non-NULL.\n"
          + "_FUNC_(DISTINCT expr[, expr...]) - Returns the number of rows for "
          + "which the supplied expression(s) are unique and non-NULL.")

But I don't see any code that handles the DISTINCT expression:

public static class GenericUDAFCountEvaluator extends GenericUDAFEvaluator {
    private boolean countAllColumns = false;
    private LongObjectInspector partialCountAggOI;
    private LongWritable result;

    @Override
    public ObjectInspector init(Mode m, ObjectInspector[] parameters)
            throws HiveException {
        super.init(m, parameters);
        partialCountAggOI =
            PrimitiveObjectInspectorFactory.writableLongObjectInspector;
        result = new LongWritable(0);
        return PrimitiveObjectInspectorFactory.writableLongObjectInspector;
    }

    private GenericUDAFCountEvaluator setCountAllColumns(boolean countAllCols) {
        countAllColumns = countAllCols;
        return this;
    }

    /** class for storing count value. */
    static class CountAgg implements AggregationBuffer {
        long value;
    }

    @Override
    public AggregationBuffer getNewAggregationBuffer() throws HiveException {
        CountAgg buffer = new CountAgg();
        reset(buffer);
        return buffer;
    }

    @Override
    public void reset(AggregationBuffer agg) throws HiveException {
        ((CountAgg) agg).value = 0;
    }

    @Override
    public void iterate(AggregationBuffer agg, Object[] parameters)
            throws HiveException {
        // parameters == null means the input table/split is empty
        if (parameters == null) {
            return;
        }
        if (countAllColumns) {
            assert parameters.length == 0;
            ((CountAgg) agg).value++;
        } else {
            assert parameters.length > 0;
            boolean countThisRow = true;
            for (Object nextParam : parameters) {
                if (nextParam == null) {
                    countThisRow = false;
                    break;
                }
            }
            if (countThisRow) {
                ((CountAgg) agg).value++;
            }
        }
    }

    @Override
    public void merge(AggregationBuffer agg, Object partial)
            throws HiveException {
        if (partial != null) {
            long p = partialCountAggOI.get(partial);
            ((CountAgg) agg).value += p;
        }
    }

    @Override
    public Object terminate(AggregationBuffer agg) throws HiveException {
        result.set(((CountAgg) agg).value);
        return result;
    }

    @Override
    public Object terminatePartial(AggregationBuffer agg) throws HiveException {
        return terminate(agg);
    }
}

How does Hive implement count(distinct ...)? When the job runs, it takes a lot of time. Where is the source code for it?


2012-07-02 ceys

Just as you can run SELECT DISTINCT column1 FROM table1, the DISTINCT expression is not a flag or option of count; it is evaluated independently.

This page says:

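The actual filtering of data bound to parameter types for DISTINCT implementation is handled by the framework and not the COUNT UDAF implementation.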
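To make that division of labour concrete, here is a minimal sketch in plain Java. It is not Hive's code: DistinctCountSketch, CountEvaluator and countDistinct are hypothetical names invented for illustration, and the in-memory HashSet is only an assumption about how duplicates could be dropped. The point is just that the counting side stays as simple as the evaluator in the question, because the caller never hands it the same parameter tuple twice.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DistinctCountSketch {

    /** Stand-in for the COUNT evaluator: counts rows that have no NULL parameter. */
    static final class CountEvaluator {
        private long value = 0;

        void iterate(Object[] parameters) {
            for (Object p : parameters) {
                if (p == null) {
                    return; // a NULL parameter means the row is not counted
                }
            }
            value++;
        }

        long terminate() {
            return value;
        }
    }

    /** Hypothetical "framework" side: skip repeated parameter tuples, then call the evaluator. */
    static long countDistinct(List<Object[]> rows) {
        Set<List<Object>> seen = new HashSet<>();
        CountEvaluator evaluator = new CountEvaluator();
        for (Object[] row : rows) {
            // Arrays.asList gives the tuple value-based equals/hashCode
            if (seen.add(Arrays.asList(row))) {
                evaluator.iterate(row);
            }
        }
        return evaluator.terminate();
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[] {"a"},
            new Object[] {"b"},
            new Object[] {"a"},
            new Object[] {null});
        System.out.println(countDistinct(rows)); // prints 2
    }
}
```

In this sketch the evaluator has no idea that DISTINCT was requested, which matches what the question observed in GenericUDAFCountEvaluator. Tracking every distinct tuple is also extra work on top of a plain count, which is one plausible reason such a query is slower.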

If you want to drill down into the details of the source code, have a look at the Hive git repository.


2012-09-05 03:26:23
