How to stop a Reducer after processing X records in MapReduce

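I am using a Mapper to load a large amount of data consisting of execution times and the queries associated with them. I only need to find the 1000 most expensive queries, so I use the execution time as the key of the mapper output. I use a single reducer and want it to write only 1000 records and then stop processing.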

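I could keep a global counter and do if (count < 1000) { context.write(key, value); }

But that would still load all of the billions of records and then simply not write them.

I want the reducer to stop once it has emitted 1000 records, so that it avoids the seek time and read time for the next set of records.

Is this possible?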

2013-06-24 mm93rc213v

You can short-circuit the reducer entirely by overriding the default implementation of the Reducer.run() method:

public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKey()) {
        reduce(context.getCurrentKey(), context.getValues(), context);
    }
    cleanup(context);
}

You should be able to modify the while loop to include your counter, as follows:

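public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    // Number of keys handed to reduce() so far.
    int count = 0;
    // Leave the loop after 1000 keys; later keys are never passed to reduce().
    while (context.nextKey() && count++ < 1000) {
        reduce(context.getCurrentKey(), context.getValues(), context);
    }
    cleanup(context);
}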

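Note that this won't necessarily output the top-most records, merely the records for the first 1000 keys. It also won't work if your reduce implementation outputs more than a single record per key; in that case you can increment the counter in the reduce method instead.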
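To make the point about counting records rather than keys concrete, here is a minimal plain-Java sketch of the same loop. It does not use any Hadoop classes: the iterator of key groups is only a stand-in for what context.nextKey() and context.getValues() provide, and the names RecordLimitSketch, runWithLimit, sampleGroups and groupsRead are made up for illustration. The sample data is iterated with the largest execution time first, which assumes the job has been set up so that keys reach the reducer in descending order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java stand-in for the reducer loop; no Hadoop classes are involved. */
final class RecordLimitSketch {

    /** Number of key groups pulled from the input, to show where reading stops. */
    static int groupsRead = 0;

    /**
     * Mimics run(): pulls one key group at a time and stops as soon as
     * `limit` records have been emitted, counting records rather than keys.
     */
    static List<String> runWithLimit(Iterator<Map.Entry<Long, List<String>>> groups, int limit) {
        List<String> out = new ArrayList<>();
        groupsRead = 0;
        // Check the limit first so no further group is fetched once it is reached.
        while (out.size() < limit && groups.hasNext()) {
            Map.Entry<Long, List<String>> group = groups.next();
            groupsRead++;
            // Mimics reduce(): one output record per value, capped at the limit.
            for (String query : group.getValue()) {
                if (out.size() >= limit) {
                    break;
                }
                out.add(group.getKey() + "\t" + query);
            }
        }
        return out;
    }

    /** Hypothetical sample input: execution time -> queries, largest time first. */
    static Iterator<Map.Entry<Long, List<String>>> sampleGroups() {
        TreeMap<Long, List<String>> byTime = new TreeMap<>();
        byTime.put(900L, Arrays.asList("q1", "q2"));
        byTime.put(700L, Arrays.asList("q3"));
        byTime.put(500L, Arrays.asList("q4", "q5"));
        byTime.put(100L, Arrays.asList("q6"));
        return byTime.descendingMap().entrySet().iterator();
    }
}
```

Calling RecordLimitSketch.runWithLimit(RecordLimitSketch.sampleGroups(), 3) emits the two records for key 900 and the one for key 700, and only two of the four key groups are ever pulled from the iterator. In a real reducer the counter would be a field incremented wherever reduce() calls context.write(), with run() checking that field in its loop condition.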

2013-06-25 00:03:13

Awesome, that works.. thanks... – mm93rc213v
