Storing a TreeSet on the Hadoop DistributedCache

I am trying to store a TreeSet on the DistributedCache for use by a Hadoop map-reduce job. So far, I have been adding a file from HDFS to the DistributedCache as follows:

Configuration conf = new Configuration(); 
DistributedCache.addCacheFile(new URI("/my/cache/path"), conf); 
Job job = new Job(conf, "my job"); 
// Proceed with remainder of Hadoop map-reduce job set-up and running

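How can I efficiently add a TreeSet (which I have already built in this class) to the file that I add to the DistributedCache? Should I use Java's native serialization to somehow serialize it into that file?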

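Note that the TreeSet is built once, in the main class that launches the map-reduce job. The TreeSet is never modified; I just want every mapper to have read-only access to it without having to rebuild it over and over again.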

2013-04-21 socoho

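Serializing the TreeSet seems to be the way to go. In that case you don't need to create a HashMap. Just deserialize the TreeSet from the file and use its own methods to search by key. I like this approach.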
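A minimal sketch of that round trip, using only Java's built-in serialization. The class name `TreeSetCacheFile`, the methods `write` and `read`, and the `String` element type are assumptions made for illustration; the question does not say what the set contains. The sketch writes to a local temporary file; in a real job the driver would copy the file to the HDFS path passed to `DistributedCache.addCacheFile`.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.TreeSet;

// Hypothetical helper: the class and method names are illustrative only.
public class TreeSetCacheFile {

    // Driver side: write the fully built TreeSet to a file with Java serialization.
    public static void write(TreeSet<String> set, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(set);
        }
    }

    // Mapper side: read the TreeSet back once and treat it as read-only afterwards.
    @SuppressWarnings("unchecked")
    public static TreeSet<String> read(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (TreeSet<String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample data standing in for the set built in the driver.
        TreeSet<String> original = new TreeSet<>();
        original.add("banana");
        original.add("apple");
        original.add("cherry");

        // A local temp file stands in for the file shipped through the cache.
        File file = File.createTempFile("treeset", ".ser");
        file.deleteOnExit();

        write(original, file);
        TreeSet<String> restored = read(file);

        // The restored set supports the usual lookups without being rebuilt.
        System.out.println(restored.equals(original));
        System.out.println(restored.contains("banana"));
    }
}
```

In a mapper, the `read` call would probably belong in `setup()`, so that each task deserializes the set once. The local copy of the cached file can be located with `DistributedCache.getLocalCacheFiles(conf)` in the older API that the question uses.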

2013-04-22 03:16:58 Rags
