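Maintaining a data structure in the master node

I have written an MR algorithm that builds a data structure from some data. Once it is built, I need to answer some queries. To answer these queries faster, I created metadata (a few MB in size) from the results.

Now my question is this:

Is it possible to keep this metadata in the master node's memory, so as to avoid file I/O and answer queries faster?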


2016-04-12 AKJ88

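What do you mean by creating a data structure? When you say query, do you mean that you want to run an MR job for each query? Please clarify the situation.

Imagine that you keep a B-Tree in memory that points to data files on HDFS. For a query, you consult the B-Tree to find the relevant data files and then run an MR job on them. – AKJ88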

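Assuming, based on the OP's responses to the other answers, that the metadata will be used in another MR job, using the distributed cache is fairly easy:

In the driver class: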

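import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class DriverClass extends Configured {

    public static void main(String[] args) throws Exception {

        /* ...some init code... */

        /*
         * Instantiate a Job object for your job's configuration.
         * Add the cache file to the Configuration before creating the Job,
         * because the Job works on its own copy of the Configuration.
         */
        Configuration job_conf = new Configuration();
        DistributedCache.addCacheFile(new Path("path/to/your/data.txt").toUri(), job_conf);
        Job job = new Job(job_conf);

        /* ... configure and start the job... */
    }
}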

In the mapper class, you can read the data during the setup phase and make it available to the map method:

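import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import com.google.common.io.Files; // Guava
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class YourMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Contents of the cached file, loaded once per task in setup()
    private List<String> lines = new ArrayList<String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {

        /* Get the cached archives/files */
        Path[] cached_file = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (cached_file == null || cached_file.length == 0) {
            // Fail the task instead of continuing without the metadata
            throw new IOException("No file found in the distributed cache");
        }

        /* Read the data, something like this (UTF-8 is an assumption): */
        File f = new File(cached_file[0].toString());
        lines = Files.readLines(f, StandardCharsets.UTF_8);
    }

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        /*
         * In the mapper - use the data as needed
         */
    }
}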
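As a rough illustration of "use the data as needed", the lines read in setup could be turned into a sorted in-memory index, in the spirit of the B-Tree mentioned in the comments. This is only a sketch under assumptions: the class name `MetadataIndex`, the method `fileFor`, and the line format (first key of a data file, a tab, then the file path) are all hypothetical, and `java.util.TreeMap` (a red-black tree) stands in for a real B-Tree.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: a sorted in-memory index built from the cached metadata lines. */
class MetadataIndex {

    // Maps the first key stored in a data file to that file's path.
    private final TreeMap<String, String> firstKeyToFile = new TreeMap<String, String>();

    /** Builds the index from lines of the assumed form "firstKey<TAB>filePath". */
    MetadataIndex(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) { // skip malformed lines
                firstKeyToFile.put(parts[0], parts[1]);
            }
        }
    }

    /** Returns the file whose range would contain the key, or null if the key sorts before every range. */
    String fileFor(String key) {
        Map.Entry<String, String> entry = firstKeyToFile.floorEntry(key);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        // Made-up sample lines; real metadata would come from the cached file.
        List<String> lines = Arrays.asList(
                "a\tpath/to/part-0",
                "m\tpath/to/part-1");
        MetadataIndex index = new MetadataIndex(lines);
        System.out.println(index.fileFor("hadoop"));
        System.out.println(index.fileFor("query"));
    }
}
```

In the mapper, such an index could be built at the end of setup from `lines` and then consulted in map. Because a few MB of metadata fits comfortably in memory, each task can rebuild the index instead of sharing it.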

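Note that the distributed cache can hold more than plain text files. You can use archives (zip, tar, ...) and even complete Java classes (jar files).

Also note that in newer Hadoop implementations the distributed cache API is found in the Job class itself. See this API and this answer.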


2016-04-23 18:44:18
