2014-01-22 92 views
0

我在分佈式緩存中存儲了少量數據(幾MB),並使用它來執行與兩個大文件的反連接。對於緩存中的幾行數據,功能運行良好,但是當緩存在生產中有更多數據時,它無法完成這項工作,但它也不會拋出任何錯誤。只有少數記錄(大約20%)正在加入,而其他記錄只是被忽略。那麼在分佈式緩存中是否可以存儲任何記錄數量的上限?爲什麼它爲某些記錄工作並忽略其餘部分?任何建議都會非常有幫助。 貝婁是我的代碼分佈式緩存不工作

 public class MyMapper extends Mapper<LongWritable, Text, Text, TextPair> { 

      Text albumKey = new Text(); 
      Text photoKey = new Text(); 
      private HashSet<String> photoDeleted = new HashSet<String>(); 

      private HashSet<String> albDeleted = new HashSet<String>(); 
      Text interKey = new Text(); 
      private TextPair interValue = new TextPair(); 
      private static final Logger LOGGER = Logger.getLogger(SharedStreamsSlMapper.class); 

      protected void setup(Context context) throws IOException, InterruptedException { 
       int count=0; 
       Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration()); 
       System.out.println(cacheFiles.length); 
       LOGGER.info(cacheFiles+"****"); 
       try { 
        if (cacheFiles != null && cacheFiles.length > 0) { 
         for (Path path : cacheFiles) { 
          String line; 
          String[] tokens; 

          BufferedReader joinReader = new BufferedReader(new FileReader(path.toString())); 
          System.out.println(path.toString()); 
     //     BufferedReader joinReader = new BufferedReader(new FileReader("/Users/Kunal_Basak/Desktop/ss_test/dsitCache/part-m-00000")); 
          try { 
           while ((line = joinReader.readLine()) != null) { 
            count++; 
            tokens = line.split(SSConstants.TAB, 2); 
            if(tokens.length<2){ 
             System.out.println("WL"); 
             continue; 
            } 
            if (tokens[0].equals("P")) { 
             photoDeleted.add(tokens[1]); 
            } 
            else if (tokens[0].equals("A")) { 
             albDeleted.add(tokens[1]); 
            } 
           } 
          } 
          finally { 
           joinReader.close(); 
          } 
         } 
        } 
       } 
       catch (IOException e) { 
        System.out.println("Exception reading DistributedCache: " + e); 
       } 
       System.out.println(count); 
       System.out.println("albdeleted *****"+albDeleted.size()); 
       System.out.println("photo deleted *****"+photoDeleted.size()); 
       LOGGER.info("albdeleted *****"+albDeleted.size()); 
       LOGGER.info("albdeleted *****"+albDeleted.size()); 
      } 

      public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 
     try{ 
    //my mapper code 
    } 
    } 
    } 
+0

當您打印出緩存文件的數量時,是否如您期望的那樣,還是丟失了一些文件? – DNA

回答

0

根據這一blog article

local.cache.size參數控制 DistributedCache的大小。

默認情況下,它被設置爲10 GB。

因此,如果緩存中有10GB以上,那可能是您的問題。