2017-04-14

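Best solution for counting the words in a huge file using Redis

I want to use Java to count the words in a huge file. I only have a single machine, so I can't use MapReduce. Instead of a HashMap, I want to use Redis to store the word frequencies. The real data arrives as a stream.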

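My idea was to increment each word's count in a Redis sorted set, but I'm not sure that is the best approach. What is the best way to count words in streaming data?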
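One way to make the sorted-set idea cheaper is to avoid one Redis round trip per word: count a batch of words locally, then push each distinct word's total at once (for a sorted set, that would be one `ZINCRBY key total word` per distinct word, ideally pipelined). The sketch below is plain Java so it runs without a Redis server. `CountSink`, `InMemorySink` and `BatchingWordCounter` are hypothetical names, and the in-memory sink only stands in for a Redis-backed implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class BatchingDemo {

    // Hypothetical abstraction over the store. A Redis-backed version could
    // issue ZINCRBY <key> <delta> <word> for each entry (e.g. in a pipeline).
    interface CountSink {
        void incrementAll(Map<String, Long> deltas);
    }

    // In-memory stand-in for Redis so the sketch runs on its own.
    static class InMemorySink implements CountSink {
        final Map<String, Long> totals = new TreeMap<>();

        @Override
        public void incrementAll(Map<String, Long> deltas) {
            deltas.forEach((word, delta) -> totals.merge(word, delta, Long::sum));
        }
    }

    // Accumulates counts locally and flushes them after every batchSize words.
    static class BatchingWordCounter {
        private final CountSink sink;
        private final int batchSize;
        private final Map<String, Long> pending = new HashMap<>();
        private int wordsSinceFlush = 0;

        BatchingWordCounter(CountSink sink, int batchSize) {
            this.sink = sink;
            this.batchSize = batchSize;
        }

        void accept(String word) {
            pending.merge(word, 1L, Long::sum);
            if (++wordsSinceFlush >= batchSize) {
                flush();
            }
        }

        // Call this again when the stream ends so the last partial batch is not lost.
        void flush() {
            if (!pending.isEmpty()) {
                sink.incrementAll(new HashMap<>(pending));
                pending.clear();
            }
            wordsSinceFlush = 0;
        }
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        BatchingWordCounter counter = new BatchingWordCounter(sink, 3);
        for (String w : "to be or not to be".split("\\s+")) {
            counter.accept(w);
        }
        counter.flush();
        System.out.println(sink.totals); // prints {be=2, not=1, or=1, to=2}
    }
}
```

The batch size trades freshness of the counts in Redis against the number of network calls; a larger batch means fewer calls but counts that lag further behind the stream.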

Word count in Java:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Pattern;

public class WordCount {
    public static void main(String[] args) {
        String fileName = args.length > 0 ? args[0] : "filename";
        Map<String, Integer> wordMap = wordMap(fileName);
        List<Entry<String, Integer>> list = sortByValue(wordMap);
        for (Entry<String, Integer> entry : list) {
            System.out.println(entry.getKey() + " => " + entry.getValue());
        }
    }

    public static Map<String, Integer> wordMap(String fileName) {
        Map<String, Integer> wordMap = new HashMap<>();
        // words are separated by whitespace
        Pattern pattern = Pattern.compile("\\s+");
        // read line by line so the whole file is never held in memory at once
        // (assumes the file is UTF-8 encoded)
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                for (String word : pattern.split(line.toLowerCase(Locale.ROOT))) {
                    // split() returns a leading "" when a line starts with whitespace
                    if (!word.isEmpty()) {
                        wordMap.merge(word, 1, Integer::sum);
                    }
                }
            }
        } catch (IOException ioex) {
            ioex.printStackTrace();
        }
        return wordMap;
    }

    // Returns the entries sorted by count, most frequent first.
    public static List<Entry<String, Integer>> sortByValue(Map<String, Integer> wordMap) {
        List<Entry<String, Integer>> list = new ArrayList<>(wordMap.entrySet());
        list.sort(Entry.<String, Integer>comparingByValue().reversed());
        return list;
    }
}
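`sortByValue` sorts every distinct word, which costs O(n log n) in the vocabulary size n. If only the k most frequent words are needed (the question a Redis sorted set answers with `ZREVRANGE key 0 k-1 WITHSCORES`), a min-heap capped at k entries does it in O(n log k). A minimal sketch; `topK` is a hypothetical helper, not part of the code above, and the order of words with equal counts is unspecified.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.PriorityQueue;

public class TopK {

    // Returns the k most frequent entries, highest count first.
    public static List<Entry<String, Integer>> topK(Map<String, Integer> counts, int k) {
        // Min-heap on count: the head is the weakest of the current top k.
        Comparator<Entry<String, Integer>> byCount = Entry.comparingByValue();
        PriorityQueue<Entry<String, Integer>> heap = new PriorityQueue<>(byCount);
        for (Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // drop the smallest so the heap never exceeds k entries
            }
        }
        List<Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort(byCount.reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 5);
        counts.put("a", 3);
        counts.put("cat", 1);
        counts.put("dog", 2);
        System.out.println(topK(counts, 2)); // prints [the=5, a=3]
    }
}
```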

Answer


There is a good example of how to run MapReduce over Redis data in Java using Redisson.
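The map/reduce shape itself does not require a cluster. As an illustration only (this uses `java.util.stream`, not Redisson's API), the map step splits each line into words and the reduce step groups identical words and counts them; `parallel()` spreads the work across the local cores. `StreamWordCount` is a hypothetical class name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamWordCount {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Map: line -> words. Reduce: group identical words and count them.
    public static Map<String, Long> count(Stream<String> lines) {
        return lines
                .parallel()
                .flatMap(line -> WHITESPACE.splitAsStream(line.toLowerCase(Locale.ROOT)))
                .filter(word -> !word.isEmpty()) // leading whitespace yields ""
                .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Files.lines reads the file lazily; try-with-resources closes it
            try (Stream<String> lines = Files.lines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
                System.out.println(count(lines));
            }
        } else {
            // TreeMap only to make the printed order deterministic
            System.out.println(new TreeMap<>(count(Stream.of("To be or", "not to be"))));
            // prints {be=2, not=1, or=1, to=2}
        }
    }
}
```

Because the result is still an in-memory map, this fits when the vocabulary (not the file) fits in memory; the counts could then be written to Redis once at the end.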