2016-11-20

Comparing two HashMaps built from two separate text files and counting the duplicate values

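I want to compare two HashMaps and count how many times each value that appears in both files occurs. For example, if file1 and file2 both contain the string "hello" twice, my console should print: hello occurs 2 times.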

Here is my first HashMap:

// Assumes INPUT_TEXT1 is a java.util.Scanner over file1, plus these imports:
// java.util.*, java.util.Map.Entry, java.util.stream.Stream
List<String> word_list = new ArrayList<>();
//Load your words to the word_list here
while (INPUT_TEXT1.hasNext()) {
    String input_word = INPUT_TEXT1.next();
    word_list.add(input_word);
}

INPUT_TEXT1.close();

String regexPattern = "[^a-zA-Z]";

int index = 0;

// Strip non-letters and lowercase each word in place
for (String s : word_list) {
    word_list.set(index++, s.replaceAll(regexPattern, "").toLowerCase());
}

//Find the unique words now from list
String[] uniqueWords = word_list.stream().distinct()
        .toArray(size -> new String[size]);
Map<String, Integer> wordsMap = new HashMap<>();
int frequency = 0;

//Load the words to Map with each uniqueword as Key and frequency as Value
for (String uniqueWord : uniqueWords) {
    frequency = Collections.frequency(word_list, uniqueWord);
    System.out.println(uniqueWord + " occurred " + frequency + " times");
    wordsMap.put(uniqueWord, frequency);
}

//Now, Sort the words with the reverse order of frequency(value of HashMap)
Stream<Entry<String, Integer>> topWords = wordsMap.entrySet().stream()
        .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()).limit(6);

//Now print the Top 5 words to console
System.out.println("Top 5 Words:::");
topWords.forEach(System.out::println);

System.out.println("\n\n");
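The snippet above scans the list once to find the distinct words and then calls Collections.frequency, which scans the whole list again for every distinct word. A minimal single-pass sketch using HashMap.merge is shown below. The class and method names (WordFrequency, countWords) are hypothetical, and unlike the question's code it skips tokens that become empty after stripping non-letters (for example "42"); drop that check if you want to count them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordFrequency {
    // Builds a word -> count map in one pass over the list.
    // Normalization mirrors the question: strip non-letters, then lowercase.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String raw : words) {
            String word = raw.replaceAll("[^a-zA-Z]", "").toLowerCase();
            if (word.isEmpty()) {
                continue; // tokens made only of non-letters become "", so skip them
            }
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(Arrays.asList("Hello,", "world", "hello!", "42"));
        counts.forEach((w, n) -> System.out.println(w + " occurred " + n + " times"));
    }
}
```

Because each word is touched once, the cost grows linearly with the number of words instead of with (distinct words × total words).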

Here is my second HashMap:

// Assumes INPUT_TEXT2 is a java.util.Scanner over file2
List<String> wordList = new ArrayList<>();
//Load your words to the wordList here
while (INPUT_TEXT2.hasNext()) {
    String input_word1 = INPUT_TEXT2.next();
    wordList.add(input_word1);
}

INPUT_TEXT2.close();

String regex = "[^a-zA-Z]";

int index1 = 0;

// Strip non-letters and lowercase each word in place
for (String s : wordList) {
    wordList.set(index1++, s.replaceAll(regex, "").toLowerCase());
}

String[] uniqueWords1 = wordList.stream().distinct()
        .toArray(size -> new String[size]);
Map<String, Integer> wordsMap1 = new HashMap<>();

//Load the words to Map with each uniqueword as Key and frequency as Value
// (reuses the `frequency` variable declared in the first snippet)
for (String uniqueWord : uniqueWords1) {
    frequency = Collections.frequency(wordList, uniqueWord);
    System.out.println(uniqueWord + " occurred " + frequency + " times");
    wordsMap1.put(uniqueWord, frequency); // store in the second map
}

//Now, Sort the words with the reverse order of frequency(value of HashMap)
Stream<Entry<String, Integer>> topWords1 = wordsMap1.entrySet().stream()
        .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()).limit(6);
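Both snippets sort the entries by descending count and keep the first few. A Stream can only be consumed once, so one option is to collect the top entries into a List that can be printed or reused later. The sketch below does that with a hypothetical helper topN; keeping the count in a single parameter also keeps the number passed to limit() and the number in the printed label in sync. Entries with equal counts come out in no particular order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopWords {
    // Returns the n most frequent entries, highest count first.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hello", 3);
        counts.put("world", 1);
        counts.put("java", 2);
        int n = 2;
        System.out.println("Top " + n + " Words:::");
        topN(counts, n).forEach(System.out::println); // prints hello=3 then java=2
    }
}
```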

This was my original approach to finding the duplicate values:

boolean val = wordsMap.keySet().containsAll(wordsMap1.keySet());

for (Entry<String, Integer> str : wordsMap.entrySet()) {
    System.out.println("================= " + str.getKey());

    if (wordsMap1.containsKey(str.getKey())) {
        System.out.println("Map2 Contains Map 1 Key");
    }
}

System.out.println("================= " + val);

Does anyone have other suggestions for achieving this? Thanks.

EDIT: How can I count the number of occurrences of each individual value?
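A minimal sketch of the check described in the question: for every word that appears in both frequency maps, print how often it occurs. The question does not say what to print when the two counts differ, so this sketch assumes the smaller of the two counts (a word found 2 times in one file and 3 in the other "occurs in both" 2 times). The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class CommonWords {
    // For every word present in both maps, keep the smaller of the two counts.
    // Assumption: "occurs N times in both files" means min(count1, count2).
    static Map<String, Integer> commonCounts(Map<String, Integer> first, Map<String, Integer> second) {
        Map<String, Integer> common = new TreeMap<>(); // TreeMap gives alphabetical output
        for (Map.Entry<String, Integer> e : first.entrySet()) {
            Integer other = second.get(e.getKey());
            if (other != null) {
                common.put(e.getKey(), Math.min(e.getValue(), other));
            }
        }
        return common;
    }

    public static void main(String[] args) {
        Map<String, Integer> file1 = new HashMap<>();
        file1.put("hello", 2);
        file1.put("world", 1);
        Map<String, Integer> file2 = new HashMap<>();
        file2.put("hello", 2);
        file2.put("java", 4);
        commonCounts(file1, file2)
                .forEach((word, n) -> System.out.println(word + " occurs " + n + " times"));
        // prints: hello occurs 2 times
    }
}
```

If you need both counts instead, store a pair (or print e.getValue() and other separately) rather than the minimum.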


Why doesn't your own code work? – ifly6


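Wow! That is the worst implementation of building a word-frequency map I have ever seen. A full scan of the list to get the unique words, then a full scan *for each* unique word. Ouch! Since you're using Java 8 streams, try `stream().collect(Collectors.groupingBy(w -> w, Collectors.counting()))`. – Andreas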
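The suggestion in the comment above groups identical words together and counts each group in a single pass. A minimal self-contained sketch follows; Function.identity() is equivalent to the `w -> w` lambda, and the class and method names are hypothetical. Note that Collectors.counting() produces Long values, so the result is a Map<String, Long> rather than the Map<String, Integer> used in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupingCount {
    // One pass: group identical words and count the size of each group.
    // Collectors.counting() yields Long, hence Map<String, Long>.
    static Map<String, Long> frequencies(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world", "hello");
        frequencies(words).forEach((w, n) -> System.out.println(w + " occurred " + n + " times"));
    }
}
```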


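I focused on the last check, thinking the OP was asking how to improve it, and completely overlooked the first part. I agree with Andreas that the first part should be completely refactored. – user6904265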

Answer


I think your code works as well. If your goal is to find a better way to implement the last check, you can try this:

Set<String> keySetMap1 = new HashSet<String>(wordsMap.keySet()); 
Set<String> keySet2 = wordsMap1.keySet(); 
keySetMap1.retainAll(keySet2); 
keySetMap1.stream().forEach(x -> System.out.println("Map2 Contains Map 1 Key: "+x)); 
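The answer copies the first key set into a new HashSet before calling retainAll, and that copy matters: keySet() is a view backed by the map, so calling retainAll on it directly would delete the non-shared entries from wordsMap itself. A stream-based alternative that never modifies either map is sketched below; the class and method names are hypothetical, and the sort is only there to make the output order predictable.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SharedKeys {
    // Keys of `first` that also appear in `second`; neither map is modified.
    static List<String> sharedKeys(Map<String, ?> first, Map<String, ?> second) {
        return first.keySet().stream()
                .filter(second::containsKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> map1 = new HashMap<>();
        map1.put("hello", 2);
        map1.put("world", 1);
        Map<String, Integer> map2 = new HashMap<>();
        map2.put("hello", 2);
        map2.put("java", 4);
        sharedKeys(map1, map2).forEach(x -> System.out.println("Map2 Contains Map 1 Key: " + x));
    }
}
```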

How would I go about counting the number of times each duplicate value occurs? – codeREXO


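To answer the question of how to count the occurrences of each individual value, you can refactor your code as Andreas suggested: `Map<String, Long> wordsMap = word_list.stream().collect(Collectors.groupingBy(w -> w, Collectors.counting()));`. That single line computes the word-frequency map. Hope we have answered all your questions. – user6904265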
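Because Collectors.counting() returns Long, the one-liner above does not fit the Map<String, Integer> type used elsewhere in the question. One option, sketched below under that assumption, is to swap in Collectors.summingInt(w -> 1), which counts the same way but produces Integer values. The sketch also folds in the question's cleanup step (strip non-letters, lowercase) and, as a design choice, drops tokens that end up empty. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class IntegerCounts {
    // Same grouping as Collectors.counting(), but summingInt(w -> 1) yields Integer
    // values, so the result matches the Map<String, Integer> type in the question.
    static Map<String, Integer> frequencies(List<String> rawWords) {
        return rawWords.stream()
                .map(w -> w.replaceAll("[^a-zA-Z]", "").toLowerCase()) // question's cleanup
                .filter(w -> !w.isEmpty())                              // drop non-letter tokens
                .collect(Collectors.groupingBy(w -> w, Collectors.summingInt(w -> 1)));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = frequencies(Arrays.asList("Hello,", "hello", "world!", "123"));
        counts.forEach((w, n) -> System.out.println(w + " occurred " + n + " times"));
    }
}
```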