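Compare two HashMaps and count the number of duplicate values
I created two HashMaps containing strings from two separate txt files. Now I want to compare the two maps and count the duplicate values that both files contain. For example, if file1 and file2 both contain the string "hello" twice, my console should print: hello occurs 2 times.
This is my first HashMap: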
    // Fragment; assumes INPUT_TEXT1 is an open java.util.Scanner and these imports:
    // import java.util.*; import java.util.Map.Entry; import java.util.stream.Stream;
    List<String> word_list = new ArrayList<>();
    // Load the words into word_list
    while (INPUT_TEXT1.hasNext()) {
        String input_word = INPUT_TEXT1.next();
        word_list.add(input_word);
    }
    INPUT_TEXT1.close();
    // Strip every non-letter character and lower-case each word
    String regexPattern = "[^a-zA-Z]";
    int index = 0;
    for (String s : word_list) {
        word_list.set(index++, s.replaceAll(regexPattern, "").toLowerCase());
    }
    // Find the unique words in the list
    String[] uniqueWords = word_list.stream().distinct()
            .toArray(size -> new String[size]);
    Map<String, Integer> wordsMap = new HashMap<>();
    int frequency = 0;
    // Load the words into the map: unique word as key, frequency as value
    for (String uniqueWord : uniqueWords) {
        frequency = Collections.frequency(word_list, uniqueWord);
        System.out.println(uniqueWord + " occurred " + frequency + " times");
        wordsMap.put(uniqueWord, frequency);
    }
    // Sort the entries by frequency (the map's values) in descending order
    Stream<Entry<String, Integer>> topWords = wordsMap.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()).limit(5);
    // Print the top 5 words to the console
    System.out.println("Top 5 Words:::");
    topWords.forEach(System.out::println);
    System.out.println("\n\n");
This is my second HashMap:
    // Fragment; assumes INPUT_TEXT2 is an open java.util.Scanner
    List<String> wordList = new ArrayList<>();
    // Load the words into wordList
    while (INPUT_TEXT2.hasNext()) {
        String input_word1 = INPUT_TEXT2.next();
        wordList.add(input_word1);
    }
    INPUT_TEXT2.close();
    // Strip every non-letter character and lower-case each word
    String regex = "[^a-zA-Z]";
    int index1 = 0;
    for (String s : wordList) {
        wordList.set(index1++, s.replaceAll(regex, "").toLowerCase());
    }
    String[] uniqueWords1 = wordList.stream().distinct()
            .toArray(size -> new String[size]);
    Map<String, Integer> wordsMap1 = new HashMap<>();
    // Load the words into the second map: unique word as key, frequency as value
    for (String uniqueWord : uniqueWords1) {
        frequency = Collections.frequency(wordList, uniqueWord);
        System.out.println(uniqueWord + " occurred " + frequency + " times");
        wordsMap1.put(uniqueWord, frequency);
    }
    // Sort the entries by frequency (the map's values) in descending order
    Stream<Entry<String, Integer>> topWords1 = wordsMap1.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()).limit(5);
This is my original approach to finding the duplicate values:
    // true only if every key of wordsMap1 is also a key of wordsMap
    boolean val = wordsMap.keySet().containsAll(wordsMap1.keySet());
    for (Entry<String, Integer> str : wordsMap.entrySet()) {
        System.out.println("================= " + str.getKey());
        // Report each key of the first map that also appears in the second map
        if (wordsMap1.containsKey(str.getKey())) {
            System.out.println("Map2 Contains Map 1 Key");
        }
    }
    System.out.println("================= " + val);
Does anyone have other suggestions for achieving this? Thanks.
EDIT: How can I count the number of occurrences of each individual value?
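One possible way to do the comparison is sketched below. It assumes the two frequency maps are already built as described above. The class name `CommonWordCounts`, the helper `commonCounts`, and the sample data are hypothetical. The question does not say what to report when a word's counts differ between the files, so taking the smaller count is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class CommonWordCounts {

    // Returns the words present in both maps. The value is the smaller of the
    // two counts; that choice is an assumption, not something the question states.
    static Map<String, Integer> commonCounts(Map<String, Integer> first,
                                             Map<String, Integer> second) {
        // TreeMap is used only so the words print in a stable, sorted order
        Map<String, Integer> common = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : first.entrySet()) {
            Integer other = second.get(entry.getKey());
            if (other != null) {
                common.put(entry.getKey(), Math.min(entry.getValue(), other));
            }
        }
        return common;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the two files' frequency maps
        Map<String, Integer> wordsMap = new HashMap<>();
        wordsMap.put("hello", 2);
        wordsMap.put("world", 1);
        Map<String, Integer> wordsMap1 = new HashMap<>();
        wordsMap1.put("hello", 2);
        wordsMap1.put("java", 3);

        // Prints: hello occurs 2 times
        commonCounts(wordsMap, wordsMap1).forEach(
                (word, count) -> System.out.println(word + " occurs " + count + " times"));
    }
}
```

The loop walks one map and does a hash lookup in the other, so the cost grows with the size of the first map rather than with the product of both sizes.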
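Why isn't your own code working? – ifly6
Wow! That is the worst implementation of building a word-frequency map I have ever seen. A full scan of the list to get the unique words, then *another* full scan for each unique word. Yikes! Since you are using Java 8 streams, try `stream().collect(Collectors.groupingBy(w -> w, Collectors.counting()))`. – Andreas
I focused on the last check, thinking the OP was asking how to improve it, and completely overlooked the first part. I agree with Andreas that the first part should be completely refactored. – user6904265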
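For reference, here is a minimal sketch of what the `groupingBy` suggestion might look like when applied to the word list. The class name `WordFrequency`, the helper `countWords`, and the sample words are hypothetical. Dropping tokens that contain no letters is an assumption; the question's code would keep them as an empty-string key. Note that `Collectors.counting()` yields `Long` values, so the resulting map is a `Map<String, Long>` rather than the `Map<String, Integer>` used in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordFrequency {

    // Normalizes each token the same way the question does, then counts all
    // words in a single pass over the list.
    static Map<String, Long> countWords(List<String> rawWords) {
        return rawWords.stream()
                .map(w -> w.replaceAll("[^a-zA-Z]", "").toLowerCase())
                .filter(w -> !w.isEmpty()) // assumption: ignore tokens with no letters
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical sample tokens, as a Scanner might return them
        List<String> words = Arrays.asList("Hello,", "world", "hello!", "42");
        Map<String, Long> counts = countWords(words);
        System.out.println("hello occurs " + counts.get("hello") + " times"); // hello occurs 2 times
        System.out.println("world occurs " + counts.get("world") + " times"); // world occurs 1 times
    }
}
```

This replaces the `distinct()` pass plus one `Collections.frequency` scan per unique word with a single traversal of the list.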