How to compare string arrays and count similar words

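I have been trying to get this code working, but I still can't. This code is the closest I have managed to get. What am I missing? I am trying to do this without hashing.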

// Read all the words from the dictionary (text.txt) into an array 
    BufferedReader br = new BufferedReader(new FileReader("text.txt")); 
    int bufferLength = 1000000; 
    char[] buffer = new char[bufferLength]; 
    int charsRead = br.read(buffer, 0, bufferLength); 
    br.close(); 
    String text = new String(buffer); 
    text = text.trim(); 
    text = text.toLowerCase(); 
    String[] words = text.split("\n"); 

    System.out.println("Total number of words in text: " + words.length); 

    //Find unique words: 
    String[] uniqueText = words; 
    int[] uniqueTextCount = new int[uniqueText.length]; 

    for (int i = 0; i < words.length; i++) { 
     for (int j = 0; j < uniqueText.length; j++) { 
      if (words[i].equals(uniqueText[j])) { 
       uniqueTextCount[j]++; 
      } else { 
       uniqueText[i] = words[i]; 
      } 
     } 
     System.out.println(uniqueText[i] + " for " + uniqueTextCount[i]); 
    } 
}

Source

2015-11-26 Han

Sorry about the formatting! I'm new to Stack Overflow and to programming. – Han

Please copy and paste the code here rather than posting an image. That makes it easier for users to try the code themselves. – Emz

String[] uniqueText = words; int[] uniqueTextCount = new int[uniqueText.length]; for (int j = 0; j … – Han

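From the original code, I assume:

text.txt contains one word per line.
You want to count how many times each word occurs (not the "similar words" mentioned in your title).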

Perhaps the first thing to note is that BufferedReader allows line-by-line reading:

for (String line; (line = br.readLine()) != null;) { 
    // Process each line, which in this case is a word. 
}

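Processing the file line by line is preferable to reading the whole file at once. Reading everything up front makes your program use as much memory as the file is large, whereas line-by-line processing lets you get by with far less.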

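Now, if we consider the requirements, the desired output is a mapping from distinct words to their counts. The map should be declared before the for-loop above.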

// A HashMap would also work, but you have specified that you do not want 
// to use hashing. 
Map<String, Integer> distinctWordCounts = new TreeMap<>();

With the map initialized, in each iteration of the loop (i.e., for each line we encounter), we can do the following:

if (distinctWordCounts.containsKey(line)) { 
    // We have seen this word. Increment the count we've seen it. 
    distinctWordCounts.put(line, distinctWordCounts.get(line) + 1); 
} else { 
    // We have never seen this word. Set the count seen to 1. 
    distinctWordCounts.put(line, 1); 
}

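The code above incurs slightly more overhead than is optimal, because the if-branch involves three traversals of the map (containsKey, get, and put) where one would suffice. But that may be a story for another day, unless you have reason to care about non-asymptotic speed improvements.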
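As a rough sketch of the "one traversal" idea mentioned above, Map.merge (available since Java 8) expresses the containsKey/get/put sequence as a single call. Whether that actually amounts to a single tree traversal depends on the JDK's TreeMap implementation. The class name WordCountMerge and the helper countWords are hypothetical names used only for illustration, and the input is an in-memory array rather than text.txt.

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountMerge {
    // Hypothetical helper (not from the original answer):
    // counts how often each word occurs in the given array.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            // Stores 1 if the word is absent, otherwise adds 1 to the stored count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"b", "a", "b", "c", "a", "b"};
        // TreeMap iterates its keys in sorted order.
        System.out.println(countWords(words)); // prints {a=2, b=3, c=1}
    }
}
```

The result is the same map as the if/else version builds; only the update step is shorter.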

At the end of the day, we can traverse distinctWordCounts to print the count for each word:

for (Map.Entry<String, Integer> entry : distinctWordCounts.entrySet()) { 
    System.out.println(entry.getKey() + " occurs " + entry.getValue() + " times."); 
}

Source

2015-11-26 23:28:24

This hashing approach works! But I'm trying to achieve this without using the map methods. – Han
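For a version with no Map at all, one possible sketch uses two parallel arrays, as the question's code tries to do. In the question, uniqueText and words refer to the same array, and the else branch overwrites entries while they are still being compared, which is likely why the counts come out wrong. The sketch below keeps a separate array of distinct words plus a counter for how many slots are filled. The class name ArrayWordCount, the helper countWords, and the "word=count" output format are assumptions made for illustration.

```java
public class ArrayWordCount {
    // Hypothetical helper: returns one "word=count" string per distinct word,
    // in the order in which each word was first seen.
    static String[] countWords(String[] words) {
        String[] unique = new String[words.length]; // at most one slot per input word
        int[] counts = new int[words.length];
        int uniqueSize = 0; // number of slots of 'unique' currently in use

        for (String word : words) {
            // Linear search through the distinct words found so far.
            int index = -1;
            for (int j = 0; j < uniqueSize; j++) {
                if (unique[j].equals(word)) {
                    index = j;
                    break;
                }
            }
            if (index < 0) {
                // First time we see this word: append it with a count of 1.
                unique[uniqueSize] = word;
                counts[uniqueSize] = 1;
                uniqueSize++;
            } else {
                counts[index]++;
            }
        }

        String[] result = new String[uniqueSize];
        for (int i = 0; i < uniqueSize; i++) {
            result[i] = unique[i] + "=" + counts[i];
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"b", "a", "b", "c", "a", "b"};
        for (String line : countWords(words)) {
            System.out.println(line); // prints b=3, a=2, c=1 on separate lines
        }
    }
}
```

This takes time proportional to the number of words times the number of distinct words, which is the price of avoiding a map. Sorting a copy of the array first and counting runs of equal neighbours would be another map-free option.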

It sounds like you just want to count the number of occurrences of each distinct word? If so, you can do something like this:

String[] array = {"a", "a", "b", "c", "c", "c", "d", "e", "f", "f"}; 
Map<String, Long> map = new HashMap<>(); 

Stream.of(array) 
     .distinct() 
     .forEach(s -> map.put(s, 
      Stream.of(array) 
       .filter(s::equals) 
       .count()));
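The snippet above re-scans the whole array once for every distinct word. A possible single-pass alternative, sketched here under the assumption that using a Map is acceptable, is Collectors.groupingBy with Collectors.counting. The class name StreamWordCount and the helper countWords are hypothetical, and TreeMap is chosen only so that the printed order is predictable.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StreamWordCount {
    // Hypothetical helper: groups equal words together and counts each group.
    static Map<String, Long> countWords(String[] words) {
        return Arrays.stream(words)
                .collect(Collectors.groupingBy(
                        Function.identity(),     // the word itself is the grouping key
                        TreeMap::new,            // sorted map, for stable output order
                        Collectors.counting())); // number of elements in each group
    }

    public static void main(String[] args) {
        String[] array = {"a", "a", "b", "c", "c", "c", "d", "e", "f", "f"};
        System.out.println(countWords(array)); // prints {a=2, b=1, c=3, d=1, e=1, f=2}
    }
}
```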

If you just want the unique words:

String[] unique = Stream.of(array) 
         .distinct() 
         .toArray(String[]::new);

Source

2015-11-26 23:33:15 iamjoshlee
