How to calculate the frequency of words from a txt file - Java


I need some help with this code. I want my program to count the frequency of each word that matches the described pattern.

public class Project { 
    public static void main(String[] args) throws FileNotFoundException{ 
    Scanner INPUT_TEXT = new Scanner(new File("moviereview.txt")).useDelimiter(" "); 

    String pattern = "[a-zA-Z'-]+"; 
    Pattern r = Pattern.compile(pattern); 

    int occurences=0; 

    while(INPUT_TEXT.hasNext()){ 
     //read next word 
     String Stringcandidate=INPUT_TEXT.next(); 

     //see if pattern matches (boolean find) 
     if(r.matcher(Stringcandidate).find()) { 
      occurences++; //increment occurences if pattern is found 
      String moviereview = m.group(0); //retrieve found string 
      String moviereview2 = moviereview.toLowerCase(); // ??? 

      System.out.println(moviereview2 + " appears " + occurences); 
      if(occurences>1){ 
       System.out.println(" times\n"); 
      } 
      else{ 
       System.out.println(" time\n"); 
      } 
     } 
     INPUT_TEXT.close();//Close your Scanner.  
    } 

}

Source

2016-11-19 Naz Muh

Can you be more specific? What is happening right now? We are not here to run your code for you, and we don't have your text file.

I can't help you. I refuse to look at code when you can't even format (indent) it properly to show the code structure. – Andreas

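Welcome to StackOverflow. You are most likely to get useful answers if you follow the guidelines in the help center. For example, this one: "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error, and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers."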

As I stated in my earlier comment, you can use a Map (HashMap) to store the matched words and their frequencies.
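As a minimal sketch of that idea, the counting step can be written with `Map.merge` (available since Java 8). The class name `MergeCountSketch` and the method `countWords` are hypothetical names chosen for illustration. Lowercasing the words and collecting every match are assumptions based on what the question seems to want.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class MergeCountSketch {

    // Same word pattern as in the question: letters, apostrophes and hyphens.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z'-]+");

    // Counts every pattern match in the text, ignoring case.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            String word = matcher.group().toLowerCase(Locale.ROOT);
            // merge() stores 1 for a new word, otherwise adds 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords("auto bush trumped her tomato in the petunia auto");
        System.out.println(counts.get("auto")); // 2
        System.out.println(counts.get("bush")); // 1
    }
}
```

The `merge` call replaces the explicit `containsKey` / `get` / `put` sequence, so there is only one place where the count changes.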

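I suggest encapsulating the program's functionality in smaller methods/classes so that each one performs only one small task. That makes the code easier to read.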

I assume your file contains the string "auto bush trumped her tomato in the petunia auto".

Here is the code:

package how_to_calculate_the_frequency; 

import java.io.File; 
import java.io.FileNotFoundException; 
import java.util.HashMap; 
import java.util.Scanner; 
import java.util.regex.Matcher; 
import java.util.regex.Pattern; 

public class Project { 

    HashMap<String, Integer> map = new HashMap<String, Integer>(); 

    public static void main(String[] args){ 

     Project project = new Project(); 

     Scanner INPUT_TEXT = project.readFile(); 

     project.analyse(INPUT_TEXT); 

     project.showResults(); 

    } 

    /** 
    * logic to count the occurrences of words matched by REGEX in a scanner that 
    * loaded some text 
    * 
    * @param scanner 
    *   the scanner holding the text 
    */ 
    public void analyse(Scanner scanner) { 

     String pattern = "[a-zA-Z'-]+"; 
     Pattern r = Pattern.compile(pattern); 

     while (scanner.hasNext()) { 
      // read next word 
      String Stringcandidate = scanner.next(); 

      // collect every match in the token, not only the first one
      Matcher matcher = r.matcher(Stringcandidate);
      while (matcher.find()) {
       String matchedWord = matcher.group();
       //System.out.println(matchedWord); //check what is matched
       this.addWord(matchedWord);
      }

     } 
     scanner.close();// Close your Scanner. 
    } 

    /** 
    * adds a word to the <word,count> Map: if the word is new, a new entry is 
    * created, otherwise the count of this word is incremented 
    */ 
    public void addWord(String matchedWord) { 

     if (map.containsKey(matchedWord)) { 
      // increment occurrence 
      int occurrence = map.get(matchedWord); 
      occurrence++; 
      map.put(matchedWord, occurrence); 
     } else { 
      // add word and set occurrence to 1 
      map.put(matchedWord, 1); 
     } 

    } 

    /** 
    * reads a file from disk and returns a scanner to analyse it 
    * 
    * @return the file from disk as scanner 
    */ 
    public Scanner readFile() { 

     Scanner scanner = null; 

     /* use this for reading a file from disk:
      * try {
      *     scanner = new Scanner(new File("moviereview.txt")).useDelimiter(" ");
      * } catch (Exception e) {
      *     e.printStackTrace();
      * }
      */

     scanner = new Scanner("auto bush trumped her tomato in the petunia auto"); 

     return scanner; 
    } 

    /** 
    * prints the matched words and their occurrences 
    * in a readable way 
    */ 
    public void showResults() { 

     for (HashMap.Entry<String, Integer> matchedWord : map.entrySet()) { 
      int occurrence = matchedWord.getValue(); 
      System.out.print("\"" + matchedWord.getKey() + "\" appears " + occurrence); 
      if (occurrence > 1) { 
       System.out.print(" times\n"); 
      } else { 
       System.out.print(" time\n"); 
      } 
     } 

     // or with a Java 8 lambda expression:
     // map.forEach((word, occurrence) ->
     //         System.out.println("\"" + word + "\" appears " + occurrence + " times"));
    } 
} 

// DONE separate reading a file, analysing the file and 
// word-frequency-counting-logic into different 
// methods 
// DONE implement <word,count> Map and logic to add new and known (to the map) 
// words

This produces:

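"the" appears 1 time

"auto" appears 2 times

"her" appears 1 time

"in" appears 1 time

"bush" appears 1 time

"trumped" appears 1 time

"tomato" appears 1 time

"petunia" appears 1 time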
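Note that `HashMap` does not guarantee any iteration order, so the order of these lines can differ between runs or Java versions. If a stable, alphabetical listing is preferred, one option is a `TreeMap`. The sketch below also reads the words from a file instead of a hard-coded string. The class name `SortedFileCountSketch` and its method names are hypothetical, and lowercasing is an assumption based on the question; `moviereview.txt` is the file name used in the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class SortedFileCountSketch {

    // Same word pattern as in the question.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z'-]+");

    // Counts the words in the given lines; TreeMap keeps the keys sorted.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher matcher = WORD.matcher(line);
            while (matcher.find()) {
                counts.merge(matcher.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reads the whole file (UTF-8) and counts its words.
    static Map<String, Integer> countWordsInFile(Path file) throws IOException {
        return countWords(Files.readAllLines(file));
    }

    public static void main(String[] args) throws IOException {
        // Falls back to the file name from the question if no argument is given.
        Path file = Paths.get(args.length > 0 ? args[0] : "moviereview.txt");
        countWordsInFile(file).forEach((word, count) ->
                System.out.println("\"" + word + "\" appears " + count
                        + (count > 1 ? " times" : " time")));
    }
}
```

`Files.readAllLines` loads the entire file into memory, which is fine for a small review file; for very large inputs a line-by-line reader would be the safer choice.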

Regards

Source

2016-11-21 04:46:33
