Java delimiter skipping a word

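I am reading a text file and storing the set of unique words from that file in an ArrayList (please let me know if there is a better structure for this). I use a Scanner to scan the text file and set the delimiter to " " (a space), as follows: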

ArrayList <String> allWords = new ArrayList <String>(); 
    ArrayList <String> Vocabulary = new ArrayList <String>(); 
    int count = 0; 

    Scanner fileScanner = null; 
    try { 
     fileScanner = new Scanner (new File (textFile)); 

    } catch (FileNotFoundException e) { 
     System.out.println (e.getMessage()); 
     System.exit(1); 
    } 

    fileScanner.useDelimiter(" "); 

    while (fileScanner.hasNext()) { 

     allWords.add(fileScanner.next().toLowerCase()); 

     count++; 

     String distinctWord = (fileScanner.next().toLowerCase()); 
     System.out.println (distinctWord.toString()); 

     if (!allWords.contains(distinctWord)) { 

      Vocabulary.add(distinctWord); 

     } 
    }

So when I print the contents of Vocabulary, every other word is skipped. For example, if I have the following text file:

"The quick brown fox jumped over the lazy dog"

what gets printed is "quick fox over lazy", and then I get an error:

Exception in thread "main" java.util.NoSuchElementException 
    at java.util.Scanner.throwFor(Unknown Source) 
    at java.util.Scanner.next(Unknown Source) 
    at *java filename*.getWords(NaiveBayesTxtClass.java:82) 
    at *java filename*.main(NaiveBayesTxtClass.java:22)

Can anyone give me some advice on how to fix this? I have a feeling it has something to do with the fileScanner.useDelimiter() and fileScanner.hasNext() statements.


2012-06-03 Triple777er

Use HashSet (http://docs.oracle.com/javase/6/docs/api/java/util/HashSet.html) instead of ArrayList; it automatically ignores duplicates. –

Thank you, Greg. Using a HashSet is easier and less work. Much appreciated. – Triple777er
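A small sketch of why the HashSet suggestion removes the need for the contains() check: Set.add() returns false and leaves the set unchanged when the element is already present. The class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetDedupDemo {

    // Collects the distinct words. HashSet.add() returns false (and changes
    // nothing) when the word is already in the set, so no contains() is needed.
    static Set<String> distinctWords(List<String> words) {
        Set<String> vocabulary = new HashSet<String>();
        for (String word : words) {
            if (!vocabulary.add(word)) {
                System.out.println("duplicate ignored: " + word);
            }
        }
        return vocabulary;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "quick", "the", "lazy", "quick");
        Set<String> vocabulary = distinctWords(words);
        System.out.println(vocabulary.size()); // prints 3
    }
}
```

A HashSet does not keep insertion or alphabetical order; if order matters, a LinkedHashSet (insertion order) or TreeSet (sorted order) can be used the same way.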

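You are calling Scanner#next() twice after checking hasNext() only once, and the word returned by the first call is never printed or added to Vocabulary.

You call it at (1) and add the result to allWords, and you call it again at (2) and print it. So each pass through the loop consumes two words, and when the file has an odd number of words, the call at (2) runs past the last word and throws NoSuchElementException.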

while (fileScanner.hasNext()) { 

    allWords.add(fileScanner.next().toLowerCase()); // **** (1) 

    count++; 

    String distinctWord = (fileScanner.next().toLowerCase()); // **** (2) 
    System.out.println (distinctWord.toString()); 

    if (!allWords.contains(distinctWord)) { 

     Vocabulary.add(distinctWord); 

    } 
}
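To see the skipping without needing a file, here is a minimal reproduction that runs the question's loop against a String instead of textFile (the class and method names are hypothetical). With the example sentence it reproduces both symptoms: only every second word reaches the print at (2), and because the sentence has nine words, the final next() at (2) throws NoSuchElementException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Scanner;

public class DoubleNextDemo {

    // Mirrors the loop from the question: one hasNext() check, two next() calls.
    // Returns the words that reach the print at (2); the words consumed at (1)
    // only go into allWords.
    static List<String> printedWords(String text) {
        Scanner scanner = new Scanner(text);
        scanner.useDelimiter(" ");
        List<String> allWords = new ArrayList<String>();
        List<String> printed = new ArrayList<String>();
        try {
            while (scanner.hasNext()) {
                allWords.add(scanner.next().toLowerCase()); // (1)
                printed.add(scanner.next().toLowerCase());  // (2) may run past the end
            }
        } catch (NoSuchElementException e) {
            // With an odd number of words, (2) is reached after the last word
            // has already been consumed at (1).
            System.out.println("NoSuchElementException after " + allWords.size() + " words at (1)");
        } finally {
            scanner.close();
        }
        return printed;
    }

    public static void main(String[] args) {
        System.out.println(printedWords("The quick brown fox jumped over the lazy dog"));
        // prints the exception message, then [quick, fox, over, lazy]
    }
}
```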

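Solution: call Scanner#next() only once per loop iteration, save the returned String in a variable, then add that variable to the HashSet and print it. For example: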

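Set<String> vocabularySet = new HashSet<String>(); // the HashSet from the comments (needs java.util.Set and java.util.HashSet)
while (fileScanner.hasNext()) { 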
    String word = fileScanner.next().toLowerCase(); 
    allWords.add(word); // **** (1) 
    count++; 
    // String distinctWord = (fileScanner.next().toLowerCase()); // **** (2) 
    System.out.println (word); 
    vocabularySet.add(word); // a HashSet 
}
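Put together, the corrected loop can be sketched as a self-contained program. It reads from a String rather than textFile so it runs without a file and keeps the question's " " delimiter; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

public class SingleNextDemo {

    // Each hasNext() check is paired with exactly one next() call, so no word
    // is skipped and next() is never called when no token is left.
    static Set<String> vocabulary(String text, List<String> allWords) {
        Set<String> vocabularySet = new HashSet<String>();
        Scanner scanner = new Scanner(text);
        scanner.useDelimiter(" "); // same delimiter as in the question
        while (scanner.hasNext()) {
            String word = scanner.next().toLowerCase(); // the only next() in the loop
            allWords.add(word);
            System.out.println(word);
            vocabularySet.add(word); // duplicates are ignored by the HashSet
        }
        scanner.close();
        return vocabularySet;
    }

    public static void main(String[] args) {
        List<String> allWords = new ArrayList<String>();
        Set<String> vocab = vocabulary("The quick brown fox jumped over the lazy dog", allWords);
        System.out.println(allWords.size() + " words, " + vocab.size() + " distinct"); // 9 words, 8 distinct
    }
}
```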

A safe general rule is to keep a one-to-one relationship between each call to Scanner#hasNextXXX() and each call to Scanner#nextXXX().
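The question also suspected useDelimiter(" "). It is not what causes the skipping, but it does behave differently from Scanner's default delimiter, which matches any run of whitespace. With a single-space delimiter, two consecutive spaces yield an empty token, and a newline is not a separator, so the last word of one line and the first word of the next come back as one token. A minimal comparison, with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {

    // Tokenizes text with the given delimiter pattern; passing null keeps
    // Scanner's default delimiter (one or more whitespace characters).
    static List<String> tokens(String text, String delimiter) {
        Scanner scanner = new Scanner(text);
        if (delimiter != null) {
            scanner.useDelimiter(delimiter);
        }
        List<String> result = new ArrayList<String>();
        while (scanner.hasNext()) {
            result.add(scanner.next());
        }
        scanner.close();
        return result;
    }

    public static void main(String[] args) {
        String text = "lazy  dog\nthe end"; // two spaces, then a line break
        // With " ": an empty token between the two spaces, and "dog\nthe" as one token.
        System.out.println(tokens(text, " ").size());
        // With the default delimiter: lazy, dog, the, end.
        System.out.println(tokens(text, null));
    }
}
```

For reading words from an ordinary text file, simply not calling useDelimiter() (or using a whitespace pattern such as "\\s+") avoids both effects.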


2012-06-03 00:47:51

Thank you very much, sir, this solved my problem. – Triple777er

@Triple777er: You're welcome! –

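Since you also asked about the data structure, you could do this (a TreeSet keeps the vocabulary in sorted order, and its add() returns true only for a word it has not seen before):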

List<String> allWords = new ArrayList<String>(); 
    SortedSet<String> Vocabulary = new TreeSet<String>(); 
    int count = 0; 

    Scanner fileScanner = null; 
    try { 
     fileScanner = new Scanner(new File(textFile)); 

    } catch (FileNotFoundException e) { 
     System.out.println(e.getMessage()); 
     System.exit(1); 
    } 

    fileScanner.useDelimiter(" "); 

    while (fileScanner.hasNext()) { 
     String word = fileScanner.next().toLowerCase(); 
     allWords.add(word); 
     if (Vocabulary.add(word)) { 
      System.out.print("+ "); 
     } 
     System.out.println(word); 
    }

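As you can see, the variables are declared with interface types (List, SortedSet) and instantiated with concrete classes (ArrayList, TreeSet). This not only lets you swap in a different implementation later, it is especially useful for method parameters.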
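The point about method parameters can be sketched like this: a hypothetical collectWords method declared with List and Set parameters accepts either the TreeSet from this answer or a HashSet, and the caller decides which behavior it gets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

public class InterfaceParameterDemo {

    // Declared against the List and Set interfaces, so callers choose the
    // implementation: a HashSet (no ordering) or a TreeSet (sorted order).
    static void collectWords(Scanner scanner, List<String> allWords, Set<String> vocabulary) {
        while (scanner.hasNext()) {
            String word = scanner.next().toLowerCase(); // one next() per hasNext()
            allWords.add(word);
            vocabulary.add(word);
        }
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumped over the lazy dog";

        // String-backed Scanners hold no file handle, so they are not closed here.
        Set<String> sorted = new TreeSet<String>();
        collectWords(new Scanner(text), new ArrayList<String>(), sorted);
        System.out.println(sorted); // [brown, dog, fox, jumped, lazy, over, quick, the]

        Set<String> unordered = new HashSet<String>();
        collectWords(new Scanner(text), new ArrayList<String>(), unordered);
        System.out.println(unordered.size()); // 8
    }
}
```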


2012-06-03 01:08:33
