Exception when removing stop words with Apache Lucene

I am using the following code to remove stop words from an input text. When tokenStream.incrementToken() runs, I get this exception:

java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

Code:

public static String removeStopWords(String textFile) throws Exception {
    CharArraySet stopWords = EnglishAnalyzer.getDefaultStopSet();
    TokenStream tokenStream = new StandardTokenizer();
    tokenStream = new StopFilter(tokenStream, stopWords);
    StringBuilder sb = new StringBuilder();
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    tokenStream.reset();
    while (tokenStream.incrementToken()) {
        String term = charTermAttribute.toString();
        sb.append(term + " ");
    }
    return sb.toString();
}


2017-08-20 Rizstien

Instantiate the TokenStream as follows:

TokenStream tokenStream = new StandardAnalyzer().tokenStream("field", new StringReader(textFile));
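
For context, here is a minimal sketch of the whole method built around that line. It assumes Lucene 7 or later with the analysis-common module on the classpath. The class name StopWordRemover is hypothetical. The stop set is passed to StandardAnalyzer explicitly, because the no-argument constructor does not apply English stop words in every Lucene version. Note that StandardAnalyzer also lowercases tokens, which the original tokenizer chain did not.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.StringJoiner;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, not part of Lucene.
public class StopWordRemover {

    public static String removeStopWords(String text) throws IOException {
        StringJoiner joined = new StringJoiner(" ");
        // Both the analyzer and the stream are Closeable; try-with-resources
        // closes the stream first, then the analyzer.
        try (Analyzer analyzer = new StandardAnalyzer(EnglishAnalyzer.getDefaultStopSet());
             TokenStream tokenStream = analyzer.tokenStream("field", new StringReader(text))) {
            CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset();                  // must be called once before incrementToken()
            while (tokenStream.incrementToken()) {
                joined.add(term.toString());
            }
            tokenStream.end();                    // finish the consuming workflow
        }
        return joined.toString();
    }
}
```

The field name "field" is arbitrary here, since the analyzer is not tied to a particular index field.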


2017-08-21 21:05:57 darcula

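What is "field" in this code? – Rizstien

"field" is the name of the field (IndexableField) that the created TokenStream is used for. If your TokenStream is not specific to a field, you can pass null. Also, since your input is a String, you can use `tokenStream(null, textFile);` – darcula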
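
As an alternative to switching to an analyzer, the original StandardTokenizer and StopFilter chain can be kept. The exception in the question is raised because the tokenizer was never given any input: a Tokenizer needs setReader() to be called before reset(). The sketch below assumes Lucene 7 or later. The class name TokenizerStopWordRemover is hypothetical. Unlike StandardAnalyzer, this chain does not lowercase, so a capitalized "The" would pass through the filter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.StringJoiner;

import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, not part of Lucene.
public class TokenizerStopWordRemover {

    public static String removeStopWords(String text) throws IOException {
        StandardTokenizer tokenizer = new StandardTokenizer();
        // This is the step missing from the question's code.
        tokenizer.setReader(new StringReader(text));

        StringJoiner joined = new StringJoiner(" ");
        // Closing the filter also closes the wrapped tokenizer.
        try (TokenStream tokenStream =
                 new StopFilter(tokenizer, EnglishAnalyzer.getDefaultStopSet())) {
            CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset();
            while (tokenStream.incrementToken()) {
                joined.add(term.toString());
            }
            tokenStream.end();
        }
        return joined.toString();
    }
}
```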
