2017-08-20 50 views

Exception when removing stop words with Apache Lucene

I use the following code to remove stop words from input text. When tokenStream.incrementToken() runs, I get this exception:

java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow. 

Code:

public static String removeStopWords(String textFile) throws Exception {
    CharArraySet stopWords = EnglishAnalyzer.getDefaultStopSet();
    TokenStream tokenStream = new StandardTokenizer();
    tokenStream = new StopFilter(tokenStream, stopWords);
    StringBuilder sb = new StringBuilder();
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    tokenStream.reset();
    while (tokenStream.incrementToken()) {
        String term = charTermAttribute.toString();
        sb.append(term + " ");
    }
    return sb.toString();
}

Answer

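Instantiate the TokenStream as follows, so that it has input to read. The StandardTokenizer in the question never has setReader() called on it (textFile is never passed in), which is what triggers the exception: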

TokenStream tokenStream = new StandardAnalyzer().tokenStream("field",new StringReader(textFile)); 

What is "field" in this code? – Rizstien

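"field" is the name of the field (IndexableField) that the created TokenStream will be used for. If your tokenStream is not specific to any field, you can pass null. Also, since your input is a String, you can use `tokenStream(null, textFile);` – darcula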
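
For reference, here is a minimal sketch of the whole method using the approach from the answer. It makes some assumptions that are not in the original post: the class name `StopWordRemover` is made up for illustration, the import paths are the Lucene 7+ ones (in older releases `CharArraySet` lives in `org.apache.lucene.analysis.util`), and the stop set is passed to `StandardAnalyzer` explicitly because its no-argument constructor does not apply the same default stop words in every Lucene version. Unlike the tokenizer-plus-StopFilter chain in the question, `StandardAnalyzer` also lowercases the tokens.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, named for illustration only.
public class StopWordRemover {

    public static String removeStopWords(String text) throws IOException {
        CharArraySet stopWords = EnglishAnalyzer.getDefaultStopSet();
        StringBuilder sb = new StringBuilder();

        // try-with-resources closes the stream first, then the analyzer.
        try (Analyzer analyzer = new StandardAnalyzer(stopWords);
             TokenStream ts = analyzer.tokenStream("field", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);

            ts.reset();                      // must be called before the first incrementToken()
            while (ts.incrementToken()) {
                if (sb.length() > 0) {
                    sb.append(' ');          // separate tokens without a trailing space
                }
                sb.append(term.toString());
            }
            ts.end();                        // signals end of stream before close()
        }
        return sb.toString();
    }
}
```

The order of calls is the consuming workflow the exception message refers to: obtain a stream that already has input attached, then reset(), loop on incrementToken(), end(), and finally close(). If you would rather keep the StandardTokenizer from the question, calling setReader(new StringReader(textFile)) on the tokenizer before reset() should address the same problem.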