Lucene project fatal error

I have a lot of text messages, and I run the following lines of code on each of them:
// tokenize term
TokenStream tokenStream = new ClassicTokenizer(LUCENE_VERSION, new StringReader(term));
// stemmize
tokenStream = new PorterStemFilter(tokenStream);
Sometimes I get the following error, and sometimes I don't:
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000025f8360, pid=1688, tid=7492
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# J org.apache.lucene.analysis.PorterStemmer.stem(I)Z
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
What should I do?
Have you tried using one of the analyzers, such as EnglishAnalyzer (http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/en/EnglishAnalyzer.html)? It both tokenizes and stems. Does that work for you? – nbz
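For reference, here is a minimal sketch of the EnglishAnalyzer approach suggested above. It assumes Lucene 4.7 (`lucene-core` and `lucene-analyzers-common`) on the classpath, matching the linked documentation. The crash frame `org.apache.lucene.analysis.PorterStemmer` suggests the asker may be on Lucene 3.x, where package names and constructors differ. The class name `EnglishStems` and the field name `"body"` are hypothetical. The analyzer lowercases, removes English stop words and applies Porter stemming, so no hand-built filter chain is needed:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class EnglishStems {

    // Runs text through EnglishAnalyzer and collects the resulting (stemmed) terms.
    public static List<String> analyze(String text) throws IOException {
        Analyzer analyzer = new EnglishAnalyzer(Version.LUCENE_47);
        List<String> terms = new ArrayList<String>();
        // "body" is a hypothetical field name; EnglishAnalyzer treats all fields alike.
        TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
        try {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
            analyzer.close();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(analyze("The Running Dogs")); // prints [run, dog]
    }
}
```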
Before the code above I have this line: tokenStream = new StopFilter(LUCENE_VERSION, tokenStream, EnglishAnalyzer.getDefaultStopSet()); but when I print the terms, they are not stemmed! That's why I use the code above to do the stemming. – user3582044
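A StopFilter only drops stop words; it never stems, so the terms printed after it are unchanged. The sketch below chains the filters from the question in one pipeline and consumes it with the standard `reset()` / `incrementToken()` / `end()` / `close()` sequence. It assumes Lucene 4.7 (`lucene-core` and `lucene-analyzers-common`) on the classpath; on Lucene 3.x the imports and some constructors differ. It also adds a `LowerCaseFilter` (not in the original code) because `ClassicTokenizer` does not lowercase, and the Porter stemmer and default stop set expect lowercase input. The class name `StemTerms` is hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.ClassicTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StemTerms {
    private static final Version LUCENE_VERSION = Version.LUCENE_47;

    // Tokenize -> lowercase -> remove stop words -> Porter-stem, then collect terms.
    public static List<String> stem(String text) throws IOException {
        TokenStream ts = new ClassicTokenizer(LUCENE_VERSION, new StringReader(text));
        ts = new LowerCaseFilter(LUCENE_VERSION, ts);   // assumption: added for stemming/stop words
        ts = new StopFilter(LUCENE_VERSION, ts, EnglishAnalyzer.getDefaultStopSet());
        ts = new PorterStemFilter(ts);

        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(stem("The Running Dogs")); // prints [run, dog]
    }
}
```

Printing the terms only after the last filter, here `PorterStemFilter`, is what shows the stemmed output.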