Lucene 4.2 analyzer for indexing fields

I want to index a set of documents using Lucene 4.2. I created a custom analyzer that does not tokenize the text but lowercases the terms, using the following code:

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.util.Version;

public class NoTokenAnalyzer extends Analyzer {
    public Version matchVersion;

    public NoTokenAnalyzer(Version matchVersion) {
        this.matchVersion = matchVersion;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Emit the entire field value as a single token (no tokenization).
        final KeywordTokenizer source = new KeywordTokenizer(reader);
        // Lowercase that single token.
        TokenStream result = new LowerCaseFilter(matchVersion, source);
        return new TokenStreamComponents(source, result);
    }
}

I use the analyzer to build the index (inspired by the code provided in the Lucene documentation):

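import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public static void IndexFile(Analyzer analyzer) throws IOException {
    boolean create = true;

    String directoryPath = "path";
    File folderToIndex = new File(directoryPath);
    File[] filesToIndex = folderToIndex.listFiles();

    Directory directory = FSDirectory.open(new File("index path"));
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_42, analyzer);

    if (create) {
        // Create a new index in the directory, removing any
        // previously indexed documents:
        iwc.setOpenMode(OpenMode.CREATE);
    } else {
        // Add new documents to an existing index:
        iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
    }

    IndexWriter writer = new IndexWriter(directory, iwc);
    for (final File singleFile : filesToIndex) {
        // process files in the directory and extract strings to index
        //..........
        String field1;
        String field2;

        // index fields
        Document doc = new Document();
        Field f1Field = new Field("f1", field1, TextField.TYPE_STORED);
        doc.add(f1Field);
        doc.add(new Field("f2", field2, TextField.TYPE_STORED));

        // Assumed to have been omitted from the posted snippet:
        // without this call the document never reaches the index.
        writer.addDocument(doc);
    }
    writer.close();
}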

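The problem with this code is that the indexed fields are not tokenized, but they are not lowercased either; that is, the analyzer does not seem to be applied during indexing. I can't figure out what is wrong. How can I make the analyzer work?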

Source

2013-04-30 stckjp

The code works correctly. So it may help someone create a custom analyzer in Lucene 4.2 and use it for indexing and searching.

Source

2013-05-01 11:34:09 stckjp
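
For anyone who runs into the same confusion: one possible explanation (an assumption, the post does not say what the cause was) is that stored field values are always returned exactly as they were passed in, while the analyzer only affects the terms written to the inverted index. A quick way to check what an analyzer really produces is to consume its token stream directly. The sketch below assumes Lucene 4.2 on the classpath; `AnalyzerCheck` and `analyze` are hypothetical names made up for illustration, and `NoTokenAnalyzer` is repeated as a nested class only so the example is self-contained.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class, not part of Lucene or of the original post.
class AnalyzerCheck {

    // Same analyzer as in the question, nested here for self-containment.
    static class NoTokenAnalyzer extends Analyzer {
        private final Version matchVersion;

        NoTokenAnalyzer(Version matchVersion) {
            this.matchVersion = matchVersion;
        }

        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            // Whole input becomes one token, which is then lowercased.
            final KeywordTokenizer source = new KeywordTokenizer(reader);
            TokenStream result = new LowerCaseFilter(matchVersion, source);
            return new TokenStreamComponents(source, result);
        }
    }

    // Runs the analyzer over the text and collects the terms it emits.
    static List<String> analyze(Analyzer analyzer, String fieldName, String text)
            throws IOException {
        List<String> terms = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream(fieldName, new StringReader(text));
        try {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset(); // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new NoTokenAnalyzer(Version.LUCENE_42);
        System.out.println(analyze(analyzer, "f1", "Hello World"));
    }
}
```

If the list contains the single term `hello world`, the analyzer is doing its job, and the mixed-case text seen when reading a document back is just the stored value rather than the indexed term.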
