2013-04-30 110 views

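Indexing fields with a custom analyzer in Lucene 4.2

I want to index a set of documents using Lucene 4.2. I created a custom analyzer that does not tokenize the terms but does lowercase them, using the following code: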

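    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.KeywordTokenizer;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.util.Version;

    public class NoTokenAnalyzer extends Analyzer {
        private final Version matchVersion;

        public NoTokenAnalyzer(Version matchVersion) {
            this.matchVersion = matchVersion;
        }

        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            // KeywordTokenizer emits the whole field value as a single token.
            final KeywordTokenizer source = new KeywordTokenizer(reader);
            // LowerCaseFilter then lowercases that single token.
            TokenStream result = new LowerCaseFilter(matchVersion, source);
            return new TokenStreamComponents(source, result);
        }
    }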

I use the analyzer to build the index (inspired by the code provided in the Lucene documentation):

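    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public static void IndexFile(Analyzer analyzer) throws IOException {
        boolean create = true;

        String directoryPath = "path";
        File folderToIndex = new File(directoryPath);
        File[] filesToIndex = folderToIndex.listFiles();

        Directory directory = FSDirectory.open(new File("index path"));
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_42, analyzer);

        if (create) {
            // Create a new index in the directory, removing any
            // previously indexed documents:
            iwc.setOpenMode(OpenMode.CREATE);
        } else {
            // Add new documents to an existing index:
            iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
        }

        IndexWriter writer = new IndexWriter(directory, iwc);
        for (final File singleFile : filesToIndex) {
            // process files in the directory and extract strings to index
            //..........
            String field1;
            String field2;

            // index fields
            Document doc = new Document();
            doc.add(new Field("f1", field1, TextField.TYPE_STORED));
            doc.add(new Field("f2", field2, TextField.TYPE_STORED));

            // Without this call the document never reaches the index.
            writer.addDocument(doc);
        }
        writer.close();
    }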

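The problem with this code is that the indexed fields are not tokenized, which is what I want, but they are also not lowercased. In other words, it seems that the analyzer is not applied during indexing. I don't understand what is wrong. How can I make the analyzer work?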

Answer


The code works correctly. So it may help someone who needs to create a custom analyzer in Lucene 4.2 and use it for both indexing and searching.
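
To check the indexed terms against what the analyzer should produce, here is a minimal plain-Java sketch of the intended behavior of the KeywordTokenizer plus LowerCaseFilter chain. It does not use Lucene at all. The class `KeywordLowercaseSketch` and the helper `expectedTerms` are hypothetical names introduced only for illustration, and `toLowerCase(Locale.ROOT)` is an assumption that only approximates Lucene's per-character lowercasing.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Plain-Java model (no Lucene dependency) of a keyword-tokenize-then-lowercase chain. */
public class KeywordLowercaseSketch {

    /** Hypothetical helper: the terms the chain is expected to emit for one field value. */
    public static List<String> expectedTerms(String fieldValue) {
        // Keyword tokenization step: the whole value stays one token, with no splitting.
        String singleToken = fieldValue;
        // Lowercasing step: approximated here with Locale.ROOT (an assumption).
        return Collections.singletonList(singleToken.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // One term containing the spaces, all lowercase.
        System.out.println(expectedTerms("New York City")); // prints [new york city]
    }
}
```

When comparing against a real index, keep in mind that a stored field returns the original text exactly as it was supplied. Analysis only affects the indexed terms, so the lowercasing shows up in term-level inspection and in search behavior, not in the stored value read back from a document.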