2012-03-15 24 views

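Lucene - SimpleAnalyzer - how to get the matched word?

With the following code, I cannot obtain, or directly use, the matched word itself or its offsets. Any help would be appreciated.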

... 
    Analyzer analyzer = new SimpleAnalyzer(); 
    MemoryIndex index = new MemoryIndex(); 

    QueryParser parser = new QueryParser(Version.LUCENE_30, "content", analyzer); 

    float score = index.search(parser.parse("+content:" + target)); 

    if (score > 0.0f) {
        System.out.println("How to know matched word?");
    }

Answers


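Here is a complete in-memory indexing and search example. I just wrote it myself and it works perfectly. I understand that you need to keep the index in memory, but why do you need MemoryIndex specifically? You can simply use RAMDirectory: the index is stored in memory, so when you run a search it is read from the RAMDirectory (that is, from memory).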

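    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.*;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    // text, word, size, processor and concordances are defined elsewhere in the surrounding code.
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer);
    RAMDirectory directory = new RAMDirectory();
    try {
        // Index the text once, storing term vectors with character offsets.
        IndexWriter indexWriter = new IndexWriter(directory, config);
        Document doc = new Document();
        doc.add(new Field("content", text, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_OFFSETS));
        indexWriter.addDocument(doc);
        indexWriter.optimize();
        indexWriter.close();

        QueryParser parser = new QueryParser(Version.LUCENE_34, "content", analyzer);
        IndexSearcher searcher = new IndexSearcher(directory, true);
        IndexReader reader = IndexReader.open(directory, true);

        Query query = parser.parse(word);
        TopScoreDocCollector collector = TopScoreDocCollector.create(10000, true);
        searcher.search(query, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
        if (hits != null && hits.length > 0) {
            for (ScoreDoc hit : hits) {
                int docId = hit.doc;
                Document hitDoc = searcher.doc(docId);

                // Look the word up in the document's term vector to get its offsets.
                TermFreqVector termFreqVector = reader.getTermFreqVector(docId, "content");
                TermPositionVector termPositionVector = (TermPositionVector) termFreqVector;
                // StandardAnalyzer lowercases terms, so word must be in its analyzed form here.
                int termIndex = termFreqVector.indexOf(word);
                if (termIndex < 0) {
                    continue; // word is not a term of this document as indexed
                }
                TermVectorOffsetInfo[] termVectorOffsetInfos = termPositionVector.getOffsets(termIndex);

                for (TermVectorOffsetInfo termVectorOffsetInfo : termVectorOffsetInfos) {
                    concordances.add(processor.processConcordance(hitDoc.get("content"), word, termVectorOffsetInfo.getStartOffset(), size));
                }
            }
        }

        analyzer.close();
        searcher.close();
        reader.close();
        directory.close();
    } catch (Exception e) {
        // IOException from the index, ParseException from the query parser
        e.printStackTrace();
    }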
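
As the question's code shows, MemoryIndex.search returns only a float score, so the matched word and its offsets have to be recovered some other way. One option that needs no index at all is to re-tokenize the text the way SimpleAnalyzer does (split at non-letter characters, lowercase each token) and record the offsets of tokens equal to the target. The following is a minimal plain-Java sketch of that idea, not Lucene code: `OffsetMatcher`, `Match` and `findMatches` are hypothetical names introduced here, the target is assumed to be a single word, and characters outside the Basic Multilingual Plane are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of Lucene) that mimics SimpleAnalyzer-style tokenization. */
final class OffsetMatcher {

    /** One matched token: the lowercased term plus its character offsets in the original text. */
    static final class Match {
        final String term;
        final int start; // inclusive
        final int end;   // exclusive

        Match(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    /** Returns every token of text that equals target after lowercasing (assumption: target is one word). */
    static List<Match> findMatches(String text, String target) {
        String wanted = target.toLowerCase(Locale.ROOT);
        List<Match> matches = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip separators: anything that is not a letter ends a token.
            while (i < n && !Character.isLetter(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one maximal run of letters.
            while (i < n && Character.isLetter(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                if (term.equals(wanted)) {
                    matches.add(new Match(term, start, i));
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Hello world, hello Lucene";
        for (Match m : findMatches(text, "HELLO")) {
            // The offsets index into the original text, so the original casing can be recovered.
            System.out.println(m + " -> " + text.substring(m.start, m.end));
        }
    }
}
```

A scan like this could run only when the MemoryIndex score is greater than zero, so the index still decides whether the text matches and the scan only locates the word.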

Hi, thanks for your comment. Could you convert your example to use MemoryIndex? I use MemoryIndex for full-text search, so I cannot work with hits or documents the way your code does. – Javatar 2012-03-21 12:09:51


I have edited my answer, take a look. – 2012-03-28 19:01:46


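Hi, thanks. I use MemoryIndex because of performance and memory concerns. From what I have learned, MemoryIndex is more efficient and more convenient than RAMDirectory, which is why I prefer it. – Javatar 2012-03-29 08:44:49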