2013-04-25 65 views
2

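My Lucene search returns no results

I am learning Lucene, and this is my first test class. I am trying to run a search against an in-memory index, and I borrowed some code from examples. The search does not return any matches, though. Can you help me? Thanks.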

    package my.test;

    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.util.CharArraySet;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.PrefixQuery;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.SearcherManager;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class TestInMemorySearch {
        public static void main(String[] args) {
            // Construct a RAMDirectory to hold the in-memory representation of the index.
            RAMDirectory idx = new RAMDirectory();

            try {
                // Make a writer to create the index
                IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_42,
                        new StandardAnalyzer(Version.LUCENE_42, CharArraySet.EMPTY_SET));
                iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
                IndexWriter writer = new IndexWriter(idx, iwc);

                // Add some Document objects containing quotes
                writer.addDocument(createDocument("Theodore Roosevelt man",
                        "It behooves every man to remember that the work of the "
                        + "critic, is of altogether secondary importance, and that, "
                        + "in the end, progress is accomplished by the man who does "
                        + "things."));
                writer.addDocument(createDocument("Friedrich Hayek",
                        "The case for individual freedom rests largely on the "
                        + "recognition of the inevitable and universal ignorance "
                        + "of all of us concerning a great many of the factors on "
                        + "which the achievements of our ends and welfare depend."));
                writer.addDocument(createDocument("Ayn Rand",
                        "There is nothing to take a man's freedom away from "
                        + "him, save other men. To be free, a man must be free "
                        + "of his brothers."));
                writer.addDocument(createDocument("Mohandas Gandhi",
                        "Freedom is not worth having if it does not connote "
                        + "freedom to err."));

                // Close the writer to finish building the index
                writer.close();

                // Build a SearcherManager over the in-memory index
                SearcherManager mgr = new SearcherManager(idx, null);

                try {
                    Document[] hits = search(mgr, "man", 100);
                    for (Document doc : hits) {
                        String title = doc.get("title");
                        String content = doc.get("content");
                        System.out.println("Found match:[Title]" + title + ", [Content]" + content);
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            } catch (IOException ioe) {
                // In this example we aren't really doing any I/O, so this
                // exception should never actually be thrown.
                ioe.printStackTrace();
            }
        }

        /**
         * Make a Document object with a stored title field and a stored
         * content field.
         */
        private static Document createDocument(String title, String content) {
            Document doc = new Document();
            doc.add(new StringField("title", title, Field.Store.YES));
            doc.add(new StringField("content", content, Field.Store.YES));
            return doc;
        }

        private static Document[] search(SearcherManager searchManager, String searchString, int maxResults)
                throws IOException {
            IndexSearcher searcher = null;
            try {
                // Build the query: every token must prefix-match in both fields.
                String[] tokens = searchString.split("\\s+");
                BooleanQuery query = new BooleanQuery();
                for (String token : tokens) {
                    query.add(new PrefixQuery(new Term("title", token)), BooleanClause.Occur.MUST);
                    query.add(new PrefixQuery(new Term("content", token)), BooleanClause.Occur.MUST);
                }

                searcher = searchManager.acquire();
                ScoreDoc[] scoreDocs = searcher.search(query, maxResults).scoreDocs;
                Document[] documents = new Document[scoreDocs.length];
                for (int i = 0; i < scoreDocs.length; i++) {
                    documents[i] = searcher.doc(scoreDocs[i].doc);
                }
                return documents;
            } finally {
                // Always hand the searcher back to the manager.
                if (searcher != null) {
                    searchManager.release(searcher);
                }
            }
        }
    }

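You should reduce the code to the simplest thing possible and then build up from there. Use a single field, and use a plain `IndexSearcher` instead of a `SearcherManager`. Form a simple `TermQuery` instead of a `BooleanQuery` of `PrefixQuery` clauses. – 2013-04-25 14:30:17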

Thanks, I have already done those simple tests. I am just wondering why this one does not work. – Ran 2013-04-25 14:50:03

If you want an answer to that, you should identify the most complex version that still works, and the exact change that makes it stop working. – 2013-04-25 14:56:07

Answer

7

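`StringField` seems like the obvious choice, but it is not what you want here. You want `TextField`. `StringField` indexes the entire field value as a single token, essentially a keyword or identifier. `TextField` analyzes and tokenizes the field for full-text search.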

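The fix is as simple as changing these lines in your `createDocument` method:

    doc.add(new StringField("title", title, Field.Store.YES));
    doc.add(new StringField("content", content, Field.Store.YES));

to:

    doc.add(new TextField("title", title, Field.Store.YES));
    doc.add(new TextField("content", content, Field.Store.YES));

`TextField` lives in `org.apache.lucene.document`, so it needs its own import alongside `StringField`.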
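
To make the single-token behavior concrete, here is a rough plain-Java model of what ends up in the index for each field type. It deliberately does not use the Lucene API: `FieldTermsSketch` and its methods are hypothetical names invented for illustration, and the tokenizer is a crude stand-in for `StandardAnalyzer` (it splits on anything that is not a letter or digit and lowercases each piece, which is an assumption and not the analyzer's real rule set).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java model (NOT the Lucene API) of the terms each field type produces. */
class FieldTermsSketch {

    /** Models StringField: the whole value becomes one term, unchanged. */
    static List<String> stringFieldTerms(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    /** Models TextField with a lowercasing analyzer (crude approximation). */
    static List<String> textFieldTerms(String value) {
        List<String> terms = new ArrayList<>();
        // Split on runs of characters that are neither letters nor digits.
        for (String piece : value.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Models a prefix query: true if any indexed term starts with the prefix. */
    static boolean prefixMatches(List<String> terms, String prefix) {
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String title = "Theodore Roosevelt man";
        // One term, "Theodore Roosevelt man", which does not start with "man".
        System.out.println(prefixMatches(stringFieldTerms(title), "man")); // false
        // Three terms: "theodore", "roosevelt", "man".
        System.out.println(prefixMatches(textFieldTerms(title), "man"));   // true
    }
}
```

In this model the `StringField` version of the title holds one term that begins with "Theodore", so a prefix search for "man" cannot match it. Because the question's query requires a match in both `title` and `content`, no document is returned.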

Thanks, it works now. I was trying to use everything at once, so it all got mixed up. – Ran 2013-04-25 17:55:24

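One more question: I can now search with `Document[] hits = search(mgr, "theodore", 100);`, but not with `Document[] hits = search(mgr, "Theodore", 100);`. The code is the same as above. Do you know why it is case-sensitive? I would have expected the opposite: I should be able to search for 'T', not 't', right? – Ran 2013-04-25 19:16:02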

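At index time, your terms are passed through `StandardAnalyzer`, which lowercases them (among other transformations), so the indexed terms are all lowercase. When you use QueryParser, the same transformations are generally applied, so searches are case-insensitive. When you construct TermQueries manually, however, you should make sure to lowercase the input first. – femtoRgon 2013-04-25 19:24:55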
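
A minimal sketch of that last point, assuming lowercasing is the only transformation that matters for the input: normalize the search string before building queries by hand. `QueryTokensSketch` and `normalize` are hypothetical names, not part of Lucene, and a real analyzer may also strip punctuation or apply other filters that this helper does not reproduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: prepares user input for hand-built term or prefix queries. */
class QueryTokensSketch {

    /** Splits on whitespace and lowercases each token to match lowercased index terms. */
    static List<String> normalize(String searchString) {
        List<String> tokens = new ArrayList<>();
        for (String token : searchString.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i.
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(normalize("Theodore Roosevelt")); // prints [theodore, roosevelt]
    }
}
```

In the question's `search` method, each normalized token would then take the place of the raw `token` passed to `new Term("title", token)` and `new Term("content", token)`.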