2013-04-25

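Searching an Apache Lucene index and counting the results per group

I want to search a Lucene index, but I also want to filter the search. There are two fields, contents and category. Suppose I want to search for documents containing "sport", and I also want to count how many of those documents are in category a and how many are in category b. I am trying to do this with the code below, but with millions of records it becomes slow because of the loop. Please suggest another way to accomplish this.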

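import java.io.File;
import javax.swing.JOptionPane;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

try {
    File indexDir = new File("file path");
    Directory directory = FSDirectory.open(indexDir);
    IndexSearcher searcher = new IndexSearcher(directory, true);
    int maxhits = 1000000;
    QueryParser parser1 = new QueryParser(Version.LUCENE_36, "contents",
            new StandardAnalyzer(Version.LUCENE_36));
    Query qu = parser1.parse("sport");

    TopDocs topDocs = searcher.search(qu, maxhits);
    ScoreDoc[] hits = topDocs.scoreDocs;
    int len = hits.length;
    JOptionPane.showMessageDialog(null, "found " + len + " times");

    int ctr = 0, ctr1 = 0;
    // Reads the stored fields of every hit; this per-hit read is what
    // makes the loop slow when there are millions of matches.
    for (int i = 0; i < len; i++) {
        Document d = searcher.doc(hits[i].doc);
        String category = d.get("category");
        // "a".equals(...) avoids a NullPointerException for documents without a category
        if ("a".equals(category))
            ctr++;
        if ("b".equals(category))
            ctr1++;
    }

    JOptionPane.showMessageDialog(null, "word found in category a " + ctr + " times");
    JOptionPane.showMessageDialog(null, "word found in category b " + ctr1 + " times");
} catch (Exception ex) {
    ex.printStackTrace();
}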
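A common way to avoid the per-hit `searcher.doc(...)` call is to look each hit's category up in an array indexed by document ID. In Lucene 3.x, `FieldCache.DEFAULT.getStrings(reader, "category")` returns such an array, assuming `category` is indexed as a single un-tokenized term per document. The sketch below isolates only the counting step and uses just the JDK; the class `CategoryTally`, the method `countByCategory`, and the sample data are hypothetical, and the array stands in for whatever per-document cache you actually use.

```java
import java.util.HashMap;
import java.util.Map;

public class CategoryTally {

    // Hypothetical helper: counts hits per category in one pass.
    // categoryByDoc[docId] holds the category of document docId
    // (for example an array obtained from a per-document value cache),
    // so no stored document has to be loaded.
    static Map<String, Integer> countByCategory(int[] hitDocIds, String[] categoryByDoc) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (int docId : hitDocIds) {
            String category = categoryByDoc[docId];
            if (category == null) {
                continue; // document has no category value
            }
            Integer old = counts.get(category);
            counts.put(category, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] categoryByDoc = {"a", "b", "a", null, "b", "a"};
        int[] hits = {0, 2, 3, 4, 5};
        Map<String, Integer> counts = countByCategory(hits, categoryByDoc);
        System.out.println("category a: " + counts.get("a"));
        System.out.println("category b: " + counts.get("b"));
    }
}
```

In the question's code, the hit IDs would come from `hits[i].doc`; the loop then touches only an in-memory array instead of the index files.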

Answer


You can simply run one query per category you are looking for and read totalHits. Even better, use a TotalHitCountCollector instead of retrieving a TopDocs instance:

// TotalHitCountCollector only counts matches; no documents are loaded or sorted.
Query query = parser1.parse("+sport +category:a");
TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(query, collector);
ctr = collector.getTotalHits();
query = parser1.parse("+sport +category:b");
collector = new TotalHitCountCollector();
searcher.search(query, collector);
ctr1 = collector.getTotalHits();
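One caveat with parsing "+sport +category:a": StandardAnalyzer removes English stop words, and "a" is one of them, so the category:a clause can be dropped silently and the count would then cover every "sport" hit. A variant of the answer's approach, sketched below for Lucene 3.6 (assumed to be on the classpath), adds the category restriction as a TermQuery, which bypasses the analyzer. It assumes category is indexed NOT_ANALYZED; the class CategoryCounts, its methods, and the four-document demo index are hypothetical.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TotalHitCountCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CategoryCounts {

    // Counts documents matching base AND category:<category> without loading stored fields.
    static int countInCategory(IndexSearcher searcher, Query base, String category) throws IOException {
        BooleanQuery query = new BooleanQuery();
        query.add(base, BooleanClause.Occur.MUST);
        // TermQuery is not analyzed, so a value such as "a" is not removed as a stop word.
        query.add(new TermQuery(new Term("category", category)), BooleanClause.Occur.MUST);
        TotalHitCountCollector collector = new TotalHitCountCollector();
        searcher.search(query, collector);
        return collector.getTotalHits();
    }

    // Hypothetical demo index; category is indexed NOT_ANALYZED (an assumption about the real index).
    static Directory buildDemoIndex(StandardAnalyzer analyzer) throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_36, analyzer));
        String[][] docs = { {"sport news", "a"}, {"more sport", "a"}, {"sport results", "b"}, {"politics", "b"} };
        for (String[] d : docs) {
            Document doc = new Document();
            doc.add(new Field("contents", d[0], Field.Store.NO, Field.Index.ANALYZED));
            doc.add(new Field("category", d[1], Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();
        return dir;
    }

    // Builds the demo index and returns {count for "a", count for "b"}.
    static int[] demoCounts() throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        Directory dir = buildDemoIndex(analyzer);
        IndexReader reader = IndexReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            Query sport = new QueryParser(Version.LUCENE_36, "contents", analyzer).parse("sport");
            return new int[] { countInCategory(searcher, sport, "a"), countInCategory(searcher, sport, "b") };
        } finally {
            searcher.close();
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] counts = demoCounts();
        System.out.println("category a: " + counts[0]);
        System.out.println("category b: " + counts[1]);
    }
}
```

With the four demo documents, three contain "sport", so this should report 2 for category a and 1 for category b. Against a real index you would call countInCategory once per category with the parsed "sport" query.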