2015-12-11 140 views

Accessing a custom Lucene attribute from a DirectoryReader

I added a custom attribute to my Lucene pipeline, as described here (in the "Adding a custom Attribute" section).

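Now, after building my index (by adding all documents through an IndexWriter), I want to be able to read this attribute back when reading the index directory. How can I do that?

This is what I am currently doing: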

DirectoryReader reader = DirectoryReader.open(index);
TermsEnum iterator = null;
for (int i = 0; i < reader.maxDoc(); i++) {
    Terms terms = reader.getTermVector(i, "content");
    iterator = terms.iterator(iterator);
    AttributeSource attributes = iterator.attributes();
    SentenceAttribute sentence = attributes.addAttribute(SentenceAttribute.class);

    while (true) {
        BytesRef term = iterator.next();
        if (term == null) {
            break;
        }

        System.out.println(term.utf8ToString());
        System.out.println(sentence.getStringSentenceId());
    }
}

This does not seem to work: I get the same sentenceId every time.

I am using Lucene 4.9.1.


This may be related: http://stackoverflow.com/questions/24041456/how-to-store-custom-token-attribute-in-lucene-index

Answer


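In the end, I solved it by using a PayloadAttribute to store the data I need. Custom attributes are only available during analysis and are not written to the index, whereas payloads are.

To store the payloads in the index, first enable the storeTermVectorPayloads property on the field type, along with the other term vector options:

fieldType.setStoreTermVectors(true);
fieldType.setStoreTermVectorOffsets(true);
fieldType.setStoreTermVectorPositions(true);
fieldType.setStoreTermVectorPayloads(true);

Then, for each token in the analysis phase, set the payload attribute: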

private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class); 

// in incrementToken() 
payloadAtt.setPayload(new BytesRef(String.valueOf(myAttr))); 

Then build the index. After that, the payloads can be read back like this:

DocsAndPositionsEnum payloads = null;
TermsEnum iterator = null;
BytesRef ref;

Terms termVector = reader.getTermVector(docId, "field");
iterator = termVector.iterator(iterator);

// Walk every term in the document's term vector
while ((ref = iterator.next()) != null) {
    payloads = iterator.docsAndPositions(null, payloads, DocsAndPositionsEnum.FLAG_PAYLOADS);

    while (payloads.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
        int freq = payloads.freq();
        for (int i = 0; i < freq; i++) {
            // Advance to the next position before asking for its payload
            payloads.nextPosition();

            // May be null if no payload was indexed at this position
            BytesRef payload = payloads.getPayload();
            // do something with the payload
        }
    }
}
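
For the "do something with the payload" step, the bytes have to be turned back into the original value. The following is a minimal sketch in plain Java, assuming the payload was written as the UTF-8 text form of the sentence ID, as `new BytesRef(String.valueOf(myAttr))` does. `SentencePayloadCodec` and its methods are hypothetical names used for illustration only; they are not part of Lucene.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a Lucene class): round-trips a sentence ID through
// the UTF-8 byte form used for the payload in the answer above.
final class SentencePayloadCodec {

    private SentencePayloadCodec() {}

    // Encode the value as its text form in UTF-8, mirroring
    // new BytesRef(String.valueOf(myAttr)).
    static byte[] encode(Object sentenceId) {
        return String.valueOf(sentenceId).getBytes(StandardCharsets.UTF_8);
    }

    // Decode only the slice [offset, offset + length): the array backing a
    // BytesRef can be larger than the payload it holds.
    static String decode(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}
```

Inside the loop this would be called as `SentencePayloadCodec.decode(payload.bytes, payload.offset, payload.length)` after checking that `payload` is not null. On a `BytesRef`, `payload.utf8ToString()` should give the same result; the explicit version is shown to make the offset and length handling visible.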