Finding the last event for each entity in Lucene

I have a set of events (documents) stored in a Lucene document store (version 6.2.1). Each document has an EntityId and a Timestamp.

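There can be many documents with the same EntityId.

I want to retrieve, for each EntityId, the document with the latest Timestamp.

Do I have to pull every event and do this in Java? I had a look at faceting, but as far as I can see it only does counts, not max/min-style aggregations.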
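
For comparison, the brute-force approach mentioned above (pull every event and reduce in Java) could look roughly like the sketch below. The Event class and the latestPerEntity method are hypothetical names used only for illustration; they stand in for whatever is read back from the index, and nothing here is Lucene API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LatestEventPerEntity {

    // Hypothetical in-memory view of one event read back from the index.
    static final class Event {
        final String entityId;
        final long timestamp;

        Event(String entityId, long timestamp) {
            this.entityId = entityId;
            this.timestamp = timestamp;
        }
    }

    // Keeps, for each entityId, the event with the largest timestamp.
    static Map<String, Event> latestPerEntity(List<Event> events) {
        Map<String, Event> latest = new HashMap<>();
        for (Event e : events) {
            // Replace the stored event only if the new one is strictly more recent.
            latest.merge(e.entityId, e,
                (old, cur) -> cur.timestamp > old.timestamp ? cur : old);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event("a", 10L), new Event("b", 5L), new Event("a", 30L));
        for (Event e : latestPerEntity(events).values()) {
            System.out.printf("EntityId = %s Timestamp = %d%n", e.entityId, e.timestamp);
        }
    }
}
```

This needs every matching event to be loaded, which is exactly the cost the question is hoping to avoid.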


2016-11-11 Cheetah

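What you are trying to do can be done with GroupingSearch, available from the lucene-grouping artifact.

GroupingSearch groups your documents by the field you provide (EntityId in our case). That field must be indexed as a sorted doc-values field, otherwise you will get an error like this:

java.lang.IllegalStateException: unexpected docvalues type NONE for field '${field-name}' (expected=SORTED).

Then, to get the latest document for a given EntityId, you also need Timestamp indexed as a sortable doc-values field.

So, for example, if I index my documents as follows: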

String id = .. 
long timestamp = ... 
Document doc = new Document(); 
// The sorted version of my EntityId 
doc.add(new SortedDocValuesField("EntityId", new BytesRef(id))); 
// The stored version of my EntityId to be able to get its value later if needed 
doc.add(new StringField("Id", id, Field.Store.YES)); 
// The sorted version of my timestamp 
doc.add(new NumericDocValuesField("Timestamp", timestamp)); 
// The stored version of my timestamp to be able to get its value later if needed 
doc.add(new StringField("Tsp", Long.toString(timestamp), Field.Store.YES));

Then I can get the latest document for each EntityId as follows:

IndexSearcher searcher = ... 
// Some random query here I get all docs 
Query query = new MatchAllDocsQuery(); 
// Group the docs by EntityId 
GroupingSearch groupingSearch = new GroupingSearch("EntityId"); 
// Sort the docs of the same group by Timestamp in reversed order to get 
// the most recent first 
groupingSearch.setSortWithinGroup(
    new Sort(new SortField("Timestamp", SortField.Type.LONG, true)) 
); 
// Set the limit of docs for a given group to 1 as we only want the latest 
// NB: This is the default value so it is not required 
groupingSearch.setGroupDocsLimit(1); 
// Get the 10 first matching groups 
TopGroups<BytesRef> result = groupingSearch.search(searcher, query, 0, 10); 
// Iterate over the groups found 
for (GroupDocs<BytesRef> groupDocs : result.groups) {
    // Iterate over the docs of a given group
    for (ScoreDoc scoreDoc : groupDocs.scoreDocs) {
        // Get the related doc
        Document doc = searcher.doc(scoreDoc.doc);
        // Print the stored value of EntityId and Timestamp
        System.out.printf(
            "EntityId = %s Timestamp = %s%n", doc.get("Id"), doc.get("Tsp")
        );
    }
}

More details about grouping.


2017-02-11 09:42:07

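Ah! I think the key piece of information I missed when reading the documentation was the 'SortedDocValuesField' bit. I need to reindex, but I will give it a go in due course and mark the answer once I have. Thanks! – Cheetah

Yes, I'm pretty sure this is what I want. I'm currently trying to figure out how to use 'getAllMatchingGroups', because I want to group every entity, but I don't know what to do with the returned collection of BytesRef :s – Cheetah

You could try the Collapsing query parser, like this (untested):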

fq={!collapse field=EntityId max=Timestamp}

Or you could most likely achieve the same with Grouping.


2016-11-15 22:23:01 Persimmonium

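I'm using Lucene, not Solr, and unless I'm mistaken these are Solr-specific. – Cheetah

Oops, sorry, I had just been looking at another Solr question and didn't realize this one was plain Lucene. – Persimmonium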
