It seems the Apache Lucene API changes with every release. How can I get the most frequent terms from an IndexReader in Apache Lucene 6.4.0?
I have seen "Get highest frequency terms from Lucene index", but that solution does not work with Apache Lucene 6.4.0.
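The following code works with Lucene 6.4. It finds the single most frequent term across all fields; to find the most frequent term within each field separately, track the maximum inside the per-field loop instead.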
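import java.util.Iterator;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// `dir` is an already opened org.apache.lucene.store.Directory pointing at the index.
IndexReader reader = DirectoryReader.open(dir);
final Fields fields = MultiFields.getFields(reader);
final Iterator<String> iterator = fields.iterator();
long maxFreq = Long.MIN_VALUE;
String freqTerm = "";
// Walk every field, then every term in that field.
while (iterator.hasNext()) {
    final String field = iterator.next();
    final Terms terms = MultiFields.getTerms(reader, field);
    if (terms == null) {
        continue; // getTerms may return null when the field has no terms
    }
    final TermsEnum it = terms.iterator();
    BytesRef term = it.next();
    while (term != null) {
        // Total number of occurrences of this term across all documents.
        final long freq = it.totalTermFreq();
        if (freq > maxFreq) {
            maxFreq = freq;
            freqTerm = term.utf8ToString();
        }
        term = it.next();
    }
}
System.out.println(freqTerm + " " + maxFreq);
reader.close();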
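The loop above keeps only one maximum. If several of the most frequent terms are wanted, as the question's plural suggests, one option is to collect the frequencies and keep the top N with a bounded min-heap. The sketch below is plain Java and does not touch Lucene: the class name TopTerms, the method topN, and the sample frequencies are hypothetical, and the map stands in for the values that it.totalTermFreq() would supply in the loop above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene.
class TopTerms {

    /** Returns the n entries with the highest frequency, most frequent first. */
    static List<Map.Entry<String, Long>> topN(Map<String, Long> freqs, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap ordered by frequency: the head is the weakest of the current top n.
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(
                Comparator.comparingLong((Map.Entry<String, Long> e) -> e.getValue()));
        for (Map.Entry<String, Long> e : freqs.entrySet()) {
            // Copy the entry so the result does not depend on the source map.
            heap.offer(new AbstractMap.SimpleImmutableEntry<>(e));
            if (heap.size() > n) {
                heap.poll(); // evict the least frequent candidate
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return result;
    }

    public static void main(String[] args) {
        // Made-up frequencies for illustration only.
        Map<String, Long> freqs = new HashMap<>();
        freqs.put("lucene", 7L);
        freqs.put("term", 5L);
        freqs.put("index", 3L);
        freqs.put("reader", 1L);
        for (Map.Entry<String, Long> e : topN(freqs, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The heap never holds more than n entries, so memory stays small even for an index with many distinct terms. In the Lucene loop, the same heap could be fed directly with (term, freq) pairs instead of building a map first.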
Thanks @Mysterion, this is exactly what I was looking for. –
Top terms across all fields? – Mysterion
Yes, @Mysterion. –