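Counting word frequency in a Lucene index

Can someone help me find the frequency of a word across an entire Lucene index?
For example, if document A contains the word B 3 times and document C contains it 2 times, I want a method that returns 5, the frequency of word B over the whole Lucene index.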

2010-11-12 Ehsan

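What index size are you looking at? Depending on that, you might want to use Hadoop for this, or use a simple index parser that collects the word frequencies in a map. – anirvan 2010-11-12 18:23:06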
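
For a small data set, the "collect the frequencies in a map" idea might look roughly like the sketch below. It is plain Java with no Lucene dependency: the class name WordFrequencies and the naive tokenizer are made up for illustration, and a real index would be tokenized by whatever analyzer it was built with.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts words over plain strings.
final class WordFrequencies {

    // Returns a map from each word to its total number of occurrences across all documents.
    static Map<String, Integer> count(List<String> documents) {
        Map<String, Integer> totals = new HashMap<>();
        for (String doc : documents) {
            // Naive tokenization (assumption): lower-case, split on runs of non-letters/digits.
            for (String word : doc.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (!word.isEmpty()) {
                    totals.merge(word, 1, Integer::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Document A contains "b" three times, document C contains it twice.
        List<String> docs = Arrays.asList("b x b y b", "b z b");
        System.out.println(count(docs).get("b")); // prints 5
    }
}
```

This keeps every distinct word in memory, so it only suits data sets whose vocabulary fits comfortably in the heap.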

This has been asked several times:

2010-11-12 19:47:40 Xodarap

Assuming you are working with Lucene 3.x:

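import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
// Sum the per-document frequencies of one term over the whole index
IndexReader ir = IndexReader.open(dir);
TermDocs termDocs = ir.termDocs(new Term("your_field", "your_word"));
int count = 0;
while (termDocs.next()) {
    count += termDocs.freq(); // occurrences of the term in the current document
}
termDocs.close();
ir.close();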

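Some comments:

dir is an instance of Lucene's Directory class. RAM and file system indexes are created differently; see the Lucene documentation for details.

"your_field" is the field in which to look for the term. If you have multiple fields, you can either run the procedure for each of them, or, when indexing the documents, create a special field (e.g. "_content") that holds the concatenated values of all the other fields.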

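To make the two options concrete, here is a small sketch in plain Java rather than Lucene code. The class CatchAllField, the field names "title" and "body", and the whitespace tokenization are all assumptions for illustration; in a real index the analyzer decides what counts as a term.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not Lucene API: a document is a map of field name -> text.
final class CatchAllField {

    // Counts whitespace-separated tokens that equal the given word.
    static int occurrences(String text, String word) {
        int n = 0;
        for (String token : text.split("\\s+")) {
            if (token.equals(word)) {
                n++;
            }
        }
        return n;
    }

    // Option 1: run the count once per field and add the results.
    static int sumOverFields(Map<String, String> doc, String word) {
        int total = 0;
        for (String value : doc.values()) {
            total += occurrences(value, word);
        }
        return total;
    }

    // Option 2: build one concatenated value (the "_content" idea) and count once.
    static String catchAll(Map<String, String> doc) {
        return String.join(" ", doc.values());
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "lucene term counts");
        doc.put("body", "count a term in lucene with lucene");
        System.out.println(sumOverFields(doc, "lucene"));         // prints 3
        System.out.println(occurrences(catchAll(doc), "lucene")); // prints 3
    }
}
```

Both routes give the same total here. The catch-all field costs extra index space but needs only one lookup per word.
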
2010-11-12 19:48:21 ffriend

Too bad, 'TermDocs' is not in Lucene 5.3.1, which is what I'm using :( – 2016-11-24 19:02:00

Using Lucene 3.4

A simple way to count, but you need two arrays :-/

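int[] docs = new int[1000];
int[] freqs = new int[1000];
// read() returns how many entries it filled in (at most docs.length), not the total frequency
int n = indexReader.termDocs(term).read(docs, freqs);
int count = 0;
for (int i = 0; i < n; i++) count += freqs[i]; // sum the per-document frequencies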

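Be careful: if you use read(), you cannot then carry on with next(), because once read() has consumed every matching document you are already at the end of the enumeration: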

int[] docs = new int[1000];
int[] freqs = new int[1000];
TermDocs td = indexReader.termDocs(term);
int count = td.read(docs, freqs); // number of entries read, at most docs.length
while (td.next()) { // false once read() has consumed the whole enumeration
}
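
If the term might match more documents than the arrays can hold, one option is to keep calling read() until it returns 0 and add up the frequencies from each batch. As far as I understand the 3.x contract, read() returns the number of entries it filled in and returns zero only when the enumeration is exhausted. The sketch below assumes that behaviour and uses a made-up stand-in interface (BatchSource) instead of TermDocs, so it runs without Lucene on the classpath.

```java
// Stand-in for the read(docs, freqs) contract; hypothetical, not a Lucene type.
interface BatchSource {
    // Fills docs/freqs with up to docs.length entries; returns how many, 0 when exhausted.
    int read(int[] docs, int[] freqs);
}

final class BatchedCount {

    // Sums frequencies by calling read() repeatedly until it reports exhaustion.
    static long totalFrequency(BatchSource source, int bufferSize) {
        int[] docs = new int[bufferSize];
        int[] freqs = new int[bufferSize];
        long total = 0;
        int n;
        while ((n = source.read(docs, freqs)) > 0) {
            for (int i = 0; i < n; i++) {
                total += freqs[i];
            }
        }
        return total;
    }

    // Toy in-memory source over fixed per-document frequencies, for demonstration only.
    static BatchSource fromArray(final int[] allFreqs) {
        return new BatchSource() {
            private int pos = 0;

            @Override
            public int read(int[] docs, int[] freqs) {
                int n = Math.min(docs.length, allFreqs.length - pos);
                for (int i = 0; i < n; i++) {
                    docs[i] = pos;            // document number
                    freqs[i] = allFreqs[pos]; // frequency in that document
                    pos++;
                }
                return n;
            }
        };
    }

    public static void main(String[] args) {
        // Three matching documents but a buffer of two, so two read() calls are needed.
        System.out.println(totalFrequency(fromArray(new int[] {3, 2, 4}), 2)); // prints 9
    }
}
```

With a real TermDocs the loop body would be the same, with td.read(docs, freqs) in place of source.read(docs, freqs).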

2013-07-17 11:12:27 Oliver
