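Which term vector option should I use in Lucene?
I am building an index in Lucene and only care about getting the IDs of the relevant documents back (that is, not field values or any highlighting information). Given these requirements, which term vector option should I use without hurting search performance (speed) or quality (results)? I will also be using MoreLikeThis, so I don't want that to be affected either.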
TermVector.YES—Records the unique terms that occurred, and their counts, in each document, but doesn’t store any positions or offsets information
TermVector.WITH_POSITIONS—Records the unique terms and their counts, and also the positions of each occurrence of every term, but no offsets
TermVector.WITH_OFFSETS—Records the unique terms and their counts, with the offsets (start and end character position) of each occurrence of every term, but no positions
TermVector.WITH_POSITIONS_OFFSETS—Stores unique terms and their counts, along with positions and offsets
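To make the difference between the four options concrete, here is a minimal sketch in plain Java of the per-document data each one describes. This is not Lucene code: `TermVectorSketch`, `TermInfo`, and `analyze` are hypothetical names, and the whitespace-and-lowercase tokenizer is an assumption standing in for a real analyzer. In terms of the list above, `YES` corresponds to keeping only `freq`, `WITH_POSITIONS` adds `positions`, `WITH_OFFSETS` adds `offsets`, and `WITH_POSITIONS_OFFSETS` keeps all three.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TermVectorSketch {
    /** Per-term data for one document: frequency, token positions, character offsets. */
    static final class TermInfo {
        int freq = 0;                                       // what YES keeps
        final List<Integer> positions = new ArrayList<>();  // added by WITH_POSITIONS
        final List<int[]> offsets = new ArrayList<>();      // added by WITH_OFFSETS: {start, end}
    }

    /** Splits on whitespace, lowercases, and records every occurrence of each term. */
    static Map<String, TermInfo> analyze(String text) {
        Map<String, TermInfo> vector = new LinkedHashMap<>();
        int position = 0;
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            if (i >= n) break;
            int start = i;
            // Consume one token.
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            String term = text.substring(start, i).toLowerCase(Locale.ROOT);
            TermInfo info = vector.computeIfAbsent(term, k -> new TermInfo());
            info.freq++;
            info.positions.add(position++);
            info.offsets.add(new int[] {start, i}); // end offset is exclusive
        }
        return vector;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, TermInfo> e : analyze("to be or not to be").entrySet()) {
            TermInfo t = e.getValue();
            StringBuilder offs = new StringBuilder();
            for (int[] o : t.offsets) {
                offs.append('[').append(o[0]).append(',').append(o[1]).append(')');
            }
            System.out.println(e.getKey() + " freq=" + t.freq
                    + " positions=" + t.positions + " offsets=" + offs);
        }
    }
}
```

The sketch shows only that each richer option stores strictly more per-occurrence data for every term; how much that costs in a real index depends on the Lucene version and the documents.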
Thanks.
Do you want the internal Lucene document number, or some ID that you stored in the document yourself?