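Why does this Lucene.Net query fail?
I'm trying to convert my search functionality to allow fuzzy searches involving multiple words. My existing search code looks like this: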
```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Split the search into separate queries per word, and combine them into one major query
var finalQuery = new BooleanQuery();
string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
foreach (string term in terms)
{
    // Set up the fields to search
    string[] searchfields = new string[]
    {
        // Various strings denoting the document fields available
    };
    var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, searchfields, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
    finalQuery.Add(parser.Parse(term), BooleanClause.Occur.MUST);
}

// Perform the search
var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
var searcher = new IndexSearcher(directory, true);
var hits = searcher.Search(finalQuery, MAX_RESULTS);
```
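This works correctly: if I have an entity whose name field is "My name is Andrew" and I search for "Andrew Name", Lucene finds the right document. Now I want to enable fuzzy searching so that "Anderw Name" is found as well. I changed my method to use the following code: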
```csharp
using System;
using System.IO;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

const int MAX_RESULTS = 10000;
const float MIN_SIMILARITY = 0.5f;
const int PREFIX_LENGTH = 3;

if (string.IsNullOrWhiteSpace(searchString))
    throw new ArgumentException("Provided search string is empty");

// Split the search into separate queries per word, and combine them into one major query
var finalQuery = new BooleanQuery();
string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
foreach (string term in terms)
{
    // Set up the fields to search
    string[] searchfields = new string[]
    {
        // Strings denoting document field names here
    };

    // Create a subquery where the term must match at least one of the fields
    var subquery = new BooleanQuery();
    foreach (string field in searchfields)
    {
        var queryTerm = new Term(field, term);
        var fuzzyQuery = new FuzzyQuery(queryTerm, MIN_SIMILARITY, PREFIX_LENGTH);
        subquery.Add(fuzzyQuery, BooleanClause.Occur.SHOULD);
    }

    // Add the subquery to the final query as MUST, so every term's subquery has to match
    finalQuery.Add(subquery, BooleanClause.Occur.MUST);
}

// Perform the search
var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
var searcher = new IndexSearcher(directory, true);
var hits = searcher.Search(finalQuery, MAX_RESULTS);
```
Unfortunately, with this code, if I submit the search query "Andrew Name" (same as before), I get no results back.
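The core idea is that every term must be found in at least one document field, but each term may reside in a different field. Does anyone have any idea why my rewritten query fails?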
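A simplified model may help reason about the failure. This is a sketch under stated assumptions, not Lucene's implementation: the `FuzzyModel` class and its methods are hypothetical, and the similarity formula, 1 - distance / (prefix length + shorter remaining length), is my approximation of Lucene 2.9's fuzzy scoring. It encodes two facts that are probably what the analyzer question in the comments is getting at: `StandardAnalyzer` lowercases tokens at index time, whereas the text passed to `new Term(field, term)` for a `FuzzyQuery` is not analyzed, and the first `PREFIX_LENGTH` characters must match an indexed term exactly before any fuzzy comparison happens.

```csharp
using System;

// Hypothetical, simplified model of a Lucene 2.9-style FuzzyQuery match decision.
// Not Lucene's code; the scoring formula is an approximation.
public static class FuzzyModel
{
    // Classic Levenshtein distance: insert, delete and substitute each cost 1.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp;  // previous row for the next pass
        }
        return prev[b.Length];
    }

    // queryText: raw text as given to new Term(field, text); it is NOT analyzed.
    // indexedTerm: a token as stored in the index (StandardAnalyzer lowercases these).
    public static bool Matches(string queryText, string indexedTerm, float minSimilarity, int prefixLength)
    {
        int p = Math.Min(prefixLength, queryText.Length);
        string prefix = queryText.Substring(0, p);

        // The prefix is compared exactly and case-sensitively: "And" never equals "and".
        if (!indexedTerm.StartsWith(prefix, StringComparison.Ordinal)) return false;

        // Only the parts after the shared prefix are compared fuzzily.
        string rest = queryText.Substring(p);
        string candidateRest = indexedTerm.Substring(p);
        int denominator = p + Math.Min(rest.Length, candidateRest.Length);
        if (denominator == 0) return rest.Length == candidateRest.Length;

        float similarity = 1.0f - (float)Levenshtein(rest, candidateRest) / denominator;
        return similarity > minSimilarity;
    }
}
```

Under this model, `Matches("Andrew", "andrew", 0.5f, 3)` is false because the prefix `And` never equals `and`, while the lowercased `"anderw"` scores about 0.67 against `"andrew"` and passes. If that is indeed the cause, building the term as `new Term(field, term.ToLowerInvariant())` would be the analogous change in the second snippet; as far as I know, the query parser used in the first snippet already lowercases fuzzy terms by default.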
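Final edit: OK, it turns out I was massively overcomplicating this, and there was no need to move away from my first approach. After reverting to the first code snippet, I enabled fuzzy searching by changing

```csharp
finalQuery.Add(parser.Parse(term), BooleanClause.Occur.MUST);
```

to

```csharp
finalQuery.Add(parser.Parse(term.Replace("~", "") + "~"), BooleanClause.Occur.MUST);
```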
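The one-line fix relies on Lucene's query-parser syntax, where a trailing `~` turns a word into a fuzzy term and `~0.5` additionally sets the minimum similarity. A minimal sketch of that transformation as a reusable helper follows; `FuzzySyntax` and `ToFuzzyClause` are hypothetical names, not part of Lucene.Net, and the optional similarity argument is an assumption added for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of Lucene.Net): turns one user-typed word into
// query-parser fuzzy syntax, e.g. "Anderw" -> "Anderw~" or "Anderw~0.5".
public static class FuzzySyntax
{
    public static string ToFuzzyClause(string term, float? minSimilarity = null)
    {
        // Strip any '~' the user typed so the parser sees exactly one fuzzy operator.
        string cleaned = term.Replace("~", "");

        // A word consisting only of '~' would leave a bare "~", which the parser rejects.
        if (cleaned.Length == 0)
            throw new ArgumentException("Term contains no searchable characters", nameof(term));

        if (minSimilarity == null)
            return cleaned + "~";  // the parser then applies its default minimum similarity

        // Invariant culture keeps the decimal separator a '.', whatever the OS locale.
        return cleaned + "~" + minSimilarity.Value.ToString(CultureInfo.InvariantCulture);
    }
}
```

In the first snippet it would be used as `finalQuery.Add(parser.Parse(FuzzySyntax.ToFuzzyClause(term)), BooleanClause.Occur.MUST);`. Other query-syntax characters such as `:` or `*` are still passed through unescaped, just as in the original one-liner.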
What are the values of MIN_SIMILARITY and PREFIX_LENGTH? What is the value of finalQuery.ToString()? – sisve 2011-05-26 03:35:20
I've added the constants I use to my post. – KallDrexx 2011-05-26 03:39:16
Which analyzer do you use when indexing? – sisve 2011-05-26 03:48:06