How do I distinguish the Lucene.net boolean operators AND, OR, NOT from the ordinary words and, or, not?

In my project I have implemented full-text index search using Lucene. While doing this, I got stuck on the logic for distinguishing Lucene's boolean operators from the ordinary words and, or, not.

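Suppose, for example, that we search for "I want a pen and pencil". By default Lucene.net combines terms with its OR operation, so it searches for "I OR want OR a OR pen OR pencil" instead of what I want, which is "I OR want OR a OR pen OR and OR pencil". So how can we distinguish an ordinary and, or, not from the Lucene operators?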

For this I wrote a helper method, which looks like this:

    // Requires: using System; using System.Collections.Generic;
    //           using System.Linq; using System.Text.RegularExpressions;

    /// <summary>
    /// Method to get search predicates
    /// </summary>
    /// <param name="searchTerm">Search term</param>
    /// <returns>List of predicates: exact, AND, OR and NOT, in that order</returns>
    public static IList<string> GetPredicates(string searchTerm)
    {
        //// Remove unwanted characters
        //searchTerm = Regex.Replace(searchTerm, "[<(.|\n)*?!'`>]", string.Empty);

        //// Exact search term: the whole input as a quoted phrase
        string exactSearchTerm = "\"" + searchTerm.Trim() + "\"";

        //// Search term without the "and not", "and", "or" keywords
        string searchTermWithOutKeywords = Regex.Replace(
            searchTerm, " and not | and | or ", " ", RegexOptions.IgnoreCase);

        //// Split keywords
        string[] splittedKeywords = searchTermWithOutKeywords.Trim().Split(
            new char[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);

        //// OR search term
        string keywordOrSearchTerm = string.Join(" OR ", splittedKeywords);

        //// AND search term
        string andSearchTerm = string.Join(" AND ", splittedKeywords);

        //// NOT search term: the text after each "and not", cut at the next "and" / "or"
        List<string> searchTerms = Regex.Split(searchTerm, " and not ", RegexOptions.IgnoreCase)
            .Skip(1)
            .Select(term => Regex.Split(term, " and | or ", RegexOptions.IgnoreCase).First())
            .ToList();
        string notSearchTerm = searchTerms.Count > 0 ? string.Join(" , ", searchTerms) : "\"\"";

        return new List<string> { exactSearchTerm, andSearchTerm, keywordOrSearchTerm, notSearchTerm };
    }

However, it returns four predicates, so I have to loop through my index four times, which seems very hectic. Can anyone help me solve this in a single pass?

Source

2011-07-14 Febin J S

The built-in StandardAnalyzer will strip out common words for you; for an explanation, see this article.

Source

2011-07-14 10:51:40

Good suggestion. +1

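As @Matt Warren suggested, Lucene has so-called "stop words": words that usually add little to search quality but make the index large and bloated. Stop words such as "a, and, or, an" are normally filtered out of your text automatically when it is indexed, and then filtered out of the query when it is parsed. StopFilter is responsible for this behavior in both cases, but you can choose an analyzer that does not use StopFilter.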
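To illustrate the effect described above, here is a minimal sketch in plain C# that mimics stop-word filtering. It does not call Lucene.net: `StopWordDemo`, `RemoveStopWords` and the four-word stop list are hypothetical names chosen for illustration, and the real StopFilter works on an analyzer's token stream rather than on a split string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordDemo
{
    // Illustrative stop list built from the words named above.
    // Assumption: this is NOT Lucene's actual default stop set.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "a", "an", "and", "or" };

    // Splits the text on spaces and commas and drops every stop word.
    public static string[] RemoveStopWords(string text)
    {
        return text
            .Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => !StopWords.Contains(token))
            .ToArray();
    }

    public static void Main()
    {
        // Prints: I want pen pencil
        Console.WriteLine(string.Join(" ", RemoveStopWords("I want a pen and pencil")));
    }
}
```

If the same filtering is applied at index time and at query time, a word like "and" never reaches the index, so searching for it as a term only makes sense with an analyzer that keeps stop words.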

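The other issue is query parsing. If I remember correctly, the Lucene query parser only treats the uppercase words OR, AND and NOT as keywords, so if the user types them in all caps, you need to replace them with lowercase so that they are not treated as operators. Here is some Regex.Replace code for that: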

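// Requires: using System.Text.RegularExpressions;
string queryString = "the red pencil and blue pencil are both not green or brown";
// Lowercase any whole-word, all-caps OR / AND / NOT so the parser reads them as plain terms.
// Words that are already lowercase (as in this sample) are left untouched.
queryString =
    Regex.Replace(
        queryString,
        @"\b(?:OR|AND|NOT)\b",
        m => m.Value.ToLowerInvariant());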
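As a usage sketch, the same replacement can be wrapped in a small helper and tried on input that actually contains all-caps words. `QuerySanitizer` and `NeutralizeOperators` are hypothetical names, not part of Lucene.net, and the sketch assumes the parser only reacts to the uppercase forms, as stated above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuerySanitizer
{
    // Hypothetical helper: lowercases whole-word, all-caps AND / OR / NOT
    // so that they reach the query parser as ordinary terms.
    public static string NeutralizeOperators(string query)
    {
        return Regex.Replace(
            query,
            @"\b(?:OR|AND|NOT)\b",
            m => m.Value.ToLowerInvariant());
    }

    public static void Main()
    {
        // Prints: PEN and PENCIL, not ERASER or ANDROID
        // "ANDROID" is untouched because \b requires a whole-word match.
        Console.WriteLine(NeutralizeOperators("PEN AND PENCIL, NOT ERASER OR ANDROID"));
    }
}
```

Note that this only covers the three word operators; the query syntax also has symbol operators such as + and -, which this sketch does not handle.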

Source

2011-07-14 13:09:24
