Searching with Apache Lucene

I have been trying to implement Lucene to make the search on my website faster.

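My code currently works, but I don't think I am using Lucene correctly. Right now my search query is productName:*(input)* and I can't imagine this is how you are supposed to find all products whose productName contains input. I think it has something to do with the way I save the fields to the document.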

My code:

LuceneHelper.cs

using System; 
using System.Collections; 
using System.Collections.Generic; 
using System.Data.Entity.Migrations.Model; 
using System.Linq; 
using System.Threading.Tasks; 
using Lucene.Net; 
using Lucene.Net.Analysis; 
using Lucene.Net.Analysis.Standard; 
using Lucene.Net.Documents; 
using Lucene.Net.Index; 
using Lucene.Net.QueryParsers; 
using Lucene.Net.Search; 
using Lucene.Net.Store; 
using Rentpro.Models; 
using RentPro.Models.Tables; 
using RentProModels.Models; 

namespace RentPro.Helpers 
{ 
    public class LuceneHelper 
    { 
     private const Lucene.Net.Util.Version Version = Lucene.Net.Util.Version.LUCENE_30; 
     private bool IndicesInitialized; 
     private List<Language> Languages = new List<Language>(); 

     public void BuildIndices(DB db) 
     { 
      Languages = GetLanguages(db); 
      Analyzer analyzer = new StandardAnalyzer(Version); 
      List<Product> allProducts = db.GetAllProducts(true, false); 
      foreach (Language l in Languages) 
      { 
       BuildIndicesForLanguage(allProducts, analyzer, l.ID); 
      } 
      IndicesInitialized = true; 
     } 

     private void BuildIndicesForLanguage(List<Product> products, Analyzer analyzer, int id = 0) 
     { 
      using (
       IndexWriter indexWriter = new IndexWriter(GetDirectory(id), analyzer, 
        IndexWriter.MaxFieldLength.UNLIMITED)) 
      { 
       var x = products.Count; 
       foreach (Product p in products) 
       { 
        SearchProduct product = SearchProduct.FromProduct(p, id); 
        Document document = new Document(); 
        Field productIdField = new Field("productId", product.ID.ToString(), Field.Store.YES, Field.Index.NO); 
        Field productTitleField = new Field("productName", product.Name, Field.Store.YES, Field.Index.ANALYZED); 
        Field productDescriptionField = new Field("productDescription", product.Description, Field.Store.YES, Field.Index.ANALYZED); 
        Field productCategoryField = new Field("productCategory", product.Category, Field.Store.YES, Field.Index.ANALYZED); 
        Field productCategorySynonymField = new Field("productCategorySynonym", product.CategorySynonym, Field.Store.YES, Field.Index.ANALYZED); 
        Field productImageUrlField = new Field("productImageUrl", product.ImageUrl, Field.Store.YES, Field.Index.NO); 
        Field productTypeField = new Field("productType", product.Type, Field.Store.YES, Field.Index.NO); 
        Field productDescriptionShortField = new Field("productDescriptionShort", product.DescriptionShort, Field.Store.YES, Field.Index.NO); 
        Field productPriceField = new Field("productPrice", product.Price, Field.Store.YES, Field.Index.NO); 
        document.Add(productIdField); 
        document.Add(productTitleField); 
        document.Add(productDescriptionField); 
        document.Add(productCategoryField); 
        document.Add(productCategorySynonymField); 
        document.Add(productImageUrlField); 
        document.Add(productTypeField); 
        document.Add(productDescriptionShortField); 
        document.Add(productPriceField); 
        indexWriter.AddDocument(document); 
       } 
       indexWriter.Optimize(); 
       indexWriter.Commit(); 
      } 

     } 

     public List<SearchProduct> Search(string input) 
     { 
      if (!IndicesInitialized) 
      { 
       BuildIndices(new DB()); 
       return Search(input); 

      } 
      IndexReader reader = IndexReader.Open(GetCurrentDirectory(), true); 
      Searcher searcher = new IndexSearcher(reader); 
      Analyzer analyzer = new StandardAnalyzer(Version); 
      TopScoreDocCollector collector = TopScoreDocCollector.Create(100, true); 
      MultiFieldQueryParser parser = new MultiFieldQueryParser(Version, 
       new[] { "productDescription", "productCategory", "productCategorySynonym", "productName" }, analyzer) 
      { 
       AllowLeadingWildcard = true 
      }; 

      searcher.Search(parser.Parse("*" + input + "*"), collector); 

      ScoreDoc[] hits = collector.TopDocs().ScoreDocs; 

      List<int> productIds = new List<int>(); 
      List<SearchProduct> results = new List<SearchProduct>(); 

      foreach (ScoreDoc scoreDoc in hits) 
      { 
       Document document = searcher.Doc(scoreDoc.Doc); 
       int productId = int.Parse(document.Get("productId")); 
       if (!productIds.Contains(productId)) 
       { 
        productIds.Add(productId); 
        SearchProduct result = new SearchProduct 
        { 
         ID = productId, 
         Description = document.Get("productDescription"), 
         Name = document.Get("productName"), 
         Category = document.Get("productCategory"), 
         CategorySynonym = document.Get("productCategorySynonym"), 
         ImageUrl = document.Get("productImageUrl"), 
         Type = document.Get("productType"), 
         DescriptionShort = document.Get("productDescriptionShort"), 
         Price = document.Get("productPrice") 
        }; 
        results.Add(result); 
       } 
      } 
      reader.Dispose(); 
      searcher.Dispose(); 
      analyzer.Dispose(); 
      return results; 
     } 

     private string GetDirectoryPath(int languageId = 1) 
     { 
      return GetDirectoryPath(Languages.SingleOrDefault(x => x.ID == languageId).UriPart); 
     } 

     private string GetDirectoryPath(string languageUri) 
     { 
      return AppDomain.CurrentDomain.BaseDirectory + @"\App_Data\LuceneIndices\" + languageUri; 
     } 

     private List<Language> GetLanguages(DB db) 
     { 
      return db.Languages.ToList(); 
     } 

     private int GetCurrentLanguageId() 
     { 
      return Translator.GetCurrentLanguageID(); 
     } 

     private FSDirectory GetCurrentDirectory() 
     { 
      return FSDirectory.Open(GetDirectoryPath(GetCurrentLanguageId())); 
     } 

     private FSDirectory GetDirectory(int languageId) 
     { 
      return FSDirectory.Open(GetDirectoryPath(languageId)); 
     } 
    } 


    public class SearchProduct 
    { 
     public int ID { get; set; } 
     public string Description { get; set; } 
     public string Name { get; set; } 
     public string ImageUrl { get; set; } 
     public string Type { get; set; } 
     public string DescriptionShort { get; set; } 
     public string Price { get; set; } 
     public string Category { get; set; } 
     public string CategorySynonym { get; set; } 

     public static SearchProduct FromProduct(Product p, int languageId) 
     { 
      return new SearchProduct() 
      { 
       ID = p.ID, 
       Description = p.GetText(languageId, ProductLanguageType.Description), 
       Name = p.GetText(languageId), 
       ImageUrl = 
        p.Images.Count > 0 
         ? "/Company/" + Settings.Get("FolderName") + "/Pictures/Products/100x100/" + 
          p.Images.Single(x => x.Type == "Main").Url 
         : "", 
       Type = p is HuurProduct ? "HuurProduct" : "KoopProduct", 
       DescriptionShort = p.GetText(languageId, ProductLanguageType.DescriptionShort), 
       Price = p is HuurProduct ? ((HuurProduct)p).CalculatedPrice(1, !Settings.GetBool("BTWExLeading")).ToString("0.00") : "", 
       Category = p.Category.Name, 
       CategorySynonym = p.Category.Synonym 
      }; 

     } 

    } 
}

How I call LuceneHelper:

 public ActionResult Lucene(string SearchString, string SearchOrderBy, int? page, int? amount) 
     { 
      List<SearchProduct> searchResults = new List<SearchProduct>(); 
      if (!SearchString.IsNullOrWhiteSpace()) 
      { 
       LuceneHelper lucene = new LuceneHelper(); 
       searchResults = lucene.Search(SearchString); 
      } 
      return View(new LuceneSearchResultsVM(db, SearchString, searchResults, SearchOrderBy, page ?? 1, amount ?? 10)); 
     }

LuceneSearchResultsVM:

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Linq.Dynamic; 
using System.Web; 
using RentPro.Models.Tables; 
using System.Linq.Expressions; 
using System.Reflection; 
using Microsoft.Ajax.Utilities; 
using Rentpro.Models; 
using RentPro.Helpers; 
using RentProModels.Models; 

namespace RentPro.ViewModels 
{ 
    public class LuceneSearchResultsVM 
    { 
     public List<SearchProduct> SearchProducts { get; set; } 
     public bool BTWActive { get; set; } 
     public bool BTWEXInput { get; set; } 
     public bool BTWShow { get; set; } 
     public bool BTWExLeading { get; set; } 
     public string FolderName { get; set; } 
     public string CurrentSearchString { get; set; } 
     public string SearchOrderBy { get; set; } 
     public int Page; 
     public int Amount; 
     public String SearchQueryString { 
      get 
      { 
       return Translator.Translate("Zoekresultaten voor") + ": " + CurrentSearchString + " (" + 
         SearchProducts.Count + " " + Translator.Translate("resultaten") + " - " + 
         Translator.Translate("pagina") + " " + Page + " " + Translator.Translate("van") + " " + 
         CalculateAmountOfPages() + ")"; 
      } 
      set { } 
     } 

     public LuceneSearchResultsVM(DB db, string queryString, List<SearchProduct> results, string searchOrderBy, int page, int amt) 
     { 
      BTWActive = Settings.GetBool("BTWActive"); 
      BTWEXInput = Settings.GetBool("BTWEXInput"); 
      BTWShow = Settings.GetBool("BTWShow"); 
      BTWExLeading = Settings.GetBool("BTWExLeading"); 
      FolderName = Settings.Get("FolderName"); 
      SearchProducts = results; 
      CurrentSearchString = queryString; 
      if (searchOrderBy.IsNullOrWhiteSpace()) 
      { 
       searchOrderBy = "Name"; 
      } 
      SearchOrderBy = searchOrderBy; 
      Amount = amt == 0 ? 10 : amt; 
      int maxPages = CalculateAmountOfPages(); 
      Page = page > maxPages ? maxPages : page; 
      SearchLog.MakeEntry(queryString, SearchProducts.Count(), db, HttpContext.Current.Request.UserHostAddress); 
     } 


     public List<SearchProduct> GetOrderedList() 
     { 
      List<SearchProduct> copySearchProductList = new List<SearchProduct>(SearchProducts); 
      copySearchProductList = copySearchProductList.Skip((Page - 1) * Amount).Take(Amount).ToList(); 
      switch (SearchOrderBy) 
      { 
       case "Price": 
        copySearchProductList.Sort(new PriceSorter()); 
        break; 
       case "DateCreated": 
        return copySearchProductList; //TODO 
       default: 
        return copySearchProductList.OrderBy(n => n.Name).ToList(); 
      } 
      return copySearchProductList; 
     } 

     public int CalculateAmountOfPages() 
     { 
      int items = SearchProducts.Count; 
      return items/Amount + (items % Amount > 0 ? 1 : 0); 
     } 


    } 

    public class PriceSorter : IComparer<SearchProduct> 
    { 
     public int Compare(SearchProduct x, SearchProduct y) 
     { 
      if (x == null || x.Price == "") return 1; 
      if (y == null || y.Price == "") return -1; 
      decimal priceX = decimal.Parse(x.Price); 
      decimal priceY = decimal.Parse(y.Price); 
      return priceX > priceY ? 1 : priceX == priceY ? 0 : -1; 
     } 
    } 

}

Any help would be greatly appreciated.

Example input list of products:

Query: SELECT Product.ID, Product.Decription, Product.Name FROM Product

Expected result:

SQL Server query equivalent: SELECT Product.ID, Product.Decription, Product.Name FROM Product WHERE Product.Name LIKE '%Zelf%' OR Product.Decription LIKE '%Zelf%'

Basically, Zelf is the input. I want to find every product whose description or name contains the input string.

2015-11-20 nbokmans

There is no need to prepend the field name to the query text: 'parser.Parse("productName:*" + input + "*")', because you have already specified it here: 'new QueryParser(Version, "productName", analyzer)'

Yes, but if I don't manually set the parser's query to 'productName:*input*' it only returns exact matches, not matches that contain the input string. – nbokmans

What about just '*input*', without the 'productName:'?

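By default, Lucene's query parser does not allow ? or * as the first character of a search term. To work around this, you need to store in your index the substring from every position to the end of the word. For example, for the word test you should put the following into the index: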

test 
est 
st 
t

I recommend using a separate field for this, for example when you have a short, single-word field such as a product name.

for (int i = 0; i < product.SafeName.Length; i++)
{
    // Index every suffix of the name (including the last character) in a search-only field.
    Field productTitleSearchField = new Field("productNameSearch", product.SafeName.Substring(i), Field.Store.NO, Field.Index.ANALYZED);
    document.Add(productTitleSearchField);
}

After that, you can use the query string productNameSearch:(input)* or a PrefixQuery to find product names that contain input.
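The reasoning behind the suffix approach is that a prefix match against any stored suffix is equivalent to a substring match against the original word. A minimal in-memory sketch of that equivalence, without Lucene (the class and method names are hypothetical and only illustrate the idea):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SuffixDemo
{
    // Returns every suffix of a word: "test" -> test, est, st, t.
    public static List<string> Suffixes(string word)
    {
        var result = new List<string>();
        for (int i = 0; i < word.Length; i++)
        {
            result.Add(word.Substring(i));
        }
        return result;
    }

    // A prefix match on any suffix is the same as a substring match on the word.
    public static bool MatchesAnySuffix(string word, string input)
    {
        return Suffixes(word).Any(s => s.StartsWith(input, StringComparison.Ordinal));
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Suffixes("test")));  // test, est, st, t
        Console.WriteLine(MatchesAnySuffix("test", "es"));       // True
        Console.WriteLine(MatchesAnySuffix("test", "ts"));       // False
    }
}
```

In the index, each suffix becomes a term, so the wildcard only ever appears at the end of the query, which is the case Lucene handles efficiently.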

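If your field contains several words and your input will have a reasonable length, it is better to add an NGramTokenFilter for that field. If the length of your input string is bounded between n and m, you should create an NGramTokenFilter with minGram n and maxGram m. If you have the word test and limits of 2 to 3, your index will contain the terms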

te 
tes 
es 
est 
st

After that, you can search with the query string

ngrammField:(input)
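To make the n-gram expansion concrete, here is a small plain C# sketch that produces the same terms as the list above for test with limits 2 and 3. It is not the Lucene filter itself, only an illustration; the helper name NGrams is an assumption, and the order in which a real NGramTokenFilter emits its tokens may differ between versions.

```csharp
using System;
using System.Collections.Generic;

public static class NGramDemo
{
    // Character n-grams of a word with lengths from minGram to maxGram,
    // emitted by start position.
    public static List<string> NGrams(string word, int minGram, int maxGram)
    {
        var grams = new List<string>();
        for (int start = 0; start < word.Length; start++)
        {
            for (int len = minGram; len <= maxGram && start + len <= word.Length; len++)
            {
                grams.Add(word.Substring(start, len));
            }
        }
        return grams;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", NGrams("test", 2, 3)));  // te, tes, es, est, st
    }
}
```

An input such as es then matches without any wildcard, because es is itself one of the stored terms. The cost is a larger index, since every word contributes several terms.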

2015-11-23 08:33:32

It seems the author is using 'Lucene.Net', and version 3.0.3 does allow leading wildcards (https://lucenenet.apache.org/docs/3.0.3/de/d1b/class_lucene_1_1_net_1_1_search_1_1_wildcard_query.html), although they can be slow.

The approach in the answer should perform well, but it will result in a larger index.

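Hello, thank you very much for your reply. I don't mean to doubt your answer, but I don't think I should have to add an index entry for every letter of every word I want to search. My search function should go through the index and return all products that contain the 'input' string. At the moment I am searching the product's title, category name, category name synonym and product description. Descriptions can be more than 250 words long. – nbokmans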

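This does not answer your question, but it is safer to use a using block in C#. In your current code, the Dispose calls may never run if an exception is thrown before they are reached.

You do this: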

IndexReader reader = IndexReader.Open(GetCurrentDirectory(), true); 
Searcher searcher = new IndexSearcher(reader); 
Analyzer analyzer = new StandardAnalyzer(Version); 

//... 

reader.Dispose(); 
searcher.Dispose(); 
analyzer.Dispose();

which can be replaced with:

using (IndexReader reader = IndexReader.Open(GetCurrentDirectory(), true)) 
using (Searcher searcher = new IndexSearcher(reader)) 
using (Analyzer analyzer = new StandardAnalyzer(Version)) 
{ 
    //Do whatever else here.. No need to call "dispose". 
}

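The above is pretty much a try -> finally statement: it attempts whatever is inside the using block, and if anything throws, the finally block disposes of the opened/allocated resources.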
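As a rough illustration of that equivalence, the sketch below runs the same failing operation once with using and once with a hand-written try/finally, and counts how often Dispose is called. The Resource class and its counter are hypothetical and exist only for this demo.

```csharp
using System;

public sealed class Resource : IDisposable
{
    // Hypothetical counter, only here to observe the Dispose calls.
    public static int DisposedCount;

    public void Dispose() { DisposedCount++; }
}

public static class UsingDemo
{
    public static void WithUsing()
    {
        using (Resource r = new Resource())
        {
            throw new InvalidOperationException("search failed");
        }
    }

    // Roughly what the using statement above expands to.
    public static void WithTryFinally()
    {
        Resource r = new Resource();
        try
        {
            throw new InvalidOperationException("search failed");
        }
        finally
        {
            if (r != null) r.Dispose();
        }
    }

    public static void Main()
    {
        try { WithUsing(); } catch (InvalidOperationException) { }
        try { WithTryFinally(); } catch (InvalidOperationException) { }
        Console.WriteLine(Resource.DisposedCount);  // 2
    }
}
```

Both variants dispose the resource even though the body throws, which is what the manual Dispose calls at the end of the Search method cannot guarantee.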

Another way (a comma-separated declaration, which works only if all the variables are of the same type) is:

// The type is written once; the remaining variables are separated by commas.
using (whatever foo = new whatever(), bar = new whatever(), ...) 
{ 
    //do whatever here.. 
}

2015-12-02 03:24:19 Brandon
