2014-01-18 44 views
0

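RavenDb: searching for occurrences in text is slow

I want to find the number of occurrences of a word in a text. I have a class like this: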

public class Page 
{ 
    public string Id { get; set; } 
    public string BookId { get; set; } 
    public string Content { get; set; } 
    public int PageNumber { get; set; } 
} 

My index looks like this:

class Pages_SearchOccurrence : AbstractIndexCreationTask<Page, Pages_SearchOccurrence.ReduceResult> 
{ 
    public class ReduceResult 
    { 
     public string PageId { get; set; } 
     public int Count { get; set; } 
     public string Word { get; set; } 
     public string Content { get; set; } 
    } 

    public Pages_SearchOccurrence() 
    { 
     Map = pages => from page in pages 
         let words = page.Content 
             .ToLower() 
             .Split(new string[] { " ", "\n", ",", ";" }, StringSplitOptions.RemoveEmptyEntries) 
         from w in words 
         select new 
         { 
          page.Content, 
          PageId = page.Id, 
          Count = 1, 
          Word = w 
         }; 


     Reduce = results => from result in results 
          group result by new { PageId = result.PageId, result.Word } into g 
          select new 
          { 
           Content = g.First().Content, 
           PageId = g.Key.PageId, 
           Word = g.Key.Word, 
           // Sum the partial counts: reduce may run again over its own output.
           Count = g.Sum(x => x.Count) 
          }; 

     Index(x => x.Content, Raven.Abstractions.Indexing.FieldIndexing.Analyzed); 
    } 
} 

Finally, my query looks like this:

using (var session = documentStore.OpenSession()) 
      { 
       RavenQueryStatistics stats; 
       var occurence = session.Query<Pages_SearchOccurrence.ReduceResult, Pages_SearchOccurrence>() 
        .Statistics(out stats) 
        .Where(x => x.Word == "works") 
        .ToList(); 


      } 

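But I noticed that RavenDb is very slow (or my query is bad): stats.IsStale = true, and Raven Studio takes too much time and returns only a few results. I have 1000 "Pages" documents, each with about 1000 words of content. Why is my query bad, and how can I find the occurrences of a word in a page? Thanks for your help!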
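To make the intended computation concrete, here is a minimal in-memory sketch in plain LINQ (no RavenDB involved) of what the map and reduce steps above are meant to produce: one entry per word with Count = 1, then entries grouped by (PageId, Word) with the counts summed. The names WordEntry, OccurrenceSketch, Map and Reduce are hypothetical and exist only for this illustration. Summing Count, rather than counting the entries in a group, keeps the result correct when the reduce step is applied again to its own output, which is something a RavenDB map/reduce index may do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entry type: one (page, word, count) triple.
public class WordEntry
{
    public string PageId { get; set; }
    public string Word { get; set; }
    public int Count { get; set; }
}

public static class OccurrenceSketch
{
    // Same separators as the index in the question.
    private static readonly string[] Separators = { " ", "\n", ",", ";" };

    // Map: emit one entry with Count = 1 for every word on the page.
    public static IEnumerable<WordEntry> Map(string pageId, string content)
    {
        return content
            .ToLower()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => new WordEntry { PageId = pageId, Word = w, Count = 1 });
    }

    // Reduce: group by (PageId, Word) and sum the partial counts.
    public static List<WordEntry> Reduce(IEnumerable<WordEntry> entries)
    {
        return entries
            .GroupBy(e => new { e.PageId, e.Word })
            .Select(g => new WordEntry
            {
                PageId = g.Key.PageId,
                Word = g.Key.Word,
                Count = g.Sum(e => e.Count)
            })
            .ToList();
    }
}
```

For example, reducing the mapped entries of "it works, it really works" yields a count of 2 for "works", and passing that result through Reduce a second time leaves the count at 2.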

+0

Why don't you rely on Lucene to do this? As you know, it has full-text indexing and querying capabilities. Am I missing something? –

+0

You may find this helpful: http://stackoverflow.com/questions/16774036/search-inside-an-attachment-in-ravendb – NoChance

Answers

0

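You are doing it wrong. You should set the Content field to Analyzed and use RavenDB's Search() operator. The slowness is most likely caused by the amount of unoptimized work your index code does.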
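A minimal sketch of what this answer suggests, assuming the same older RavenDB client API that the question uses (AbstractIndexCreationTask, Raven.Abstractions.Indexing.FieldIndexing.Analyzed, and the Search() query extension). The index name Pages_ByContent and the helper PageSearch.FindPagesContaining are hypothetical, and documentStore is assumed to be an already initialized store with the index deployed. This finds the pages that contain the word; it does not by itself return a per-page occurrence count.

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Abstractions.Indexing;
using Raven.Client;
using Raven.Client.Indexes;

// Document class from the question.
public class Page
{
    public string Id { get; set; }
    public string BookId { get; set; }
    public string Content { get; set; }
    public int PageNumber { get; set; }
}

// Hypothetical map-only index: one entry per page, Content analyzed for full-text search.
public class Pages_ByContent : AbstractIndexCreationTask<Page>
{
    public Pages_ByContent()
    {
        Map = pages => from page in pages
                       select new { page.Content };

        // Let the analyzer tokenize Content instead of splitting words by hand.
        Index(x => x.Content, FieldIndexing.Analyzed);
    }
}

public static class PageSearch
{
    // Hypothetical helper: returns the pages whose Content matches the search term.
    public static List<Page> FindPagesContaining(IDocumentStore documentStore, string word)
    {
        using (var session = documentStore.OpenSession())
        {
            return session.Query<Page, Pages_ByContent>()
                .Search(x => x.Content, word)
                .ToList();
        }
    }
}
```

Because the index emits one small entry per page rather than one entry per word carrying the full Content, it should have far less work to do than the map/reduce index in the question.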

0

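I found a partial solution.

Maybe I was not clear: my goal is to find the number of occurrences of a word in each page. I search for the hits of a word across pages, and I want to sort the pages by that count.

I changed my index like this: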
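As a small in-memory illustration of that goal (plain LINQ, not a RavenDB query), the sketch below counts how often a word occurs on each page and orders the pages by that count. PageHit, PageRanking and RankByOccurrences are hypothetical names, and the split rules are copied from the index in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Document class from the question.
public class Page
{
    public string Id { get; set; }
    public string BookId { get; set; }
    public string Content { get; set; }
    public int PageNumber { get; set; }
}

// Hypothetical result type: how many times the word occurs on one page.
public class PageHit
{
    public string PageId { get; set; }
    public int Count { get; set; }
}

public static class PageRanking
{
    private static readonly string[] Separators = { " ", "\n", ",", ";" };

    // Returns up to `take` pages containing `word`, most occurrences first.
    public static List<PageHit> RankByOccurrences(IEnumerable<Page> pages, string word, int take)
    {
        var needle = word.ToLower();
        return pages
            .Select(p => new PageHit
            {
                PageId = p.Id,
                Count = p.Content
                    .ToLower()
                    .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                    .Count(w => w == needle)
            })
            .Where(h => h.Count > 0)
            .OrderByDescending(h => h.Count)
            .Take(take)
            .ToList();
    }
}
```

This loads and scans every page in memory, so it only pins down the expected result (page id plus count, sorted by count); it is not a substitute for doing the work in an index.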

class Pages_SearchOccurrence : AbstractIndexCreationTask<Page, Pages_SearchOccurrence.ReduceResult>{ 

public class ReduceResult 
    { 
     public string Content { get; set; } 
     public string PageId { get; set; } 
     public string Count { get; set; } 
     public string Word { get; set; } 
    } 

    public Pages_SearchOccurrence() 
    { 
     Map = pages => from page in pages 
         let words = page.Content.ToLower().Split(new string[] { " ", "\n", ",", ";" }, StringSplitOptions.RemoveEmptyEntries) 
         from w in words 
         select new 
         { 
          page.Content, 
          PageId = page.Id, 
          Count = 1, 
          Word = w 
         }; 

     Index(x => x.Content, Raven.Abstractions.Indexing.FieldIndexing.Analyzed); 
     Index(x => x.PageId, Raven.Abstractions.Indexing.FieldIndexing.NotAnalyzed); 
    } 
} 

Finally, my new query looks like this:

using (var session = documentStore.OpenSession()) 
      { 

       var query = session.Query<Pages_SearchOccurrence.ReduceResult, Pages_SearchOccurrence>() 
        .Search((x) => x.Word, "works") 
        .AggregateBy(x => x.PageId) 
        .CountOn(x => x.Count) 
        .ToList() 
        .Results 
        .FirstOrDefault(); 

       var listFacetValues = query.Value.Values; 
       var finalResult = listFacetValues.GroupBy(x => x.Hits).OrderByDescending(x => x.Key).Take(5).ToList(); 


      } 

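finalResult gives me a group of FacetValue objects, each of which has a Hits property

(the Hits and Count properties of my FacetValue are the same here)

The Hits property gives me what I want, but this code does not look right to me, and RavenDB Studio does not like this result either.

Do you have a better solution?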