2012-05-15

Raven DB: What is wrong with this multi-map/reduce index?

I have an application that tracks page visits on a website. Here is my model:

public class VisitSession { 
    public string SessionId { get; set; } 
    public DateTime StartTime { get; set; } 
    public string UniqueVisitorId { get; set; } 
    public IList<PageVisit> PageVisits { get; set; } 
    public bool IsNewVisit { get; set; } // referenced by the index below
} 

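When a visitor comes to the site, a visit session starts. One visit session has many page visits. The first time a visitor comes to the site, the tracker writes a UniqueVisitorId (GUID) cookie, so we can tell whether a visitor is a returning visitor.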

Now I want to build a view that shows TotalVisitSessions, TotalPageVisits, and TotalUniqueVisitors for each day. So I wrote this multi-map/reduce index:

public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate> 
{ 
    public VisitSummaryByDateIndex() 
    { 
     AddMap<VisitSession>(sessions => from s in sessions 
              select new VisitSummaryByDate 
              { 
               Date = s.StartTime.Date, 
               TotalVisitSessions = 1, 
               TotalPageVisits = 0, 
               TotalNewVisitors = s.IsNewVisit ? 1 : 0, 
               TotalUniqueVisitors = 0, 
               UniqueVisitorId = s.UniqueVisitorId 
              }); 

     AddMap<PageVisit>(visits => from v in visits 
            select new VisitSummaryByDate 
            { 
             Date = v.VisitTime.Date, 
             TotalVisitSessions = 0, 
             TotalPageVisits = 1, 
             TotalNewVisitors = 0, 
             TotalUniqueVisitors = 0, 
             UniqueVisitorId = String.Empty 
            }); 

     Reduce = results => from result in results 
          group result by result.Date into g 
          select new VisitSummaryByDate 
          { 
           Date = g.Key, 
           TotalVisitSessions = g.Sum(it => it.TotalVisitSessions), 
           TotalPageVisits = g.Sum(it => it.TotalPageVisits), 
           TotalNewVisitors = g.Sum(it => it.TotalNewVisitors), 
           TotalUniqueVisitors = g.Select(it => it.UniqueVisitorId).Where(it => it.Length > 0).Distinct().Count(), 
           UniqueVisitorId = String.Empty 
          }; 
    } 
} 

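The problem is the TotalUniqueVisitors calculation: sometimes the index result's TotalUniqueVisitors is 1, and sometimes it is 2. But I checked the data, and it should never be that low. Is there something wrong with my map/reduce syntax?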

Related post: Raven DB: How to create "UniqueVisitorCount by date" index

The code and sample data can be found here: https://gist.github.com/2702071

Answers


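The reduce function actually runs over the results multiple times, and it can receive its own earlier output as input. Your index assumes that it runs only once and sees the entire result set.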
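A plain C# simulation (no RavenDB involved) makes this concrete. It reruns the question's reduce logic on a hypothetical scenario: two map entries are reduced first, and that output is then reduced again together with a third, raw map entry. The `Entry` record and the visitor IDs are made up for illustration, and RavenDB's real batching may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for VisitSummaryByDate.
record Entry(DateTime Date, int TotalVisitSessions, int TotalUniqueVisitors, string UniqueVisitorId);

static class OriginalReduceDemo
{
    // Mirrors the question's Reduce: count distinct non-empty IDs,
    // then write String.Empty into UniqueVisitorId of the output.
    static IEnumerable<Entry> Reduce(IEnumerable<Entry> results) =>
        from result in results
        group result by result.Date into g
        select new Entry(
            g.Key,
            g.Sum(it => it.TotalVisitSessions),
            g.Select(it => it.UniqueVisitorId).Where(it => it.Length > 0).Distinct().Count(),
            string.Empty);

    // Returns the unique-visitor count from a single pass and from a re-reduce.
    public static (int singlePass, int rereduced) Run()
    {
        var day = new DateTime(2012, 5, 15);

        // Map output for three sessions from three different (made-up) visitors.
        var mapped = new[]
        {
            new Entry(day, 1, 0, "visitor-a"),
            new Entry(day, 1, 0, "visitor-b"),
            new Entry(day, 1, 0, "visitor-c"),
        };

        // One reduce over everything: 3 unique visitors, as expected.
        int singlePass = Reduce(mapped).Single().TotalUniqueVisitors;

        // Reduce the first two entries, then reduce that output together with
        // the third raw entry. The blanked ID of the first result is filtered
        // out by Length > 0, so only "visitor-c" is counted.
        var partial = Reduce(mapped.Take(2));
        int rereduced = Reduce(partial.Concat(mapped.Skip(2))).Single().TotalUniqueVisitors;

        return (singlePass, rereduced);
    }

    static void Main()
    {
        var (singlePass, rereduced) = Run();
        Console.WriteLine($"single pass: {singlePass}, re-reduced: {rereduced}");
    }
}
```

Because the first reduce blanks `UniqueVisitorId`, the second pass only sees the one raw ID, so the count drops from 3 to 1, even though `TotalVisitSessions` still sums correctly to 3.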

Your index needs to look like this:

public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate> 
{ 
    public VisitSummaryByDateIndex() 
    { 
     AddMap<VisitSession>(sessions => from s in sessions 
             select new VisitSummaryByDate 
             { 
              Date = s.StartTime.Date, 
              TotalVisitSessions = 1, 
              TotalPageVisits = 0, 
              TotalNewVisitors = s.IsNewVisit ? 1 : 0, 
              TotalUniqueVisitors = 1, 
              UniqueVisitorId = new[]{s.UniqueVisitorId} 
             }); 

     AddMap<PageVisit>(visits => from v in visits 
            select new VisitSummaryByDate 
            { 
             Date = v.VisitTime.Date, 
             TotalVisitSessions = 0, 
             TotalPageVisits = 1, 
             TotalNewVisitors = 0, 
             TotalUniqueVisitors = 0, 
             UniqueVisitorId = new string[0] 
            }); 

     Reduce = results => from result in results 
          group result by result.Date into g 
          select new VisitSummaryByDate 
          { 
           Date = g.Key, 
           TotalVisitSessions = g.Sum(it => it.TotalVisitSessions), 
           TotalPageVisits = g.Sum(it => it.TotalPageVisits), 
           TotalNewVisitors = g.Sum(it => it.TotalNewVisitors), 
           TotalUniqueVisitors = g.Sum(it => it.TotalUniqueVisitors), 
           UniqueVisitorId = g.SelectMany(x => x.UniqueVisitorId).Distinct() // flatten the per-entry ID arrays before de-duplicating
          }; 
    } 
} 

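(I can't believe I'm questioning you!) But this won't work. Each session is not necessarily a new unique ID, so the sum is incorrect. Also, I assume UniqueVisitorId would now have to be `IEnumerable<string>`, so this won't compile. However, looking at the post this question is based on (http://stackoverflow.com/questions/10597359/raven-db-how-to-create-uniquevisitorcount-by-date-index), I don't think that field matters anyway, so my answer just sets it to FirstOrDefault. – Simon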


You should remove `TotalUniqueVisitors` from here and use `UniqueVisitorId.Count` to get the actual number of unique visitors. – configurator


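@Simon Agreed. But I don't think setting it to FirstOrDefault is correct, because the reduce function will be run multiple times (possibly taking its own output as input). So if you use FirstOrDefault, you will sometimes get a result with only 1 TotalUniqueVisits. Based on Ayende's answer, I think I have worked out the final solution. But now I am thinking about performance, because the SelectMany in the reduce function makes the resulting documents very large. What do you think? Here is the new gist: https://gist.github.com/2702071 –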
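A sketch of the re-reducible shape these comments converge on, again in plain C# without RavenDB: every entry carries an array of visitor IDs, the reduce flattens and de-duplicates those arrays with `SelectMany(...).Distinct()`, and the unique-visitor count is read from the array's length instead of being summed. The `Summary` record and the sample IDs are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down summary: the ID list is carried through every reduce.
record Summary(DateTime Date, int TotalVisitSessions, string[] UniqueVisitorIds);

static class RereducibleDemo
{
    // The output has the same shape as the input, so it is safe to feed it back in.
    static IEnumerable<Summary> Reduce(IEnumerable<Summary> results) =>
        from result in results
        group result by result.Date into g
        select new Summary(
            g.Key,
            g.Sum(it => it.TotalVisitSessions),
            g.SelectMany(it => it.UniqueVisitorIds).Distinct().ToArray());

    public static (int singlePass, int rereduced) Run()
    {
        var day = new DateTime(2012, 5, 15);

        // Map output: "visitor-a" comes back for a second session, and a page
        // visit maps to an empty ID array.
        var mapped = new[]
        {
            new Summary(day, 1, new[] { "visitor-a" }),
            new Summary(day, 1, new[] { "visitor-b" }),
            new Summary(day, 1, new[] { "visitor-a" }),
            new Summary(day, 0, Array.Empty<string>()),
        };

        // Unique count = length of the de-duplicated ID array.
        int singlePass = Reduce(mapped).Single().UniqueVisitorIds.Length;

        // Reduce a partial batch, then reduce its output with the remaining raw entries.
        var partial = Reduce(mapped.Take(2));
        int rereduced = Reduce(partial.Concat(mapped.Skip(2))).Single().UniqueVisitorIds.Length;

        return (singlePass, rereduced);
    }

    static void Main()
    {
        var (singlePass, rereduced) = Run();
        Console.WriteLine($"single pass: {singlePass}, re-reduced: {rereduced}");
    }
}
```

Both passes report 2 unique visitors. Summing a per-session 1 would report 3 for this data, since `visitor-a` has two sessions. The cost the asker raises is visible here too: each day's array grows with the number of distinct visitors.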


The correct index is:

public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate> 
{ 
    public VisitSummaryByDateIndex() 
    { 
     AddMap<VisitSession>(sessions => from s in sessions 
             select new VisitSummaryByDate 
             { 
              Date = s.StartTime.Date, 
              TotalVisitSessions = 1, 
              TotalPageVisits = 0, 
              TotalNewVisitors = s.IsNewVisit ? 1 : 0, 
              TotalUniqueVisitors = 0, 
              UniqueVisitorId = s.UniqueVisitorId 
             }); 

     AddMap<PageVisit>(visits => from v in visits 
            select new VisitSummaryByDate 
            { 
             Date = v.VisitTime.Date, 
             TotalVisitSessions = 0, 
             TotalPageVisits = 1, 
             TotalNewVisitors = 0, 
             TotalUniqueVisitors = 0, 
             UniqueVisitorId = string.Empty, 
            }); 

     Reduce = results => from result in results 
          group result by result.Date into g 
          select new VisitSummaryByDate 
          { 
           Date = g.Key, 
           TotalVisitSessions = g.Sum(it => it.TotalVisitSessions), 
           TotalPageVisits = g.Sum(it => it.TotalPageVisits), 
           TotalNewVisitors = g.Sum(it => it.TotalNewVisitors), 
           TotalUniqueVisitors = g.Select(it => it.UniqueVisitorId).Where(x => x.Length > 0).Distinct().Count(), 
           UniqueVisitorId = g.FirstOrDefault().UniqueVisitorId, 
          }; 
    } 
} 

The difference is that UniqueVisitorId is set in the reduce. I have to admit I'm not 100% sure why this is required.
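To illustrate what keeping an ID in the reduce output changes, and the limitation the asker raises in the comments, here is another plain C# simulation without RavenDB. The `Entry` record, the visitor IDs, and the batching (reduce two entries, then reduce that output together with a third raw entry) are hypothetical; RavenDB's actual batching may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for VisitSummaryByDate.
record Entry(DateTime Date, int TotalUniqueVisitors, string UniqueVisitorId);

static class KeepFirstIdReduceDemo
{
    // Mirrors this answer's Reduce: count distinct non-empty IDs and keep the
    // first entry's ID. Groups are never empty, so First() behaves like the
    // answer's FirstOrDefault() here.
    static IEnumerable<Entry> Reduce(IEnumerable<Entry> results) =>
        from result in results
        group result by result.Date into g
        select new Entry(
            g.Key,
            g.Select(it => it.UniqueVisitorId).Where(x => x.Length > 0).Distinct().Count(),
            g.First().UniqueVisitorId);

    public static (int singlePass, int rereduced) Run()
    {
        var day = new DateTime(2012, 5, 15);
        var mapped = new[]
        {
            new Entry(day, 0, "visitor-a"),
            new Entry(day, 0, "visitor-b"),
            new Entry(day, 0, "visitor-c"),
        };

        int singlePass = Reduce(mapped).Single().TotalUniqueVisitors;

        // The first result keeps "visitor-a" but drops "visitor-b", so the
        // second pass sees only "visitor-a" and "visitor-c".
        var partial = Reduce(mapped.Take(2));
        int rereduced = Reduce(partial.Concat(mapped.Skip(2))).Single().TotalUniqueVisitors;

        return (singlePass, rereduced);
    }

    static void Main()
    {
        var (singlePass, rereduced) = Run();
        Console.WriteLine($"single pass: {singlePass}, re-reduced: {rereduced}");
    }
}
```

Keeping one ID per reduced entry means earlier batches are no longer counted as zero, which is why this version behaves better than blanking the field. The other IDs in a batch are still lost, though, so in this scenario it reports 2 instead of 3.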


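Ah, thanks to Ayende's answer we now know why it is required. I'm not sure what the benefit of using an array like Ayende does is, but it also works with a string, as in my answer. – Simon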