2012-06-01

Map/reduce in RavenDb, update 1 (following Ayende's answer)

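This is my first venture into RavenDb. To try it out I wrote a small map/reduce, but unfortunately the result is empty.

I have around 1.6 million documents loaded into RavenDb.

A document: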

public class Tick 
{ 
    public DateTime Time; 
    public decimal Ask; 
    public decimal Bid; 
    public double AskVolume; 
    public double BidVolume; 
} 

I want to get the min and max over a specific period of time.

The collection filtered by time is defined as:

var ticks = session.Query<Tick>().Where(x => x.Time > new DateTime(2012, 4, 23) && x.Time < new DateTime(2012, 4, 24, 00, 0, 0)).ToList(); 

This gives me 90,280 documents. So far, so good.

But then the map/reduce:

Map = rows => from row in rows
              select new
              {
                  Max = row.Bid,
                  Min = row.Bid,
                  Time = row.Time,
                  Count = 1
              };

Reduce = results => from result in results
                    group result by new { result.MaxBid, result.Count } into g
                    select new
                    {
                        Max = g.Key.MaxBid,
                        Min = g.Min(x => x.MaxBid),
                        Time = g.Key.Time,
                        Count = g.Sum(x => x.Count)
                    };

...

private class TickAggregationResult
{
    public decimal MaxBid { get; set; }
    public decimal MinBid { get; set; }
    public int Count { get; set; }
}

Then I create the index and try to query it:

Raven.Client.Indexes.IndexCreation.CreateIndexes(typeof(TickAggregation).Assembly, documentStore);

var session = documentStore.OpenSession();

var g1 = session.Query<TickAggregationResult>(typeof(TickAggregation).Name);

var group = session.Query<Tick, TickAggregation>()
    .Where(x => x.Time > new DateTime(2012, 4, 23) &&
                x.Time < new DateTime(2012, 4, 24, 00, 0, 0))
    .Customize(x => x.WaitForNonStaleResults())
    .AsProjection<TickAggregationResult>();

But group is just empty :(

As you can see, I have tried two different queries. I'm not sure what the difference is; can someone explain?

Now I get an error: [image]

group is still empty :(

Let me explain what I'm trying to accomplish in plain SQL:

select min(Ask), count(*) as TickCount from Ticks 
where Time between '2012-04-23' and '2012-04-24'

Please show the error message. – usr

Answer


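Unfortunately, Map/Reduce doesn't work that way. Well, at least the Reduce part of it doesn't. In order to reduce your set, you would have to predefine specific time ranges to group by, for example daily, weekly, monthly, and so on. If you reduced daily, you could then get the min/max/count per day.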
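To make the idea of a predefined grouping key concrete, here is a minimal sketch in plain LINQ to Objects. It is not a RavenDB index, only an in-memory illustration of what reducing by day produces; the DailyStats and DailyBuckets names and the sample values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Tick
{
    public DateTime Time { get; set; }
    public decimal Bid { get; set; }
}

// Hypothetical result type: one row per calendar day.
public class DailyStats
{
    public DateTime Day { get; set; }
    public decimal Min { get; set; }
    public decimal Max { get; set; }
    public int Count { get; set; }
}

public static class DailyBuckets
{
    // Groups ticks by calendar day. The day is a fixed key known in advance,
    // which is the kind of key a reduce step needs.
    public static List<DailyStats> Aggregate(IEnumerable<Tick> ticks)
    {
        return ticks
            .GroupBy(t => t.Time.Date)
            .Select(g => new DailyStats
            {
                Day = g.Key,
                Min = g.Min(t => t.Bid),
                Max = g.Max(t => t.Bid),
                Count = g.Count()
            })
            .OrderBy(s => s.Day)
            .ToList();
    }
}

public static class DailyBucketsDemo
{
    public static void Main()
    {
        // Made-up sample data: two ticks on one day, one on the next.
        var ticks = new List<Tick>
        {
            new Tick { Time = new DateTime(2012, 4, 23, 9, 0, 0), Bid = 1.5m },
            new Tick { Time = new DateTime(2012, 4, 23, 17, 0, 0), Bid = 1.2m },
            new Tick { Time = new DateTime(2012, 4, 24, 9, 0, 0), Bid = 1.7m }
        };

        foreach (var s in DailyBuckets.Aggregate(ticks))
            Console.WriteLine("{0:yyyy-MM-dd} min={1} max={2} count={3}", s.Day, s.Min, s.Max, s.Count);
    }
}
```

A query over such per-day results can only be filtered on whole days, which is why an arbitrary time range does not fit this shape directly.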

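There is a way to get what you want, but it has some performance considerations. Basically, you don't reduce at all; you index by time and then do the aggregation when transforming the results. This is similar to running your first query to filter and then aggregating in your client code. The only benefit is that the aggregation is done server-side, so you don't have to transfer all of that data to the client.

The performance concern here is how big a time range you are filtering on, or more precisely, how many items fall inside your filter range. If it is relatively small, you can use this approach. If it is too large, you will be waiting while the server goes through the result set.

Here is a sample program that illustrates this technique: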

using System;
using System.Linq;
using Raven.Client.Document;
using Raven.Client.Indexes;
using Raven.Client.Linq;

namespace ConsoleApplication1
{
    public class Tick
    {
        public string Id { get; set; }
        public DateTime Time { get; set; }
        public decimal Bid { get; set; }
    }

    /// <summary>
    /// This index is a true map/reduce, but its totals are for all time.
    /// You can't filter it by time range.
    /// </summary>
    class Ticks_Aggregate : AbstractIndexCreationTask<Tick, Ticks_Aggregate.Result>
    {
        public class Result
        {
            public decimal Min { get; set; }
            public decimal Max { get; set; }
            public int Count { get; set; }
        }

        public Ticks_Aggregate()
        {
            Map = ticks => from tick in ticks
                           select new
                           {
                               Min = tick.Bid,
                               Max = tick.Bid,
                               Count = 1
                           };

            Reduce = results => from result in results
                                group result by 0
                                into g
                                select new
                                {
                                    Min = g.Min(x => x.Min),
                                    Max = g.Max(x => x.Max),
                                    Count = g.Sum(x => x.Count)
                                };
        }
    }

    /// <summary>
    /// This index can be filtered by time range, but it does not reduce anything
    /// so it will not be performant if there are many items inside the filter.
    /// </summary>
    class Ticks_ByTime : AbstractIndexCreationTask<Tick>
    {
        public class Result
        {
            public decimal Min { get; set; }
            public decimal Max { get; set; }
            public int Count { get; set; }
        }

        public Ticks_ByTime()
        {
            Map = ticks => from tick in ticks
                           select new { tick.Time };

            TransformResults = (database, ticks) =>
                from tick in ticks
                group tick by 0
                into g
                select new
                {
                    Min = g.Min(x => x.Bid),
                    Max = g.Max(x => x.Bid),
                    Count = g.Count()
                };
        }
    }

    class Program
    {
        private static void Main()
        {
            var documentStore = new DocumentStore { Url = "http://localhost:8080" };
            documentStore.Initialize();
            IndexCreation.CreateIndexes(typeof(Program).Assembly, documentStore);

            var today = DateTime.Today;
            var rnd = new Random();

            using (var session = documentStore.OpenSession())
            {
                // Generate 100 random ticks
                for (var i = 0; i < 100; i++)
                {
                    var tick = new Tick { Time = today.AddMinutes(i), Bid = rnd.Next(100, 1000) / 100m };
                    session.Store(tick);
                }

                session.SaveChanges();
            }

            using (var session = documentStore.OpenSession())
            {
                // Query items with a filter. This will create a dynamic index.
                var fromTime = today.AddMinutes(20);
                var toTime = today.AddMinutes(80);
                var ticks = session.Query<Tick>()
                    .Where(x => x.Time >= fromTime && x.Time <= toTime)
                    .OrderBy(x => x.Time);

                // Output the results of the above query
                foreach (var tick in ticks)
                    Console.WriteLine("{0} {1}", tick.Time, tick.Bid);

                // Get the aggregates for all time
                var total = session.Query<Tick, Ticks_Aggregate>()
                    .As<Ticks_Aggregate.Result>()
                    .Single();
                Console.WriteLine();
                Console.WriteLine("Totals");
                Console.WriteLine("Min: {0}", total.Min);
                Console.WriteLine("Max: {0}", total.Max);
                Console.WriteLine("Count: {0}", total.Count);

                // Get the aggregates with a filter
                var filtered = session.Query<Tick, Ticks_ByTime>()
                    .Where(x => x.Time >= fromTime && x.Time <= toTime)
                    .As<Ticks_ByTime.Result>()
                    .Take(1024) // max you can take at once
                    .ToList() // required!
                    .Single();
                Console.WriteLine();
                Console.WriteLine("Filtered");
                Console.WriteLine("Min: {0}", filtered.Min);
                Console.WriteLine("Max: {0}", filtered.Max);
                Console.WriteLine("Count: {0}", filtered.Count);
            }

            Console.ReadLine();
        }
    }
}

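I can envision a solution to the problem of aggregating over a time filter with a potentially large range. The reduce would have to break things down into progressively smaller units of time at different levels. The code for this is a bit complex, but I am working on it for my own purposes. When it is complete, I will post it in the knowledge base at www.ravendb.net.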
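One way to picture "progressively smaller units of time" is to cover the requested range with the largest pre-aggregated buckets that fit, falling back to smaller ones at the edges. The sketch below only illustrates that idea under assumptions of my own and is not the unpublished implementation mentioned above: it uses just two levels (day and hour), assumes both bounds fall exactly on an hour boundary, and the BucketUnit, Bucket, and RangeCover names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bucket levels; a fuller version might add month or minute.
public enum BucketUnit { Day, Hour }

public struct Bucket
{
    public BucketUnit Unit;
    public DateTime Start;
}

public static class RangeCover
{
    // Covers the half-open range [from, to) with whole-day buckets where a full
    // day fits and hour buckets elsewhere.
    // Assumption: from and to are both aligned to an hour boundary.
    public static List<Bucket> Cover(DateTime from, DateTime to)
    {
        var buckets = new List<Bucket>();
        var cursor = from;
        while (cursor < to)
        {
            bool atMidnight = cursor.TimeOfDay == TimeSpan.Zero;
            if (atMidnight && cursor.AddDays(1) <= to)
            {
                // A complete day fits inside the remaining range.
                buckets.Add(new Bucket { Unit = BucketUnit.Day, Start = cursor });
                cursor = cursor.AddDays(1);
            }
            else
            {
                // Partial day at either edge: fall back to a single hour.
                buckets.Add(new Bucket { Unit = BucketUnit.Hour, Start = cursor });
                cursor = cursor.AddHours(1);
            }
        }
        return buckets;
    }
}

public static class RangeCoverDemo
{
    public static void Main()
    {
        var from = new DateTime(2012, 4, 22, 22, 0, 0);
        var to = new DateTime(2012, 4, 24, 3, 0, 0);

        foreach (var b in RangeCover.Cover(from, to))
            Console.WriteLine("{0} {1:yyyy-MM-dd HH:mm}", b.Unit, b.Start);
    }
}
```

Each returned bucket would correspond to one pre-reduced min/max/count entry. Combining them (min of the mins, max of the maxes, sum of the counts) gives the totals for the range while touching only a handful of entries instead of every tick.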


UPDATE

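I played with this a bit more and noticed two things about that last query.

  1. You must call ToList() before calling Single() in order to get the full result set.
  2. Even though this runs on the server, the maximum number of items in the result range is 1024, and you have to specify Take(1024) or you get the default maximum of 128. Since this runs on the server, I didn't expect this. But I guess it's because you don't normally do aggregations in the TransformResults section.

I've updated the code accordingly. However, unless you can guarantee that the range is small enough for this to work, I would wait for the better full map/reduce that I mentioned. I'm working on it. :)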


You've been very kind, thanks for your help Matt :) – Janus007


@Janus007 Thanks, please see the update I just posted. –