2013-01-12 46 views
0

Doing a pivot with LINQ

I have this problem: I have a CSV file in the following format (customer, purchased item pairs):

customer1 item1 
customer1 item2 
customer1 item3 
customer2 item4 
customer2 item2 
customer3 item5 
customer3 item1 
customer3 item2 
customer4 item1 
customer4 item2 
customer5 item5 
customer5 item1 

Now I want the query result to show:

item x; item y; how many customers have bought item x and item y together

For example:

item1 item2 3 (because cust1, cust3 and cust4 bought item1 and item2 together)
item1 item5 2 (because cust5 and cust3 bought item1 and item5 together)

The query returns all possible combinations of items that customers have bought in pairs. Also note that Pair(x, y) is the same as Pair(y, x).

The SQL query would look like this:

SELECT a1.item_id, a2.item_id, COUNT(a1.cust_id) AS how_many_custs_bought_both 
    FROM data AS a1 
INNER JOIN data AS a2 
    ON a2.cust_id=a1.cust_id AND a2.item_id<>a1.item_id AND a1.item_id<a2.item_id 
GROUP BY a1.item_id, a2.item_id 

How would you do this in C#, 1) using regular for/foreach loops and 2) using LINQ?

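I tried doing this in LINQ, but got stuck when I noticed that LINQ does not support multiple equals keywords in a join clause. Then I tried plain loops, but that turned out so inefficient that it could only process 30 rows (CSV file lines) per second.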

Please advise!

+0

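Where is the LINQ statement you tried? What is the delimiter of the .csv file? Why not put this query into a stored procedure, or why not use the 'System.Data.SqlClient' classes with a parameterized query? – MethodMan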

Answers

1

Using LINQ (and following the first 5 lines from Tim's answer), combining the chained method syntax with the query syntax for the join part:

var custItems = new [] { 
    new { customer = 1, item = 1 }, 
    new { customer = 1, item = 2 }, 
    new { customer = 1, item = 3 }, 
    new { customer = 2, item = 4 }, 
    new { customer = 2, item = 2 }, 
    new { customer = 3, item = 5 }, 
    new { customer = 3, item = 1 }, 
    new { customer = 3, item = 2 }, 
    new { customer = 4, item = 1 }, 
    new { customer = 4, item = 2 }, 
    new { customer = 5, item = 5 }, 
    new { customer = 5, item = 1 } 
};

var pairs = custItems
    .GroupBy(x => x.customer)
    .Where(g => g.Count() > 1)
    .Select(x => (from a in x.Select(y => y.item)
                  from b in x.Select(y => y.item)
                  where a < b // If you want to avoid duplicate (a,b)+(b,a)
                  // or just: where a != b, if you want to keep the dupes.
                  select new { a, b }))
    .SelectMany(x => x)
    .GroupBy(x => x)
    .Select(g => new { Pair = g.Key, Count = g.Count() })
    .ToList();

pairs.ForEach(x => Console.WriteLine(x)); 

Edit: I forgot that the OP wanted the number of occurrences of each pair; added another bit of .GroupBy() magic.

Edit: completed the example to show what it will output:

{ Pair = { a = 1, b = 2 }, Count = 3 } 
{ Pair = { a = 1, b = 3 }, Count = 1 } 
{ Pair = { a = 2, b = 3 }, Count = 1 } 
{ Pair = { a = 2, b = 4 }, Count = 1 } 
{ Pair = { a = 1, b = 5 }, Count = 2 } 
{ Pair = { a = 2, b = 5 }, Count = 1 } 

Edit: rolled back and changed the strings to integers, since the OP shows a dataset with integers as IDs, which removes the need for .GetHashCode().
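The `.GroupBy(x => x)` step in the query above relies on the fact that C# anonymous types compare by value: two instances with the same property values are equal and produce the same hash code. A minimal sketch of that behaviour follows; the class and method names (`AnonymousKeyDemo`, `SamePair`) are made up for illustration and are not part of the answer.

```csharp
using System;
using System.Linq;

public static class AnonymousKeyDemo
{
    // Hypothetical helper: builds two anonymous { a, b } objects and compares them.
    public static bool SamePair(int a1, int b1, int a2, int b2)
    {
        var first = new { a = a1, b = b1 };
        var second = new { a = a2, b = b2 };
        return first.Equals(second); // property-by-property comparison
    }

    static void Main()
    {
        Console.WriteLine(SamePair(1, 2, 1, 2)); // True
        Console.WriteLine(SamePair(1, 2, 2, 1)); // False: order of values matters

        // Equal pairs therefore land in the same group.
        var pairs = new[] { new { a = 1, b = 2 }, new { a = 1, b = 2 }, new { a = 1, b = 5 } };
        Console.WriteLine(pairs.GroupBy(x => x).Count()); // 2
    }
}
```

Since { a = 1, b = 2 } and { a = 2, b = 1 } are different keys, the `where a < b` filter is what keeps each unordered pair in a single group.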

+0

Sorry, I just found out that this doesn't work after all. It doesn't output the correct information. – user315648

+0

Hey @user315648, I have completed the example with the output it produces. Why doesn't it work? Isn't this what you wanted? – istepaniuk

+0

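Sorry, it outputs duplicate information. For example, the pair (Item1, Item2) is the same as (Item2, Item1). The code above treats (a, b) as different from (b, a), which is wrong. – user315648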

0

A working LINQ example, not too pretty!

using System; 
using System.Collections.Generic; 
using System.Linq; 

class Data
{
    public Data(int cust, int item)
    {
        item_id = item;
        cust_id = cust;
    }
    public int item_id { get; set; }
    public int cust_id { get; set; }

    static void Main(string[] args)
    {
        var data = new List<Data>
        {
            new Data(1,1), new Data(1,2), new Data(1,3),
            new Data(2,4), new Data(2,2), new Data(3,5),
            new Data(3,1), new Data(3,2), new Data(4,1),
            new Data(4,2), new Data(5,5), new Data(5,1)
        };

        // Self-join on cust_id; a1.item_id < a2.item_id keeps each pair only once.
        (from a1 in data
         from a2 in data
         where a2.cust_id == a1.cust_id && a2.item_id != a1.item_id && a1.item_id < a2.item_id
         group new { a1, a2 } by new { item1 = a1.item_id, item2 = a2.item_id }
         into g
         select new { g.Key.item1, g.Key.item2, count = g.Count() })
            .ToList()
            .ForEach(x => Console.WriteLine("{0} {1} {2}", x.item1, x.item2, x.count));
        Console.Read();
    }
}

Output:

1 2 3 
1 3 1 
2 3 1 
2 4 1 
1 5 2 
2 5 1 
1

Maybe:

var lines = File.ReadLines(csvFilePath); 
var custItems = lines 
    .Select(l => new { split = l.Split() }) 
    .Select(x => new { customer = x.split[0].Trim(), item = x.split[1].Trim() }) 
    .ToList(); 

var groups = from ci1 in custItems
             join ci2 in custItems
             on ci1.customer equals ci2.customer
             where ci1.item != ci2.item
             group new { Item1 = ci1.item, Item2 = ci2.item } by new { Item1 = ci1.item, Item2 = ci2.item } into ItemGroup
             select ItemGroup;

var result = groups.Select(g => new 
{ 
    g.Key.Item1, 
    g.Key.Item2, 
    how_many_custs_bought_both = g.Count() 
}); 

Note that materializing with ToList is important when the file is large, because of the self-join.

{ Item1 = item1, Item2 = item2, how_many_custs_bought_both = 3 } 
{ Item1 = item1, Item2 = item3, how_many_custs_bought_both = 1 } 
{ Item1 = item2, Item2 = item1, how_many_custs_bought_both = 3 } 
{ Item1 = item2, Item2 = item3, how_many_custs_bought_both = 1 } 
{ Item1 = item3, Item2 = item1, how_many_custs_bought_both = 1 } 
{ Item1 = item3, Item2 = item2, how_many_custs_bought_both = 1 } 
{ Item1 = item4, Item2 = item2, how_many_custs_bought_both = 1 } 
{ Item1 = item2, Item2 = item4, how_many_custs_bought_both = 1 } 
{ Item1 = item5, Item2 = item1, how_many_custs_bought_both = 2 } 
{ Item1 = item5, Item2 = item2, how_many_custs_bought_both = 1 } 
{ Item1 = item1, Item2 = item5, how_many_custs_bought_both = 2 } 
{ Item1 = item2, Item2 = item5, how_many_custs_bought_both = 1 } 
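The listing above contains every pair twice, once as (x, y) and once as (y, x), because the filter is `ci1.item != ci2.item`. With string items the `<` operator is not available, so one possible variant compares the strings with `string.CompareOrdinal` instead. This is only a sketch under assumptions: it works on in-memory (customer, item) tuples rather than the CSV file, and the names `OrderedPairs`, `CountPairs`, `First` and `Second` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderedPairs
{
    // For every unordered item pair, counts the customers who bought both items.
    public static List<(string First, string Second, int Count)> CountPairs(
        IEnumerable<(string Customer, string Item)> rows)
    {
        var list = rows.ToList(); // materialize once; the self-join enumerates it repeatedly

        var query = from r1 in list
                    join r2 in list on r1.Customer equals r2.Customer
                    // Keep only the ordering where the first item sorts before the second,
                    // so (x, y) and (y, x) collapse into one pair.
                    where string.CompareOrdinal(r1.Item, r2.Item) < 0
                    group r1 by new { First = r1.Item, Second = r2.Item } into g
                    select (First: g.Key.First, Second: g.Key.Second, Count: g.Count());

        return query.ToList();
    }

    static void Main()
    {
        var rows = new[]
        {
            ("customer1", "item1"), ("customer1", "item2"), ("customer1", "item3"),
            ("customer2", "item4"), ("customer2", "item2"), ("customer3", "item5"),
            ("customer3", "item1"), ("customer3", "item2"), ("customer4", "item1"),
            ("customer4", "item2"), ("customer5", "item5"), ("customer5", "item1")
        };

        foreach (var p in CountPairs(rows))
            Console.WriteLine($"{p.First} {p.Second} {p.Count}");
    }
}
```

For the sample data this should give six rows instead of twelve, with item1/item2 counted three times and item1/item5 twice.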
+0

This code also treats (item1, item2) as different from (item2, item1), which is wrong. It outputs duplicates. – user315648

+0

Sorry, it treats them the same, not different. – user315648

1

You could write something like this:

IDictionary<int, int> pivotResult = customerItems.ToLookup(c => c.Customer)
    .ToDictionary(x => x.Key, y => y.Count());
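As written, the snippet above gives the number of rows per customer, not the number of customers per item pair, and `customerItems` is not defined in it. A rough sketch of how the same ToLookup idea could be extended with plain loops (the first variant the question asks for) is shown below. The tuple shape of the rows and the names `PairCounter` and `CountPairs` are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairCounter
{
    // For every unordered item pair, counts the customers who bought both items.
    public static Dictionary<(string, string), int> CountPairs(
        IEnumerable<(string Customer, string Item)> rows)
    {
        var counts = new Dictionary<(string, string), int>();

        // One pass to bucket the items of each customer.
        var itemsByCustomer = rows.ToLookup(r => r.Customer, r => r.Item);

        foreach (var customerItems in itemsByCustomer)
        {
            // Distinct plus ordinal sort, so each pair appears once with the smaller item first.
            var items = customerItems.Distinct()
                                     .OrderBy(i => i, StringComparer.Ordinal)
                                     .ToArray();

            for (int i = 0; i < items.Length; i++)
            {
                for (int j = i + 1; j < items.Length; j++)
                {
                    var key = (items[i], items[j]);
                    counts.TryGetValue(key, out var n); // n stays 0 for a new pair
                    counts[key] = n + 1;
                }
            }
        }

        return counts;
    }

    static void Main()
    {
        var rows = new[]
        {
            ("customer1", "item1"), ("customer1", "item2"), ("customer1", "item3"),
            ("customer2", "item4"), ("customer2", "item2"), ("customer3", "item5"),
            ("customer3", "item1"), ("customer3", "item2"), ("customer4", "item1"),
            ("customer4", "item2"), ("customer5", "item5"), ("customer5", "item1")
        };

        foreach (var kv in CountPairs(rows))
            Console.WriteLine($"{kv.Key.Item1} {kv.Key.Item2} {kv.Value}");
    }
}
```

The loops only pair up items within one customer's basket, so the work grows with basket sizes rather than with the square of the total row count, which is probably what made the naive loop version slow.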