2017-04-23 43 views

Why does a double GroupBy + ToList take so long? I have these queries:

var Data = (from ftr in db.TB_FTR 
         join mst in db.TB_MST on ftr.MST_ID equals mst.MST_ID 
         join trf in db.TB_TRF on mst.TRF_ID equals trf.ID 
         select new CityCountyType { City = ftr.CITY, County = ftr.COUNTY, Type = trf.TYPE } 
        ).OrderBy(i => i.City).ThenBy(i => i.County); 

var Data2 = 
    Data.GroupBy(i => new {i.City, i.County, i.Type}) 
     .Select(group => new {Name = group.Key, Count = group.Count()}) 
     .OrderBy(x => x.Name) 
     .ThenByDescending(x => x.Count) 
     .GroupBy(g => new {g.Name.City, g.Name.County}) 
     .Select(g => g.Select(g2 => 
      new {Name = new {g.Key.City, g.Key.County, g2.Name.Type}, g2.Count})).ToList(); 

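I am trying to get a list of lists of objects, where each inner list holds the objects that share the same city and county. But the second query takes far too long to produce a result. I waited about 30 minutes without getting an answer, even though Data has only about 5000 records. How can I change these queries so that I get the list of lists I want? Thanks in advance.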

For example, the query returns a list like this:

{ Name = {{ City = New York City, County = Bronx, Type = Type A }}, Count = 4 } 

{ Name = {{ City = New York City, County = Bronx, Type = Type B }}, Count = 8 } 

{ Name = {{ City = New York City, County = Bronx, Type = Type C }}, Count = 24 } 

{ Name = {{ City = New York City, County = Manhattan, Type = Type B }}, Count = 43 } 

{ Name = {{ City = New York City, County = Manhattan, Type = Type C }}, Count = 58 } 

{ Name = {{ City = Seattle, County = King County, Type = Type D }}, Count = 43 } 

{ Name = {{ City = Seattle, County = King County, Type = Type A }}, Count = 67 } 

{ Name = {{ City = Seattle, County = Snohomish County, Type = Type C }}, Count = 67 } 

I want to split this list into several lists, like this:

List 1:

{ Name = {{ City = New York City, County = Bronx, Type = Type A }}, Count = 4 } 

{ Name = {{ City = New York City, County = Bronx, Type = Type B }}, Count = 8 } 

{ Name = {{ City = New York City, County = Bronx, Type = Type C }}, Count = 24 } 

List 2:

{ Name = {{ City = New York City, County = Manhattan, Type = Type B }}, Count = 43 } 

{ Name = {{ City = New York City, County = Manhattan, Type = Type C }}, Count = 58 } 

List 3:

{ Name = {{ City = Seattle, County = King County, Type = Type D }}, Count = 43 } 

{ Name = {{ City = Seattle, County = King County, Type = Type A }}, Count = 67 } 

List 4:

{ Name = {{ City = Seattle, County = Snohomish County, Type = Type C }}, Count = 67 } 

Can't you use Data.Where(a => a.City.ToString() == a.County.ToString()).select(a); – Biswabid


@Biswabid No, I don't think I explained myself well. Please see my edit. – jason


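This is why I don't like LINQ. If you don't understand how it works, it can go badly. Every method you call takes an IEnumerable and returns another IEnumerable. So these chained calls stack up, and you may end up doing ten or fifteen loops. If you rewrite this as a single function in your own code instead of LINQ, you will see a performance improvement. – peteisace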

Answers

1

Possibility 1: Your database is not indexed to support your query (the where and join clauses).

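To find out, get the generated SQL and look at the execution plan. If the plan shows nested loop join -> clustered index scan, you have found the problem.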

Possibility 2: You have hit the n + 1 problem.

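In LINQ's GroupBy, a group consists of the group key and the group members. In most SQL implementations, however, GROUP BY gives you the group key and aggregates. To get the members of a group, a separate query is issued. If there are n groups, n queries have to be issued (the +1 is the original query).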
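The difference between the two group shapes can be seen with plain LINQ to Objects. This is only a sketch: the sample rows are made up, and whether a given database provider turns the first shape into one SQL statement or into one query per group depends on the provider and its version.

```csharp
using System;
using System.Linq;

public static class GroupShapes
{
    public static void Main()
    {
        // Hypothetical sample rows, not taken from the question's database.
        var rows = new[]
        {
            new { County = "Bronx", Type = "Type A" },
            new { County = "Bronx", Type = "Type B" },
            new { County = "Manhattan", Type = "Type B" },
        };

        // LINQ shape: every group carries its key AND all of its member rows.
        foreach (var g in rows.GroupBy(r => r.County))
        {
            Console.WriteLine($"{g.Key}: {string.Join(", ", g.Select(r => r.Type))}");
        }

        // SQL-like shape: only the key and an aggregate survive, no member rows.
        var counts = rows
            .GroupBy(r => r.County)
            .Select(g => new { County = g.Key, Count = g.Count() });

        foreach (var c in counts)
        {
            Console.WriteLine($"{c.County}: {c.Count}");
        }
    }
}
```

The second GroupBy in the question selects the members of each group (g.Select(...)), which is the shape SQL's GROUP BY cannot return directly.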

To find out, get the generated SQL. If a bunch of extra queries are issued and any of them says clustered index scan, then you have found the problem.

Possibility 3: You are actually issuing n^2 queries (up to ~25,000,000 if n is close to 5,000).

Well, you grouped twice, so it could be a doubly nested loop. Look at the generated SQL and find out.


The simplest fix is to pull the 5,000 records into memory before grouping. An easy way to do that is to call ToList before calling GroupBy.
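A minimal sketch of that fix, assuming the rows have already been materialized (in the question this would be something like Data.ToList()). The TypeCount class, the GroupByCityCounty helper and the sample rows are hypothetical names introduced for illustration. Sorting uses the individual fields rather than the anonymous Name key, because anonymous types are not comparable in LINQ to Objects and ordering by one is expected to fail at runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the CityCountyType projection in the question.
public class CityCountyType
{
    public string City { get; set; }
    public string County { get; set; }
    public string Type { get; set; }
}

// Hypothetical result type: one (City, County, Type) combination and its row count.
public class TypeCount
{
    public string City { get; set; }
    public string County { get; set; }
    public string Type { get; set; }
    public int Count { get; set; }
}

public static class Program
{
    // Works purely in memory: count rows per (City, County, Type),
    // then bucket those counts into one list per (City, County).
    public static List<List<TypeCount>> GroupByCityCounty(IEnumerable<CityCountyType> rows)
    {
        return rows
            .GroupBy(r => new { r.City, r.County, r.Type })
            .Select(g => new TypeCount
            {
                City = g.Key.City,
                County = g.Key.County,
                Type = g.Key.Type,
                Count = g.Count()
            })
            .OrderBy(x => x.City)
            .ThenBy(x => x.County)
            .ThenBy(x => x.Count)
            .GroupBy(x => new { x.City, x.County })
            .Select(g => g.ToList())
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample rows standing in for Data.ToList().
        var rows = new List<CityCountyType>
        {
            new CityCountyType { City = "New York City", County = "Bronx", Type = "Type A" },
            new CityCountyType { City = "New York City", County = "Bronx", Type = "Type A" },
            new CityCountyType { City = "New York City", County = "Bronx", Type = "Type B" },
            new CityCountyType { City = "New York City", County = "Manhattan", Type = "Type B" },
        };

        foreach (var list in GroupByCityCounty(rows))
        {
            Console.WriteLine("List:");
            foreach (var x in list)
            {
                Console.WriteLine($"  {x.City}, {x.County}, {x.Type}: {x.Count}");
            }
        }
    }
}
```

With the real data, the database would only run the three-table join once, and both GroupBy calls would then operate on the in-memory list.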


Thank you for such a detailed answer. When I call 'ToList' before 'GroupBy', it returns zero results. – jason
