2012-06-06 54 views
1

Complex SQL writing

I have this table:

table session(
ID number, 
SessionID VarChar, 
Date, 
Filter 
) 

The table contains search information, for example:

ID SessionID     Date    filter 
4 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 meagPixel=5 
6 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 brand=Canon 
7 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 brand=Canon&meagPixel=12.1 
8 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 brand=Canon 
10 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 brand=Nikon 
12 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 meagPixel=12.1 
13 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 meagPixel=12.1&opticalZoom=True 
14 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 meagPixel=12.1&opticalZoom=True&brand=Panasonic 
16 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 price=500.00 
18 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 price=499.00 
19 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 price=499.00&brand=Olympus 
21 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 zoomRange=2000 
22 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 zoomRange=2000&brand=Leica 
23 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 zoomRange=2000&brand=Leica&price=1995.00 
24 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True 
25 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2 
26 peqq421gaspts3nuulq5mwcq 24/05/2012 13:50 zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345 
27 peqq421gaspts3nuulq5mwcq 24/05/2012 13:58 zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2 
41 poiq41111spts00000q5aaaa 27/05/2012 13:48 meagPixel=5 

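I want the unique searches. A unique search is defined by:

  • The user (session)
  • The longest search (filter); if the first filter changes, it has to be treated as a new search (filter)

Because ASP.NET does not guarantee that SessionID is unique, a user is identified by (SessionID, Date).

I haven't got any further than this: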

SELECT  MAX(Filter) 
FROM   Session 
GROUP BY SessionID 
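
One likely reason this attempt falls short: MAX on a character column returns the value that sorts last, not the longest one, and GROUP BY SessionID yields only one row per session. A minimal Python sketch of the difference, using three filter values from the sample (the variable names are illustrative only):

```python
# Three filter strings taken from the sample data above.
filters = ["brand=Canon", "brand=Canon&meagPixel=12.1", "price=500.00"]

# Plain max() compares strings character by character, much like MAX() on a
# character column follows the column's sort order.
sorts_last = max(filters)

# Picking by length is what "longest search" actually asks for.
longest = max(filters, key=len)

print(sorts_last)  # price=500.00
print(longest)     # brand=Canon&meagPixel=12.1
```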

BTW, for the sample table data I gave, the result should be:

ID SessionID     Date    filter    
4 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 meagPixel=5  
7 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 brand=Canon&meagPixel=12.1  
10 peqq421gaspts3nuulq5mwcq 24/05/2012 13:48 brand=Nikon  
14 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 meagPixel=12.1&opticalZoom=True&brand=Panasonic  
16 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 price=500.00   
19 peqq421gaspts3nuulq5mwcq 24/05/2012 13:49 price=499.00&brand=Olympus  
26 peqq421gaspts3nuulq5mwcq 24/05/2012 13:50 zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345  
41 poiq41111spts00000q5aaaa 27/05/2012 13:48 meagPixel=5  

Thanks for your help and guidance.

+0

Could you check your expected output again? *brand=Canon* and *brand=Canon&meagPixel=12.1* have the same first filter, yet they are listed separately. Meanwhile *zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345* has only one entry, although the main table also contains a record *zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2* –

+0

Since it isn't very clear, I'll change it. – Nir

+0

I'm very, very sorry, I have only just edited my post: I'm using SQL Server Compact 4, not SQL Server Standard Edition. – Nir

Answers

1

@GarethD - thanks for the schema and insert queries. I tried a somewhat different approach. I'm not sure whether it works in all cases. It works in MySQL and MSSQL.

  select * 
      from tsession t1 
      where not exists (
          select * 
          from tsession t2 
          where t2.filter like concat(t1.filter,'%') 
          and t1.filter<>t2.filter 
          and t1.sessionid=t2.sessionid) 
      order by id; 

This gives the exact result required in the question.
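
For anyone who wants to check that claim without a MySQL or MSSQL instance, here is a rough sketch that runs the same idea against an in-memory SQLite database from Python. SQLite, the reduced three-column table, and the `||` operator standing in for `concat` are assumptions made for this demo only. Note that LIKE treats `%` and `_` inside a filter as wildcards, which the sample data happens not to contain.

```python
import sqlite3

S1, S2 = "peqq421gaspts3nuulq5mwcq", "poiq41111spts00000q5aaaa"
Z = "zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True"
# (ID, SessionID, Filter) copied from the question; Date is not used by the query.
rows = [
    (4, S1, "meagPixel=5"), (6, S1, "brand=Canon"),
    (7, S1, "brand=Canon&meagPixel=12.1"), (8, S1, "brand=Canon"),
    (10, S1, "brand=Nikon"), (12, S1, "meagPixel=12.1"),
    (13, S1, "meagPixel=12.1&opticalZoom=True"),
    (14, S1, "meagPixel=12.1&opticalZoom=True&brand=Panasonic"),
    (16, S1, "price=500.00"), (18, S1, "price=499.00"),
    (19, S1, "price=499.00&brand=Olympus"), (21, S1, "zoomRange=2000"),
    (22, S1, "zoomRange=2000&brand=Leica"),
    (23, S1, "zoomRange=2000&brand=Leica&price=1995.00"),
    (24, S1, Z), (25, S1, Z + "&meagPixel=16.2"),
    (26, S1, Z + "&meagPixel=16.2&weight=345"),
    (27, S1, Z + "&meagPixel=16.2"), (41, S2, "meagPixel=5"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tsession (id INTEGER, sessionid TEXT, filter TEXT)")
conn.executemany("INSERT INTO tsession VALUES (?, ?, ?)", rows)

# Keep a row only if no other filter in the same session extends it.
query = """
    SELECT t1.id
    FROM tsession t1
    WHERE NOT EXISTS (
        SELECT 1
        FROM tsession t2
        WHERE t2.filter LIKE (t1.filter || '%')
          AND t1.filter <> t2.filter
          AND t1.sessionid = t2.sessionid)
    ORDER BY t1.id
"""
kept_ids = [row[0] for row in conn.execute(query)]
print(kept_ids)  # [4, 7, 10, 14, 16, 19, 26, 41]
```

Two identical filters that nothing extends would both be returned, because of the `t1.filter <> t2.filter` condition; the sample data has no such pair.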

+0

You are missing "and filter is not null" in the outer WHERE. I really don't know whether it covers all cases... what do you think? – Nir

+0

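This will definitely get the longest filter; *concat(t1.filter,'%')* ensures that. The scenario that needs further testing is one with additional conditions on any column (for any grouping requirement). As for "filter is not null", it was not clear from the data that filter could be null. –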

+0

Yes, I have tested it on the real table, and there are nulls there. It will need more testing, because it seems too simple to be right :) – Nir

0

To get the longest search filter, you need to do something like this:

select s.* 
from (select s.*, 
      row_number() over (partition by sessionid order by len desc) as rownum 
     from (select s.*, len(filter) as len 
      from session s 
      ) s 
    ) s 
where rownum = 1 

I did this using window functions. You can do the same thing using aggregation and a join.
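
As a rough illustration of the aggregation-and-join variant, here is a sketch run against an in-memory SQLite database from Python. SQLite, its `LENGTH` function in place of `len`, and the reduced three-column table with four sample rows are assumptions made for the demo, not the asker's setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (id INTEGER, sessionid TEXT, filter TEXT)")
conn.executemany("INSERT INTO session VALUES (?, ?, ?)", [
    (6, "peqq421gaspts3nuulq5mwcq", "brand=Canon"),
    (7, "peqq421gaspts3nuulq5mwcq", "brand=Canon&meagPixel=12.1"),
    (10, "peqq421gaspts3nuulq5mwcq", "brand=Nikon"),
    (41, "poiq41111spts00000q5aaaa", "meagPixel=5"),
])

# Find the maximum filter length per session, then join back to fetch the rows.
query = """
    SELECT s.id
    FROM session s
    JOIN (SELECT sessionid, MAX(LENGTH(filter)) AS maxlen
          FROM session
          GROUP BY sessionid) m
      ON m.sessionid = s.sessionid
     AND LENGTH(s.filter) = m.maxlen
    ORDER BY s.id
"""
longest_ids = [row[0] for row in conn.execute(query)]
print(longest_ids)  # [7, 41]
```

Unlike the row_number() version, this form returns every row that ties for the longest filter in a session.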

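However, as you said, the session is not the real identifier; session/filter is. The query below gets pretty much what you want (the only change is that the partition clause now includes the filter):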

select s.* 
from (select s.*, 
      row_number() over (partition by sessionid, filter 
             order by len desc) as rownum 
     from (select s.*, len(filter) as len 
      from session s 
      ) s 
    ) s 
where rownum = 1 

You may have duplicates. If you want all the duplicates, a slightly different query will work.

0

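First, your sample data looks wrong; I think rows 25, 26 and 27 should all appear in your final data. 27 definitely should, because it is the only entry for that combination of session ID and date.

Assuming the above is correct, I think I have worked out your logic correctly.

Step 1 is to define, for each filter, its first search term and the order in which it occurs within its session: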

;WITH CTE AS 
( SELECT *, 
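      SUBSTRING(Filter, 1, CASE WHEN CHARINDEX('&', Filter) = 0 THEN LEN(Filter) ELSE CHARINDEX('&', Filter) - 1 END) [FirstTerm], 
      -- SessionOrder is used by the later CTEs but its definition is missing here; reconstructed on the assumption that rows are ordered by ID within each SessionID and Date
      ROW_NUMBER() OVER(PARTITION BY SessionID, Date ORDER BY ID) [SessionOrder] 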
    FROM Session 
) 
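
The SUBSTRING/CHARINDEX expression simply takes everything before the first `&`, or the whole filter when there is no `&`. A small Python sketch of the same rule (the function name `first_term` is made up for illustration):

```python
def first_term(filter_text: str) -> str:
    """Return the part of the filter before the first '&', or all of it."""
    pos = filter_text.find("&")  # -1 when no '&' is present (CHARINDEX returns 0)
    return filter_text if pos == -1 else filter_text[:pos]

single = first_term("brand=Canon")
multiple = first_term("zoomRange=2000&brand=Leica&price=1995.00")
print(single)    # brand=Canon
print(multiple)  # zoomRange=2000
```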

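The next step is to work out whether each search is a new search or a continuation of the previous one. This is done by getting the previous search in the session (which is why SessionOrder is defined in the previous CTE) and checking whether the first search term is the same.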

, CTE2 AS 
( SELECT T1.*, 
      CASE WHEN T1.SessionOrder = 1 OR T2.SessionOrder IS NOT NULL THEN 1 ELSE 0 END [NewSearch] 
    FROM CTE T1 
      LEFT JOIN CTE T2 
       ON T1.SessionID = T2.SessionID 
       AND T1.Date = T2.Date 
       AND T1.FirstTerm != T2.FirstTerm 
       AND T1.SessionOrder = T2.SessionOrder + 1 
) 

Next, each new search needs its own rank within the session, for grouping purposes.

, CTE3 AS 
( SELECT *, 
      ROW_NUMBER() OVER(PARTITION BY SessionID, Date, ISNULL(SearchNumber, 0) ORDER BY LEN(Filter) DESC) [SearchOrder] 
    FROM CTE2 T1 
      OUTER APPLY 
      ( SELECT SUM(NewSearch) [SearchNumber] 
       FROM CTE2 T2 
       WHERE T1.SessionOrder >= T2.SessionOrder 
       AND  T1.SessionID = T2.SessionID 
       AND  T1.Date = T2.Date 
      ) c 
) 
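
To make the three CTEs easier to follow, here is a rough Python sketch of the same logic on a subset of the sample rows: number the searches within each (SessionID, Date) group, starting a new search whenever the first term changes, then keep the longest filter per search. It assumes rows are ordered by ID within a group; the helper and variable names are illustrative only.

```python
from itertools import groupby

S = "peqq421gaspts3nuulq5mwcq"
# (ID, SessionID, Date, Filter): a subset of the sample rows.
rows = [
    (6, S, "24/05/2012 13:48", "brand=Canon"),
    (7, S, "24/05/2012 13:48", "brand=Canon&meagPixel=12.1"),
    (8, S, "24/05/2012 13:48", "brand=Canon"),
    (10, S, "24/05/2012 13:48", "brand=Nikon"),
    (12, S, "24/05/2012 13:48", "meagPixel=12.1"),
    (13, S, "24/05/2012 13:48", "meagPixel=12.1&opticalZoom=True"),
    (14, S, "24/05/2012 13:49", "meagPixel=12.1&opticalZoom=True&brand=Panasonic"),
]

def first_term(flt):
    return flt.split("&", 1)[0]

# groupby needs equal keys to be adjacent; the sort is only for grouping
# (the dates are plain strings here, not parsed timestamps).
ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))

numbered = []  # (ID, SessionID, Date, SearchNumber, Filter)
for (sid, date), group in groupby(ordered, key=lambda r: (r[1], r[2])):
    search_number, previous = 0, None
    for rid, _, _, flt in group:
        if first_term(flt) != previous:  # first row of the group, or term changed
            search_number += 1
        previous = first_term(flt)
        numbered.append((rid, sid, date, search_number, flt))

# Keep the longest filter per (SessionID, Date, SearchNumber).
best = {}
for rid, sid, date, number, flt in numbered:
    key = (sid, date, number)
    if key not in best or len(flt) > len(best[key][1]):
        best[key] = (rid, flt)

kept_ids = sorted(rid for rid, _ in best.values())
print(kept_ids)  # [7, 10, 13, 14]
```

Rows 13 and 14 are both kept because their Date values differ, so they land in different groups even though 14 extends 13.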

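You then have the unique combination defined by your rules (session ID, date and first search term), and each item within that combination can be ordered by the length of its filter. Finally, all you need to do is limit the results to the longest search for each combination of SessionID, date and first filter term: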

SELECT ID, SessionID, Date, Filter 
FROM CTE3 
WHERE SearchOrder = 1 
ORDER BY ID 

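Normally I would put all of this together on SQLFiddle rather than posting a complete working example here, but it doesn't seem to be working today. So here is the complete SQL I used to test with your data: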

CREATE TABLE #Session (ID INT, SessionID VARCHAR(50), Date DATETIME, Filter VARCHAR(200)) 
INSERT INTO #Session VALUES 
    (2, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon'), 
    (4, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'meagPixel=5'), 
    (6, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon'), 
    (7, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon&meagPixel=12.1'), 
    (8, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon'), 
    (10, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Nikon'), 
    (12, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'meagPixel=12.1'), 
    (13, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'meagPixel=12.1&opticalZoom=True'), 
    (14, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'meagPixel=12.1&opticalZoom=True&brand=Panasonic'), 
    (16, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'price=500.00'), 
    (18, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'price=499.00'), 
    (19, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'price=499.00&brand=Olympus'), 
    (21, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000'), 
    (22, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica'), 
    (23, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica&price=1995.00'), 
    (24, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True'), 
    (25, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2'), 
    (26, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:50', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345'), 
    (27, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:58', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2'), 
    (41, 'poiq41111spts00000q5aaaa', '27/05/2012 13:48', 'meagPixel=5') 

;WITH CTE AS 
( SELECT *, 
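      SUBSTRING(Filter, 1, CASE WHEN CHARINDEX('&', Filter) = 0 THEN LEN(Filter) ELSE CHARINDEX('&', Filter) - 1 END) [FirstTerm], 
      -- SessionOrder is used by the later CTEs but its definition is missing here; reconstructed on the assumption that rows are ordered by ID within each SessionID and Date
      ROW_NUMBER() OVER(PARTITION BY SessionID, Date ORDER BY ID) [SessionOrder] 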
    FROM #Session 
), CTE2 AS 
( SELECT T1.*, 
      CASE WHEN T1.SessionOrder = 1 OR T2.SessionOrder IS NOT NULL THEN 1 ELSE 0 END [NewSearch] 
    FROM CTE T1 
      LEFT JOIN CTE T2 
       ON T1.SessionID = T2.SessionID 
       AND T1.Date = T2.Date 
       AND T1.FirstTerm != T2.FirstTerm 
       AND T1.SessionOrder = T2.SessionOrder + 1 
), CTE3 AS 
( SELECT *, 
      ROW_NUMBER() OVER(PARTITION BY SessionID, Date, ISNULL(SearchNumber, 0) ORDER BY LEN(Filter) DESC) [SearchOrder] 
    FROM CTE2 T1 
      OUTER APPLY 
      ( SELECT SUM(NewSearch) [SearchNumber] 
       FROM CTE2 T2 
       WHERE T1.SessionOrder >= T2.SessionOrder 
       AND  T1.SessionID = T2.SessionID 
       AND  T1.Date = T2.Date 
      ) c 
) 
SELECT ID, SessionID, Date, Filter 
FROM CTE3 
WHERE SearchOrder = 1 
ORDER BY ID 

DROP TABLE #Session 

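Addendum

OK, based on your result set, you don't actually want to group by the Date column; you simply need to order the rows within each group of first search term and SessionID.

This query produces the same result as your sample data. I have tested it on 2008 R1, but can see no reason why it wouldn't work in SQL-Server CE.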

;WITH CTE AS 
( SELECT *, 
      ROW_NUMBER() OVER(PARTITION BY SessionID, SUBSTRING(Filter, 1, CASE WHEN CHARINDEX('&', Filter) = 0 THEN LEN(Filter) ELSE CHARINDEX('&', Filter) - 1 END) ORDER BY LEN(Filter) DESC) [RowNumber] 
    FROM Session 
) 
SELECT * 
FROM CTE 
WHERE RowNumber = 1 
ORDER BY ID 
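
As a cross-check of what this final query is meant to do, here is a minimal Python sketch of the same grouping over the question's sample rows: one group per (SessionID, first term), keeping the longest filter in each. The names are illustrative, and ties are resolved here in favour of the lowest ID, which the SQL itself does not guarantee.

```python
S1, S2 = "peqq421gaspts3nuulq5mwcq", "poiq41111spts00000q5aaaa"
Z = "zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True"
# (ID, SessionID, Filter) copied from the question, in ID order.
rows = [
    (4, S1, "meagPixel=5"), (6, S1, "brand=Canon"),
    (7, S1, "brand=Canon&meagPixel=12.1"), (8, S1, "brand=Canon"),
    (10, S1, "brand=Nikon"), (12, S1, "meagPixel=12.1"),
    (13, S1, "meagPixel=12.1&opticalZoom=True"),
    (14, S1, "meagPixel=12.1&opticalZoom=True&brand=Panasonic"),
    (16, S1, "price=500.00"), (18, S1, "price=499.00"),
    (19, S1, "price=499.00&brand=Olympus"), (21, S1, "zoomRange=2000"),
    (22, S1, "zoomRange=2000&brand=Leica"),
    (23, S1, "zoomRange=2000&brand=Leica&price=1995.00"),
    (24, S1, Z), (25, S1, Z + "&meagPixel=16.2"),
    (26, S1, Z + "&meagPixel=16.2&weight=345"),
    (27, S1, Z + "&meagPixel=16.2"), (41, S2, "meagPixel=5"),
]

def first_term(flt):
    return flt.split("&", 1)[0]

best = {}  # (SessionID, first term) -> (ID, Filter) with the longest filter so far
for rid, sid, flt in rows:
    key = (sid, first_term(flt))
    if key not in best or len(flt) > len(best[key][1]):
        best[key] = (rid, flt)

kept_ids = sorted(rid for rid, _ in best.values())
print(kept_ids)  # [4, 7, 10, 14, 16, 19, 26, 41]
```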

SQL Fiddle of the final solution

+0

I made no mistake with rows 25, 26 and 27. 26 is the longest search for that filter; 27 is just a step the user made. – Nir

+0

Yes, but 27 has a different time from 26; so according to your criteria, is this a new session? – GarethD

+0

@GarethD - what is not working for you on sqlfiddle.com? –