2013-07-24

I have data in the following format, and I want to find the longest winning streak.

match_id team_id won_ind 
---------------------------- 
37   Team1 N 
67   Team1 Y 
98   Team1 N 
109   Team1 N 
158   Team1 Y 
162   Team1 Y 
177   Team1 Y 
188   Team1 Y 
198   Team1 N 
207   Team1 Y 
217   Team1 Y 
10   Team2 N 
13   Team2 N 
24   Team2 N 
39   Team2 Y 
40   Team2 Y 
51   Team2 Y 
64   Team2 N 
79   Team2 N 
86   Team2 N 
91   Team2 Y 
101   Team2 N 

The match_ids are in chronological order: 37 is the first and 217 the last match played by Team1. won_ind indicates whether the team won the match.

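So, from the data above, Team1 lost its first match, then won one match, then lost 2 matches, then won 4 consecutive matches, and so on. I'm interested in finding the longest winning streak for each team: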

Team_id longest_streak 
------------------------ 
Team1  4 
Team2  3 
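The expected output can be double-checked outside the database by reading each team's matches in match_id order and collapsing them into runs of equal results (N, Y, NN, YYYY, ...), as described above. A minimal Python sketch, assuming the rows are available as (match_id, team_id, won_ind) tuples; SAMPLE and longest_win_streaks are hypothetical names:

```python
from itertools import groupby

# Sample rows from the question: (match_id, team_id, won_ind)
SAMPLE = [
    (37, "Team1", "N"), (67, "Team1", "Y"), (98, "Team1", "N"),
    (109, "Team1", "N"), (158, "Team1", "Y"), (162, "Team1", "Y"),
    (177, "Team1", "Y"), (188, "Team1", "Y"), (198, "Team1", "N"),
    (207, "Team1", "Y"), (217, "Team1", "Y"),
    (10, "Team2", "N"), (13, "Team2", "N"), (24, "Team2", "N"),
    (39, "Team2", "Y"), (40, "Team2", "Y"), (51, "Team2", "Y"),
    (64, "Team2", "N"), (79, "Team2", "N"), (86, "Team2", "N"),
    (91, "Team2", "Y"), (101, "Team2", "N"),
]


def longest_win_streaks(rows):
    """Return {team_id: length of the longest run of consecutive 'Y'}."""
    result = {}
    # Sort by team, then chronologically by match_id.
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for team, team_rows in groupby(ordered, key=lambda r: r[1]):
        best = 0
        # Collapse consecutive equal results into runs: N, Y, NN, YYYY, ...
        for won, run in groupby(team_rows, key=lambda r: r[2]):
            if won == "Y":
                best = max(best, sum(1 for _ in run))
        result[team] = best
    return result


print(longest_win_streaks(SAMPLE))  # {'Team1': 4, 'Team2': 3}
```

Unlike the SQL answers, which only return teams that have at least one win, this sketch reports 0 for a team that never won.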

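I know how to find this in PL/SQL, but I'd like to know whether it can be computed in pure SQL. I have tried LEAD, LAG and several other functions, but didn't get anywhere.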

I have created a sample fiddle here.


I don't have time to reproduce this, but [this excellent article](http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data) discusses how to accomplish this using self-joins and sums. – eykanal

Answers

with original_data as (
    select 37 match_id, 'Team1' team_id, 'N' won_id from dual union all 
    select 67 match_id, 'Team1' team_id, 'Y' won_id from dual union all 
    select 98 match_id, 'Team1' team_id, 'N' won_id from dual union all 
    select 109 match_id, 'Team1' team_id, 'N' won_id from dual union all 
    select 158 match_id, 'Team1' team_id, 'Y' won_id from dual union all 
    select 162 match_id, 'Team1' team_id, 'Y' won_id from dual union all 
    select 177 match_id, 'Team1' team_id, 'Y' won_id from dual union all 
    select 188 match_id, 'Team1' team_id, 'Y' won_id from dual union all 
    select 198 match_id, 'Team1' team_id, 'N' won_id from dual union all 
    select 207 match_id, 'Team1' team_id, 'Y' won_id from dual union all 
    select 217 match_id, 'Team1' team_id, 'Y' won_id from dual union all 
    select 10 match_id, 'Team2' team_id, 'N' won_id from dual union all 
    select 13 match_id, 'Team2' team_id, 'N' won_id from dual union all 
    select 24 match_id, 'Team2' team_id, 'N' won_id from dual union all 
    select 39 match_id, 'Team2' team_id, 'Y' won_id from dual union all 
    select 40 match_id, 'Team2' team_id, 'Y' won_id from dual union all 
    select 51 match_id, 'Team2' team_id, 'Y' won_id from dual union all 
    select 64 match_id, 'Team2' team_id, 'N' won_id from dual union all 
    select 79 match_id, 'Team2' team_id, 'N' won_id from dual union all 
    select 86 match_id, 'Team2' team_id, 'N' won_id from dual union all 
    select 91 match_id, 'Team2' team_id, 'Y' won_id from dual union all 
    select 101 match_id, 'Team2' team_id, 'N' won_id from dual 
), 
---------------------------------------------------------------------- 
new_streaks as (
-- 
-- Identifying new streaks. 
-- ------------------------ 
-- 
    select 
     match_id, 
     team_id, 
     won_id, 
-- 
-- A new streak is identified if 
-- 
    case when 
-- 
-- a) won_id = 'Y' and 
-- 
     won_id = 'Y' and 
-- 
-- b) the previous won_id = 'N': 
--  
     lag(won_id) over (partition by team_id order by match_id) = 'N' 
-- 
--  
     then 1 
-- 
-- All other cases: no new streak: 
     else 0 
-- 
    end new_streak 
    from 
     original_data 
), 
------------------------------- 
streak_no as (
-- 
-- Assigning a unique number to each streak. 
-- ----------------------------------------- 
-- 
select 
-- 
    match_id, 
    team_id, 
-- 
-- In order to be able to count the number of records 
-- of a streak, we first need to assign a unique number 
-- to each streak: 
-- 
    sum(new_streak) over (partition by team_id order by match_id) streak_no 
-- 
from 
    new_streaks 
where 
-- We're only interested in «winning streaks»: 
    won_id = 'Y' 
), 
----------------------------------------------- 
-- 
-- Counting the elements per streak 
-- -------------------------------- 
-- 
records_per_streak as (
select 
    count(*) counter, 
    team_id, 
    streak_no 
from 
    streak_no 
group by 
    team_id, 
    streak_no 
) 
------------------------------------------------ 
-- 
-- Finally: we can find the «longest streak» 
-- per team: 
-- 
select 
    max(counter) longest_streak, 
    team_id 
from 
    records_per_streak 
group by team_id 
; 
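To see what each CTE contributes, the same pipeline can be traced row by row in Python: new_streak is 1 only for a win whose previous match (per team) was a loss, a running total turns those flags into streak numbers, winning rows are counted per (team, streak number), and the maximum count per team is kept. This is a hypothetical illustration of the logic, not a drop-in replacement for the query:

```python
from collections import Counter


def longest_streaks_like_cte(rows):
    """Mimic new_streaks -> streak_no -> records_per_streak -> final MAX."""
    counts = Counter()  # records_per_streak: (team, streak_no) -> counter
    prev = {}           # previous won value per team, like LAG(...) PARTITION BY team_id
    streak_no = {}      # running SUM(new_streak) per team
    for match_id, team, won in sorted(rows, key=lambda r: (r[1], r[0])):
        # new_streaks: 1 only when this is a win and the previous match was a loss
        new_streak = 1 if won == "Y" and prev.get(team) == "N" else 0
        streak_no[team] = streak_no.get(team, 0) + new_streak
        prev[team] = won
        # The streak_no CTE keeps only winning rows
        if won == "Y":
            counts[(team, streak_no[team])] += 1
    # Final SELECT: MAX(counter) ... GROUP BY team_id
    longest = {}
    for (team, _), counter in counts.items():
        longest[team] = max(longest.get(team, 0), counter)
    return longest


rows = [
    (37, "Team1", "N"), (67, "Team1", "Y"), (98, "Team1", "N"), (109, "Team1", "N"),
    (158, "Team1", "Y"), (162, "Team1", "Y"), (177, "Team1", "Y"), (188, "Team1", "Y"),
    (198, "Team1", "N"), (207, "Team1", "Y"), (217, "Team1", "Y"),
    (10, "Team2", "N"), (13, "Team2", "N"), (24, "Team2", "N"), (39, "Team2", "Y"),
    (40, "Team2", "Y"), (51, "Team2", "Y"), (64, "Team2", "N"), (79, "Team2", "N"),
    (86, "Team2", "N"), (91, "Team2", "Y"), (101, "Team2", "N"),
]
print(longest_streaks_like_cte(rows))  # {'Team1': 4, 'Team2': 3}
```

If a team's first match is a win, LAG returns NULL, so that opening streak keeps number 0 in the SQL; the sketch mirrors this because prev.get(team) returns None. The number is still unique within the team, so the count is unaffected.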

Very nice. Although it is similar to Slartibartfast's answer, this one is easy to follow. – Noel


This should work; fiddle here: http://sqlfiddle.com/#!4/31f95/27

SELECT team_id, MAX(seq_length) AS longest_sequence
  FROM (SELECT team_id, COUNT(*) AS seq_length
          FROM (SELECT team_id, won_ind, match_id,
                       SUM(new_group) OVER (PARTITION BY team_id ORDER BY match_id) AS group_no
                  FROM (SELECT team_id, won_ind, match_id,
                               -- 1 when the result differs from the team's previous match
                               DECODE(LAG(won_ind) OVER (PARTITION BY team_id ORDER BY match_id),
                                      won_ind, 0, 1) AS new_group
                          FROM matches))
         WHERE won_ind = 'Y'
         GROUP BY team_id, group_no)
 GROUP BY team_id
 ORDER BY 2 DESC, 1;

Just one question: what do the numbers in your ORDER BY clause mean? –


1 = team_id, 2 = longest_sequence: they refer to the columns of the select list by position. – Slartibartfast
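Positional ORDER BY refers to select-list columns by their 1-based position, so `ORDER BY 2 DESC, 1` sorts by longest_sequence descending and breaks ties by team_id. A small sketch using Python's built-in sqlite3 module (SQLite rather than Oracle, a hypothetical table `streaks`, and a made-up Team3 row added only to create a tie) shows that the positional and named spellings order rows identically:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE streaks (team_id TEXT, longest_sequence INTEGER)")
con.executemany(
    "INSERT INTO streaks VALUES (?, ?)",
    [("Team2", 3), ("Team1", 4), ("Team3", 3)],  # Team3 is hypothetical
)

# Positional: 2 = longest_sequence, 1 = team_id
by_position = con.execute(
    "SELECT team_id, longest_sequence FROM streaks ORDER BY 2 DESC, 1"
).fetchall()

# The same ordering written with column names
by_name = con.execute(
    "SELECT team_id, longest_sequence FROM streaks "
    "ORDER BY longest_sequence DESC, team_id"
).fetchall()

print(by_position)  # [('Team1', 4), ('Team2', 3), ('Team3', 3)]
assert by_position == by_name
con.close()
```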


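@Slartibartfast I think PARTITION BY team_id is needed in both the DECODE and SUM functions. When I ran the inner select statements on their own, there were some discrepancies in how new_group and group_no were calculated. – Noel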
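The effect Noel describes can be reproduced with a tiny interleaved data set. The sketch below runs the query's logic in SQLite through Python's sqlite3 module, so it assumes SQLite 3.25 or later (window-function support) and replaces Oracle's DECODE with a CASE expression, which behaves the same here because won_ind is never NULL; the helper name and the two-team data are hypothetical. With PARTITION BY team_id, team A's two wins form one streak; without it, team B's loss at match 2 sits between them in global match_id order and splits the streak:

```python
import sqlite3


def longest_streaks(rows, partitioned=True):
    """Run the answer's query shape in SQLite, with or without PARTITION BY team_id."""
    part = "PARTITION BY team_id " if partitioned else ""
    sql = f"""
        SELECT team_id, MAX(seq_length) AS longest_sequence
          FROM (SELECT team_id, COUNT(*) AS seq_length
                  FROM (SELECT team_id, won_ind, match_id,
                               SUM(new_group) OVER ({part}ORDER BY match_id) AS group_no
                          FROM (SELECT team_id, won_ind, match_id,
                                       -- CASE stands in for Oracle's DECODE
                                       CASE WHEN LAG(won_ind) OVER ({part}ORDER BY match_id) = won_ind
                                            THEN 0 ELSE 1 END AS new_group
                                  FROM matches) AS flagged) AS grouped
                 WHERE won_ind = 'Y'
                 GROUP BY team_id, group_no) AS runs
         GROUP BY team_id
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE matches (match_id INTEGER, team_id TEXT, won_ind TEXT)")
    con.executemany("INSERT INTO matches VALUES (?, ?, ?)", rows)
    result = dict(con.execute(sql).fetchall())
    con.close()
    return result


# Hypothetical data: B's loss falls between A's two wins in global match_id order
rows = [(1, "A", "Y"), (2, "B", "N"), (3, "A", "Y")]
print(longest_streaks(rows, partitioned=True))   # {'A': 2}
print(longest_streaks(rows, partitioned=False))  # {'A': 1}
```

On the question's sample data the two variants happen to return the same result, which is why the problem only shows up once different teams' matches interleave like this.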


Using a variant of the answer I posted here:

select
    team_id,
    max(wins)
from
(
    select
      a.team_id,
      a.match_id amatch,
      b.match_id bmatch,
      -- number of matches between a and b (inclusive)
      (select count(distinct match_id)
         from matches matches_inner
        where a.team_id = matches_inner.team_id
          and matches_inner.match_id between a.match_id and b.match_id) wins
    from
      matches a
      join matches b on a.team_id = b.team_id
         and b.match_id >= a.match_id
    where
    -- keep only ranges that contain no losses
    not exists
    (select 'x'
       from matches matches_inner
      where a.team_id = matches_inner.team_id
        and matches_inner.match_id between a.match_id and b.match_id
        and matches_inner.won_ind = 'N')
)
group by team_id

Nice. However, if the longest streak is 1, it is not returned. Changing 'b.match_id > a.match_id' to 'b.match_id >= a.match_id' should fix that. – Noel
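Noel's `>` versus `>=` point can be checked by running the self-join in SQLite via Python's sqlite3 module (an assumption: SQLite instead of Oracle; the helper and the one-team data are hypothetical). With a strict `>`, a single-match range where a and b are the same row is never formed, so a team whose wins are all isolated gets no row at all:

```python
import sqlite3


def longest_by_self_join(rows, op=">="):
    """Self-join approach in SQLite; op is the a/b comparison, '>' or '>='."""
    if op not in (">", ">="):
        raise ValueError("op must be '>' or '>='")
    sql = f"""
        SELECT team_id, MAX(wins)
          FROM (SELECT a.team_id,
                       (SELECT COUNT(DISTINCT mi.match_id)
                          FROM matches mi
                         WHERE mi.team_id = a.team_id
                           AND mi.match_id BETWEEN a.match_id AND b.match_id) AS wins
                  FROM matches a
                  JOIN matches b
                    ON a.team_id = b.team_id AND b.match_id {op} a.match_id
                 WHERE NOT EXISTS (SELECT 'x'
                                     FROM matches mi
                                    WHERE mi.team_id = a.team_id
                                      AND mi.match_id BETWEEN a.match_id AND b.match_id
                                      AND mi.won_ind = 'N')) AS pairs
         GROUP BY team_id
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE matches (match_id INTEGER, team_id TEXT, won_ind TEXT)")
    con.executemany("INSERT INTO matches VALUES (?, ?, ?)", rows)
    result = dict(con.execute(sql).fetchall())
    con.close()
    return result


# Hypothetical team whose best streak is a single win
rows = [(1, "A", "N"), (2, "A", "Y"), (3, "A", "N"), (4, "A", "Y"), (5, "A", "N")]
print(longest_by_self_join(rows, ">"))   # {}
print(longest_by_self_join(rows, ">="))  # {'A': 1}
```

Because it examines every pair of matches per team, each with correlated subqueries, this approach does considerably more work than the window-function answers as the number of matches grows.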


There is also a missing closing parenthesis at the end. – jakejgordon


I had a similar task on Teradata; here it is modified to run on Oracle:

SELECT 
    team_id, 
    MAX(cnt) 
FROM 
(
    SELECT 
     team_id, 
     COUNT(*) AS cnt 
    FROM 
    (
     SELECT 
     team_id, 
     match_id, 
     won_ind, 
     SUM(CASE WHEN won_ind <> 'Y' THEN 1 END) 
     OVER (PARTITION BY team_id 
       ORDER BY match_id 
       ROWS UNBOUNDED PRECEDING) AS dummy 
     FROM matches 
    ) dt 
    WHERE won_ind = 'Y' 
    GROUP BY team_id, dummy 
) dt 
GROUP BY team_id; 

Nice. This eliminates the need for the LAG function. – Noel
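The Teradata-derived query works because the running count of losses stays constant across a streak of wins, so it can serve as the grouping key (the `dummy` column) without LAG. A hypothetical Python sketch of the same grouping follows; note that the SQL's SUM yields NULL for wins before the first loss, while the sketch uses 0, which groups those opening wins together in the same way:

```python
from collections import Counter
from itertools import accumulate


def longest_by_loss_count(rows):
    """Longest win streak per team, grouping wins by the losses seen so far."""
    counts = Counter()
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # team_id, then match_id
    for team in sorted({r[1] for r in ordered}):
        results = [won for _, t, won in ordered if t == team]
        # Running count of non-wins up to and including each row (the "dummy" column)
        dummies = accumulate(1 if won != "Y" else 0 for won in results)
        for won, dummy in zip(results, dummies):
            if won == "Y":                  # WHERE won_ind = 'Y'
                counts[(team, dummy)] += 1  # GROUP BY team_id, dummy
    longest = {}
    for (team, _), cnt in counts.items():   # MAX(cnt) ... GROUP BY team_id
        longest[team] = max(longest.get(team, 0), cnt)
    return longest


print(longest_by_loss_count([(1, "A", "Y"), (2, "A", "Y"), (3, "A", "N"), (4, "A", "Y")]))
# {'A': 2}
```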