2014-02-18

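How to get the most common value in each of several columns in Oracle?

Suppose I have the following table. How can I group by ID and get the most common value in each column? P.S. The table is large, and I need to do this for many columns.
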
ID Col1 Col2 Col3.... 
1 A  null 
1 A  X 
1 B  null 
1 A  Y 
2 C  X 
2 C  Y 
2 A  Y 
3 B  Z 
3 A  Z 
3 A  Z 
3 B  X 
3 B  Y 

Expected result:

ID Col1 Col2 Col3.... 
1 A  null 
2 C  Y 
3 B  Z 
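To try the queries in the answer, a setup script along these lines recreates the sample data. It is a sketch under assumptions: the table name t matches the answer's first query, the column types are guesses, and col3 is left NULL because the sample only shows values for two columns.

```sql
-- Hypothetical setup for the sample data in the question.
-- Assumptions: table name t, VARCHAR2 value columns, col3 left NULL
-- (the sample shows values for only Col1 and Col2).
create table t (
  id   number,
  col1 varchar2(10),
  col2 varchar2(10),
  col3 varchar2(10)
);

insert into t (id, col1, col2) values (1, 'A', null);
insert into t (id, col1, col2) values (1, 'A', 'X');
insert into t (id, col1, col2) values (1, 'B', null);
insert into t (id, col1, col2) values (1, 'A', 'Y');
insert into t (id, col1, col2) values (2, 'C', 'X');
insert into t (id, col1, col2) values (2, 'C', 'Y');
insert into t (id, col1, col2) values (2, 'A', 'Y');
insert into t (id, col1, col2) values (3, 'B', 'Z');
insert into t (id, col1, col2) values (3, 'A', 'Z');
insert into t (id, col1, col2) values (3, 'A', 'Z');
insert into t (id, col1, col2) values (3, 'B', 'X');
insert into t (id, col1, col2) values (3, 'B', 'Y');
```

The rows are visible to queries in the same session right away; issue a COMMIT if other sessions need to see them.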

Answer

Here is one method, using analytic functions and KEEP:

select id,
     min(col1) keep(dense_rank first order by cnt_col1 desc) as col1_mode,
     min(col2) keep(dense_rank first order by cnt_col2 desc) as col2_mode,
     min(col3) keep(dense_rank first order by cnt_col3 desc) as col3_mode
from (select id, col1, col2, col3,
      -- number of rows within the same id that share this row's value
      count(*) over (partition by id, col1) as cnt_col1,
      count(*) over (partition by id, col2) as cnt_col2,
      count(*) over (partition by id, col3) as cnt_col3
     from t
    ) t
group by id;
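To see what the KEEP (DENSE_RANK FIRST ...) clause chooses from, it can help to run the counting step on its own. This sketch assumes the sample data is in a table t and looks only at Col2: for ID 1 the two NULL rows get a count of 2 while X and Y get 1, so only the NULL rows rank first and MIN(col2) over them returns NULL.

```sql
-- Inspect the per-value counts that the first method ranks on
-- (assumes the sample table t from the question).
select id,
       col2,
       count(*) over (partition by id, col2) as cnt_col2
from t
order by id, cnt_col2 desc;
```

The rows with the highest cnt_col2 in each id are the ones DENSE_RANK FIRST keeps. MIN() then returns one value among them, so ties are broken toward the smallest non-NULL value, and NULL comes back only when every kept row is NULL.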

In statistics, the most frequent value is called the "mode", and Oracle provides a function that computes it. So a simpler method is to use stats_mode():

select id,
     stats_mode(col1) as mode_col1,
     stats_mode(col2) as mode_col2,
     stats_mode(col3) as mode_col3
    from t
    group by id;
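STATS_MODE returns a single value even when several values share the top count; Oracle's documentation says it picks one of them. If you need to see every tied value, a sketch like the following (an addition, not part of the original answer, assuming the sample table t) lists all of the most frequent non-NULL Col2 values per ID. It filters out NULLs to match STATS_MODE, which ignores them.

```sql
-- List every most-frequent non-NULL col2 value per id, so ties that
-- STATS_MODE resolves silently become visible.
select id, col2, cnt
from (select id,
             col2,
             count(*) as cnt,
             max(count(*)) over (partition by id) as max_cnt
      from t
      where col2 is not null
      group by id, col2
     )
where cnt = max_cnt
order by id, col2;
```

With the sample data, ID 1 returns both X and Y (one occurrence each), which is why stats_mode(col2) for ID 1 may return either of them.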

Edit:

As pointed out in the comments, stats_mode() ignores NULL values. The simplest way around this is to pick a value that never occurs in the data and do:

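-- '<null>' must be a value that never occurs in the data; this assumes
-- character columns. NULLIF turns the placeholder back into a real NULL.
select id,
      nullif(stats_mode(coalesce(col1, '<null>')), '<null>') as mode_col1,
      nullif(stats_mode(coalesce(col2, '<null>')), '<null>') as mode_col2,
      nullif(stats_mode(coalesce(col3, '<null>')), '<null>') as mode_col3
    from t
    group by id;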
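Since the question mentions many columns, typing one expression per column gets tedious. One option, sketched here as an addition rather than part of the original answer, is to generate the NULL-safe query text from the data dictionary with LISTAGG and then run the result. Assumptions: the table is T in the current schema, every column other than ID is a character column, and '<null>' never occurs in the data.

```sql
-- Generate the NULL-safe STATS_MODE query for every column of T except ID.
-- Assumptions: table T in the current schema, character value columns,
-- and '<null>' never appears as real data.
select 'select id, '
       || listagg('nullif(stats_mode(coalesce(' || lower(column_name)
                  || ', ''<null>'')), ''<null>'') as mode_' || lower(column_name),
                  ', ') within group (order by column_id)
       || ' from t group by id' as generated_sql
from user_tab_columns
where table_name = 'T'
  and column_name <> 'ID';
```

The output is a single string that can be copied and run, or executed dynamically from PL/SQL. With a very large number of columns the generated text can exceed the VARCHAR2 length limit that LISTAGG is subject to, in which case the list would need to be built in PL/SQL instead.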

Another way is to go back to something like the first method:

select id,
     -- keep the non-NULL mode only if it occurs at least as often as NULL;
     -- mode_colN is constant within an id, so MAX() just returns it
     (case when sum(case when col1 = mode_col1 then 1 else 0 end) >= sum(case when col1 is null then 1 else 0 end)
      then max(mode_col1)
      else NULL
     end) as mode_col1,
     (case when sum(case when col2 = mode_col2 then 1 else 0 end) >= sum(case when col2 is null then 1 else 0 end)
      then max(mode_col2)
      else NULL
     end) as mode_col2,
     (case when sum(case when col3 = mode_col3 then 1 else 0 end) >= sum(case when col3 is null then 1 else 0 end)
      then max(mode_col3)
      else NULL
     end) as mode_col3
from (select t.*,
      stats_mode(col1) over (partition by id) as mode_col1,
      stats_mode(col2) over (partition by id) as mode_col2,
      stats_mode(col3) over (partition by id) as mode_col3
     from t
    ) t
group by id;

Hi Gordon, the result does not account for 'NULL' values. – ajmalmhd04

Shouldn't it be 'stats_mode(...) over (partition by col1)'? –

@RenéNyffenegger . . . 'stats_mode()' can be used either as an aggregate function or as an analytic function. –