Extract a specific number of records that satisfy some overall conditions

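Here I describe an abstract case, but it is similar to the one I am currently trying to solve. I know how to get a rough result with a PL/SQL block, but I wonder whether anyone can come up with a solution that uses a single select query.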

Suppose we have a table t_people with thousands of records describing a group of people, with the following set of attributes:

id
age, number
height, number (in centimeters)
gender, varchar2 ('male' or 'female')

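We need to extract N records such that the result set satisfies the following conditions:

30% of the selected people are taller than 180 cm
60% of the selected people are male
40% of the selected people are older than 40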

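We can also assume that N is much smaller than the total number of rows in the table, so the problem is solvable.

How would you suggest doing this with a single select query?

Thanks

Source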

2016-03-11 Dmitry Grekov

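You could use a UNION ALL of 8 queries: the first returns the N * .3 * .6 * .4 people who are taller than 180 cm, male, and over 40; the next returns the N * .3 * .6 * .6 people who are taller than 180 cm, male, and 40 or under; and so on?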
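To make the arithmetic in this comment concrete, here is a minimal Python sketch (not part of the original thread) that computes the eight per-cell row counts. It assumes the three attributes are treated as independent, so each cell's share is the product of the marginal shares. N = 1000 is only an example value, and the names `shares` and `cell_counts` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

N = 1000  # example sample size (an assumption, not from the question)

# Marginal shares from the question: (share with the trait, share without it).
shares = {
    "tall": (Fraction(30, 100), Fraction(70, 100)),  # taller than 180 cm
    "male": (Fraction(60, 100), Fraction(40, 100)),
    "old": (Fraction(40, 100), Fraction(60, 100)),   # older than 40
}

# One entry per combination of (tall?, male?, old?); True means "has the trait".
cell_counts = {}
for flags in product((True, False), repeat=len(shares)):
    share = Fraction(1)
    for (with_trait, without_trait), flag in zip(shares.values(), flags):
        share *= with_trait if flag else without_trait
    cell_counts[flags] = N * share

for flags, count in cell_counts.items():
    print(flags, float(count))
```

Each of the eight UNION ALL branches would then be limited to its own count. When N times a cell share is not a whole number, some rounding rule is needed on top of this.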

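You would stratify the data into 8 groups and then take a proportional sample from each group to meet your requirements. A rough approach is to translate the conditions into groups, say:

300 people taller than 180, not male, not old
100 people shorter, not male, not old
400 people shorter, male, old
200 people shorter, male, not old

Then you can query for this: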

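with p as (
     -- number the rows inside each stratum; ordering by id keeps the pick repeatable
     -- (order by dbms_random.value instead for a random pick)
     select p.*,
            row_number() over (partition by is_tall, is_male, is_old order by id) as seqnum
     from (select p.*,
                  -- flag names must differ from the table's own height/age columns,
                  -- otherwise p.* would carry duplicate column names
                  (case when height > 180 then 1 else 0 end) as is_tall,
                  (case when gender = 'male' then 1 else 0 end) as is_male,
                  (case when age > 40 then 1 else 0 end) as is_old
           from t_people p
          ) p
    )
select p.*
from p
where (is_tall = 1 and is_male = 0 and is_old = 0 and seqnum <= 300) or
      (is_tall = 0 and is_male = 0 and is_old = 0 and seqnum <= 100) or
      (is_tall = 0 and is_male = 1 and is_old = 1 and seqnum <= 400) or
      (is_tall = 0 and is_male = 1 and is_old = 0 and seqnum <= 200);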
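As a quick sanity check (added here, not part of the original answer), the four quotas in the where clause can be summed along each attribute to confirm that they reproduce the 30% / 60% / 40% targets for a sample of 1000 rows. The dictionary layout below is just one convenient way to write the quotas down.

```python
# Cell quotas from the where clause: (tall, male, older than 40) flags -> rows taken.
quotas = {
    (1, 0, 0): 300,
    (0, 0, 0): 100,
    (0, 1, 1): 400,
    (0, 1, 0): 200,
}

total = sum(quotas.values())


def share(position):
    """Fraction of selected rows whose flag at `position` is 1."""
    return sum(n for flags, n in quotas.items() if flags[position] == 1) / total


tall_share, male_share, old_share = share(0), share(1), share(2)
print(total, tall_share, male_share, old_share)
```

Only four of the eight possible groups are used, which is enough here because the targets constrain the totals per attribute, not the individual groups.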

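There is another approach, where you fill the 8 buckets evenly while keeping track of the counts along each dimension (young/old, male/female, shorter/taller). When the first dimension is full, you stop filling it and continue filling the 4 complementary cells. Repeat this process until you have the desired numbers.

Source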
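The bucket-filling idea is only described in words, so the following Python sketch is one possible reading of it rather than the answerer's actual procedure. It hands out one row per bucket in round-robin order and skips a bucket as soon as any of its three sides (for example "tall") has reached its target. The function name `greedy_cell_quotas`, the `available` parameter, and the targets for a 1000-row sample are assumptions made for illustration.

```python
from itertools import product


def greedy_cell_quotas(targets, available=None):
    """Round-robin fill of the 2**k buckets.

    targets: one (wanted_with_trait, wanted_without_trait) pair per dimension.
    available: optional {bucket: row count in the table}; None means unlimited.
    """
    cells = list(product((1, 0), repeat=len(targets)))
    quota = {cell: 0 for cell in cells}
    # remaining[d][v]: how many more rows with value v are wanted on dimension d
    remaining = [{1: yes, 0: no} for yes, no in targets]
    progress = True
    while progress:
        progress = False
        for cell in cells:
            has_room = all(remaining[d][v] > 0 for d, v in enumerate(cell))
            has_rows = available is None or quota[cell] < available[cell]
            if has_room and has_rows:
                quota[cell] += 1
                for d, v in enumerate(cell):
                    remaining[d][v] -= 1
                progress = True
    return quota


# Dimensions in order: tall, male, old; targets for a sample of 1000 rows.
quota = greedy_cell_quotas([(300, 700), (600, 400), (400, 600)])
for cell, rows in quota.items():
    print(cell, rows)
```

With these targets the fill ends with exactly 1000 rows and all three margins met. For other targets, or when some buckets run out of rows, a greedy fill like this can stall short of N, so the result should be checked before it is used.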

2016-03-11 16:48:48

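Thanks a lot, your first approach will suit me. In my real case I have 9 grouping attributes, but I think your idea is still applicable there. As for the second approach, I had thought of something similar but slightly different. First I would select N rows at random and look at the proportions. While the proportions are not good enough, I would replace bad records with better ones from the remaining set. A record is better if it brings the sample closer to the desired proportions. Again, thanks a lot, and have a nice weekend!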
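The replace-bad-records idea in this comment can be sketched as a simple local search. This is an illustrative Python version under stated assumptions, not the commenter's code: the population is synthetic, "better" is taken to mean a strictly smaller sum of absolute deviations from the target counts, and the names `deviation`, `improve_by_swaps` and `make_population` are hypothetical.

```python
import random


def make_population(size, seed=1):
    """Synthetic people with three boolean traits (illustration only)."""
    rng = random.Random(seed)
    return [
        {"tall": rng.random() < 0.5, "male": rng.random() < 0.5, "old": rng.random() < 0.5}
        for _ in range(size)
    ]


def deviation(sample, targets):
    """Sum over traits of |actual count - wanted count|."""
    return sum(abs(sum(p[trait] for p in sample) - want) for trait, want in targets.items())


def improve_by_swaps(population, n, targets, rounds=2000, seed=0):
    """Start from a random sample, then keep only swaps that reduce the deviation."""
    rng = random.Random(seed)
    rows = list(population)
    rng.shuffle(rows)
    chosen, rest = rows[:n], rows[n:]
    start_score = score = deviation(chosen, targets)
    for _ in range(rounds):
        if score == 0:
            break
        i, j = rng.randrange(len(chosen)), rng.randrange(len(rest))
        chosen[i], rest[j] = rest[j], chosen[i]        # try the swap
        new_score = deviation(chosen, targets)
        if new_score < score:
            score = new_score                          # keep the improvement
        else:
            chosen[i], rest[j] = rest[j], chosen[i]    # undo the swap
    return chosen, start_score, score


population = make_population(5000)
targets = {"tall": 30, "male": 60, "old": 40}  # counts for N = 100
chosen, start_score, final_score = improve_by_swaps(population, 100, targets)
print(start_score, final_score)
```

A search like this is procedural by nature, which is probably why it fits a PL/SQL block better than a single select.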

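I finally went with the first approach suggested by Gordon Linoff, with a few minor modifications. I kept the original idea but introduced several extra subqueries to specify the desired distribution of records within the groups, and built a matrix holding the required number of records per group. There is also a global parameters section, which contains a single parameter specifying the overall record count.

The query produces very useful results: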

with 
    people as (
     select id, 
       floor(months_between(sysdate, date_birth)/12) age, 
       195 - least(floor(months_between(sysdate, date_birth)/12), 50) height, 
       decode(sex, 1, 'male', 'female') gender 
     from my_people_table 
     where date_birth is not null and rownum < 100000 
    ), 
    params as (/* Global params */ 
     select 100 rec_count -- total record count 
     from dual 
    ), 
    age_groups as ( /* distribution by age */ 
     select 'group 1' age_group, .7 prc from dual union 
     select 'group 2' age_group, .3 prc from dual 
    ), 
    height_groups as (/* distribution by height */ 
     select 'group 1' height_group, .6 prc from dual union 
     select 'group 2' height_group, .4 prc from dual 
    ), 
    genders as (  /* distribution by gender */ 
     select 'male' gender, .6 prc from dual union 
     select 'female' gender, .4 prc from dual 
    ), 
    mx as (   /* a matrix with record counts per group */ 
     select age_group, height_group, gender, 
       ceil(
        age_groups.prc * 
        height_groups.prc * 
        genders.prc * 
        rec_count 
       ) rec_count  
     from age_groups, height_groups, genders, params 
    ), 
    xpeople as (  /* Minor transformations - groups and group counters */ 
     select p.*, 
       row_number() over (
        partition by age_group, height_group, gender 
         order by age_group, height_group, gender 
       ) rec_num 
     from (        
       select people.*, 
         case 
          when age <= 40 then 'group 1' 
               else 'group 2' 
         end age_group, 
         case 
          when height <= 180 then 'group 1' 
               else 'group 2' 
         end height_group 
       from people 
     ) p 
    ) 
/* the resulting query uses the matrix to filter the records */  
select xpeople.* 
from xpeople join mx 
      on xpeople.age_group = mx.age_group 
      and xpeople.height_group = mx.height_group  
      and xpeople.gender = mx.gender 
      and xpeople.rec_num <= mx.rec_count
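One detail worth checking in the query above is the `ceil` in the `mx` subquery: rounding every group up can return more rows than `rec_count`. The Python sketch below (an added illustration, not part of the original post) reproduces the per-group arithmetic with the same proportions and compares it with largest-remainder rounding, which is swapped in here as one possible alternative that keeps the total exact.

```python
from fractions import Fraction
from itertools import product
from math import ceil, floor

rec_count = 100
age = {"group 1": Fraction(7, 10), "group 2": Fraction(3, 10)}
height = {"group 1": Fraction(6, 10), "group 2": Fraction(4, 10)}
gender = {"male": Fraction(6, 10), "female": Fraction(4, 10)}

# Exact (possibly fractional) number of rows wanted per group.
exact = {
    (a, h, g): age[a] * height[h] * gender[g] * rec_count
    for a, h, g in product(age, height, gender)
}

# What the query does: round every group up.
ceil_counts = {cell: ceil(x) for cell, x in exact.items()}

# Largest-remainder rounding: round down, then give the leftover rows
# to the groups with the largest fractional parts.
fair_counts = {cell: floor(x) for cell, x in exact.items()}
leftover = rec_count - sum(fair_counts.values())
by_remainder = sorted(exact, key=lambda cell: exact[cell] - floor(exact[cell]), reverse=True)
for cell in by_remainder[:leftover]:
    fair_counts[cell] += 1

print(sum(ceil_counts.values()), sum(fair_counts.values()))
```

With these proportions the rounded-up groups add up to 104 rows rather than 100, which may or may not matter depending on how strict the record count needs to be.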

Thanks for your help!

Source

2016-03-14 15:28:32
