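I finally went with the first approach suggested by Gordon Linoff, with a few small modifications. I kept the original idea, but added several subqueries that specify the desired distribution of records within each group, and built a matrix holding the required number of records per group. There is also a global parameters section, which contains the single parameter that sets the overall record count.
The query produces very useful results: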
with
people as (
    select id,
           floor(months_between(sysdate, date_birth)/12) age,
           195 - least(floor(months_between(sysdate, date_birth)/12), 50) height,
           decode(sex, 1, 'male', 'female') gender
    from my_people_table
    where date_birth is not null and rownum < 100000
),
params as ( /* Global params */
    select 100 rec_count -- total record count
    from dual
),
age_groups as ( /* distribution by age */
    select 'group 1' age_group, .7 prc from dual union
    select 'group 2' age_group, .3 prc from dual
),
height_groups as ( /* distribution by height */
    select 'group 1' height_group, .6 prc from dual union
    select 'group 2' height_group, .4 prc from dual
),
genders as ( /* distribution by gender */
    select 'male' gender, .6 prc from dual union
    select 'female' gender, .4 prc from dual
),
mx as ( /* a matrix with record counts per group */
    select age_group, height_group, gender,
           ceil(
               age_groups.prc *
               height_groups.prc *
               genders.prc *
               rec_count
           ) rec_count
    from age_groups, height_groups, genders, params
),
xpeople as ( /* Minor transformations - groups and group counters */
    select p.*,
           /* ordering by the partition keys leaves the order inside each group arbitrary */
           row_number() over (
               partition by age_group, height_group, gender
               order by age_group, height_group, gender
           ) rec_num
    from (
        select people.*,
               case
                   when age <= 40 then 'group 1'
                   else 'group 2'
               end age_group,
               case
                   when height <= 180 then 'group 1'
                   else 'group 2'
               end height_group
        from people
    ) p
)
/* the resulting query uses the matrix to filter the records */
select xpeople.*
from xpeople join mx
    on xpeople.age_group = mx.age_group
    and xpeople.height_group = mx.height_group
    and xpeople.gender = mx.gender
    and xpeople.rec_num <= mx.rec_count
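One detail of the mx matrix is worth checking by hand: each cell is rounded up with ceil, so the per-group quotas can add up to slightly more than rec_count. The following is only a small Python sketch of that arithmetic, not part of the query itself. The dictionary names and the build_matrix helper are illustrative assumptions that mirror the subqueries, and Decimal is used so the products behave like exact decimal arithmetic.

```python
from decimal import Decimal
from itertools import product
from math import ceil

# Shares copied from the age_groups, height_groups and genders subqueries.
age_groups = {"group 1": Decimal("0.7"), "group 2": Decimal("0.3")}
height_groups = {"group 1": Decimal("0.6"), "group 2": Decimal("0.4")}
genders = {"male": Decimal("0.6"), "female": Decimal("0.4")}
rec_count = 100  # mirrors params.rec_count


def build_matrix(total):
    """Hypothetical helper: quota per (age_group, height_group, gender),
    rounded up the same way ceil() is applied in the mx subquery."""
    return {
        (a, h, g): ceil(age_groups[a] * height_groups[h] * genders[g] * total)
        for a, h, g in product(age_groups, height_groups, genders)
    }


mx = build_matrix(rec_count)
for key, quota in mx.items():
    print(key, quota)
print("total:", sum(mx.values()))
```

With the shares above, the eight quotas sum to 104, so the query can return up to 104 rows rather than exactly 100, and fewer if some group does not contain enough people to fill its quota.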
Thanks for your help!
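Could you use a UNION ALL of 8 queries: the first returning N * .3 * .6 * .4 people who are taller than 180 cm, male, and over 40; the next returning N * .3 * .6 * .6 people who are taller than 180 cm, male, and 40 or younger; and so on?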