2013-08-20
1

How to select different percentages of data based on a column value?

I need to query a table that has a "gender" column, like this:

| id | gender | name    |
---------------------------
| 1  | M      | Michael |
| 2  | F      | Hanna   |
| 3  | M      | Louie   |

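I need to extract the first N results with a given split, for example 80% males and 20% females. So if I need 1000 results, I want to retrieve 800 males and 200 females.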

  1. Is it possible to do this in a single query? How?

  2. If I don't have enough records (say I only have 700 males in the example above), is it possible to automatically select 700/300 instead?
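The arithmetic behind both scenarios fits in a few lines. This is a minimal sketch under one reading of scenario 2 (an assumption, since the question does not say what happens if females also run short): take up to the male quota, backfill any shortfall with females, and return fewer than N rows if there still aren't enough. `target_counts` is a hypothetical helper, and the percentage is an integer to avoid float rounding.

```python
def target_counts(n, male_pct, males_available, females_available):
    """Hypothetical helper: how many M and F rows to take for n results.

    Takes up to n * male_pct / 100 males; any male shortfall is
    backfilled with females (scenario 2). If females also run short,
    fewer than n rows are returned.
    """
    males = min(males_available, n * male_pct // 100)
    females = min(females_available, n - males)
    return males, females


print(target_counts(1000, 80, 5000, 5000))  # (800, 200)
print(target_counts(1000, 80, 700, 5000))   # (700, 300)
```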

+0

For scenario 2, what should happen? –

+0

I edited my question to explain myself better. –

+0

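Unfortunately, I don't know enough SQL to give code for an answer, but I can give the logic: I'd suggest a stored procedure (SP) that takes a value N (the number you choose). Compute N * 0.8, select up to that many rows where gender is M, and count the returned rows as numResultsMale. Then select N - numResultsMale rows where gender is F. –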
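The two-step logic in the comment above can be sketched as follows. This is a minimal sketch, not code from the thread. It assumes Python's built-in `sqlite3` module stands in for a stored procedure. The table `people(id, gender, name)`, the helper `pick_rows`, and the sample data are hypothetical.

```python
import sqlite3


def pick_rows(conn, n, male_pct):
    """Hypothetical helper: up to n * male_pct / 100 males, then fill with females."""
    male_limit = n * male_pct // 100
    # Step 1: take at most male_limit males.
    males = conn.execute(
        "SELECT id, gender, name FROM people WHERE gender = 'M' ORDER BY id LIMIT ?",
        (male_limit,),
    ).fetchall()
    num_results_male = len(males)
    # Step 2: fill the remaining n - numResultsMale slots with females.
    females = conn.execute(
        "SELECT id, gender, name FROM people WHERE gender = 'F' ORDER BY id LIMIT ?",
        (n - num_results_male,),
    ).fetchall()
    return males + females


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, gender TEXT, name TEXT)")
# Hypothetical data: only 700 males (ids 1-700) and 5000 females.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(i, "M" if i <= 700 else "F", f"person_{i}") for i in range(1, 5701)],
)
rows = pick_rows(conn, 1000, 80)
print(sum(r[1] == "M" for r in rows), sum(r[1] == "F" for r in rows))  # 700 300
```

Because the female query's limit is computed from the number of males actually returned, the 700/300 fallback comes out without any special case.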

Answers

2

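Basically, you want to get as many "M" rows as you can, but no more than your percentage, and then get enough "F" rows so that you have 1000 rows in total: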

with cte_m as (
    select * from Table1 where gender = 'M' limit (1000 * 0.8) 
), cte as (
    select *, 0 as ord from cte_m 
    union all 
    select *, 1 as ord from Table1 where gender = 'F' 
    order by ord 
    limit 1000 
) 
select id, gender, name 
from cte 

sql fiddle demo
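To see how the query above handles both scenarios, here is a sketch that runs it against an in-memory SQLite database. Several assumptions apply:

- SQLite stands in for PostgreSQL.
- The male limit is computed in Python and passed as an integer parameter instead of `1000 * 0.8`.
- The helpers `make_table`, `select_split` and `counts`, and the sample data, are hypothetical.

```python
import sqlite3

# The answer's query, adapted for SQLite: both limits are bound parameters.
QUERY = """
WITH cte_m AS (
    SELECT * FROM Table1 WHERE gender = 'M' LIMIT ?
), cte AS (
    SELECT *, 0 AS ord FROM cte_m
    UNION ALL
    SELECT *, 1 AS ord FROM Table1 WHERE gender = 'F'
    ORDER BY ord
    LIMIT ?
)
SELECT id, gender, name FROM cte
"""


def make_table(num_m, num_f):
    """Build a hypothetical Table1 with num_m males followed by num_f females."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (id INTEGER, gender TEXT, name TEXT)")
    genders = ["M"] * num_m + ["F"] * num_f
    conn.executemany(
        "INSERT INTO Table1 VALUES (?, ?, ?)",
        [(i, g, f"{g}_{i}") for i, g in enumerate(genders, start=1)],
    )
    return conn


def select_split(conn, n, male_pct):
    return conn.execute(QUERY, (n * male_pct // 100, n)).fetchall()


def counts(rows):
    return sum(r[1] == "M" for r in rows), sum(r[1] == "F" for r in rows)


print(counts(select_split(make_table(5000, 5000), 1000, 80)))  # (800, 200)
print(counts(select_split(make_table(700, 5000), 1000, 80)))   # (700, 300)
```

The `ord` column is what makes the fallback work: males sort first, so the outer `LIMIT` only ever trims females.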

+0

Perfect! Thanks! –

-1

I don't have PostgreSQL at hand, but the first scenario is easy in MS SQL 2012 with a UNION. I assume you can do the same in Postgres:

declare @MaxRows   INT 
     ,@PercentageMale INT 
     ,@PercentageFemale INT 

select  @MaxRows = 1000 
      ,@PercentageMale = 80 
      ,@PercentageFemale = 20 

select top (@MaxRows*@PercentageMale/100) * 
FROM  someTable 
WHERE  Gender = 'M' 
UNION 
select top (@MaxRows*@PercentageFemale/100) * 
FROM  someTable 
WHERE  Gender = 'F' 

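The second part is actually easy. Basically, you want to select the top percentage of males, then fill the rest of the list with females up to the total number of rows. The actual number of females isn't relevant: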

declare @MaxRows        INT
       ,@PercentageMale INT

select  @MaxRows = 1000
       ,@PercentageMale = 80

SELECT TOP (@MaxRows) *   -- TOP with a variable needs parentheses
FROM
(
    select top (@MaxRows*@PercentageMale/100) *, 0 AS ord
    FROM  someTable
    WHERE  Gender = 'M'
    UNION ALL
    select top (@MaxRows) *, 1 AS ord --we never want more than @MaxRows
           --so no need to check for a %,
           --just fill in the rest of the data set
    FROM  someTable
    WHERE  Gender = 'F'
) a
ORDER BY ord  -- males first, so the outer TOP trims only females
+1

-1 The question is not about SQL Server. –

0

How about the following? It assumes you supply a row count ("lmt") and floats for the M/F distribution:

create table gen (
id  integer, 
gender text, 
name text 
); 

-- inserts 80% males and 20% females into the source table ("gen"): every 5th id is 'F'
insert into gen select n, case when mod(n,5) = 0 then 'F' else 'M' end, (case when mod(n,5) = 0 then 'F' else 'M' end)||'_'||n::text
from generate_series(1,20000) n;


-- extract 80/20 M vs F 
with conf as (select 1000 as lmt, .80::FLOAT as mpct, .20::FLOAT as fpct), 
    g as (select id,gender,name,row_number() over (partition by gender order by id) rn from gen) -- order by id so "first N" is deterministic
select * 
from g 
where (gender = 'M' and rn <= (select lmt*mpct from conf)) 
or (gender = 'F' and rn <= (select lmt*fpct from conf)); 


-- Same query, to show the percent M vs F: 
with conf as (select 1000 as lmt, .80::FLOAT as mpct, .20::FLOAT as fpct), 
    g as (select id,gender,name,row_number() over (partition by gender order by gender) rn from gen) 
select gender,count(*) 
from (
    select * 
    from g 
    where (gender = 'M' and rn <= (select lmt*mpct from conf)) 
    or (gender = 'F' and rn <= (select lmt*fpct from conf)) 
    ) y 
group by gender
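This answer's extraction query can be checked the same way. The sketch assumes SQLite 3.25 or later (for window functions) stands in for PostgreSQL, with these adaptations:

- The `::FLOAT` casts are dropped.
- The data is generated in Python instead of with `generate_series`.
- The window is ordered by `id`.
- `run` is a hypothetical helper.

Because the quotas are fixed, this query does not backfill for scenario 2.

```python
import sqlite3

# The answer's extraction query, adapted for SQLite (no ::FLOAT casts).
QUERY = """
WITH conf AS (SELECT 1000 AS lmt, 0.80 AS mpct, 0.20 AS fpct),
     g AS (SELECT id, gender, name,
                  row_number() OVER (PARTITION BY gender ORDER BY id) AS rn
           FROM gen)
SELECT id, gender, name FROM g
WHERE (gender = 'M' AND rn <= (SELECT lmt * mpct FROM conf))
   OR (gender = 'F' AND rn <= (SELECT lmt * fpct FROM conf))
"""


def run(genders):
    """Load a hypothetical gen table from a list of genders; return (M, F) counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gen (id INTEGER, gender TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO gen VALUES (?, ?, ?)",
        [(i, g, f"{g}_{i}") for i, g in enumerate(genders, start=1)],
    )
    rows = conn.execute(QUERY).fetchall()
    return sum(r[1] == "M" for r in rows), sum(r[1] == "F" for r in rows)


# Same shape as the answer's data: every 5th id is 'F' (16000 M, 4000 F).
print(run(["F" if n % 5 == 0 else "M" for n in range(1, 20001)]))  # (800, 200)
# Only 700 males: each quota is applied independently, so 900 rows come back.
print(run(["M"] * 700 + ["F"] * 5000))  # (700, 200)
```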