2015-12-11

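Splitting a dataset into training and test sets in Postgres

I have a dataset that I want to split in a 70:30 ratio using Postgres SQL, assigning the rows to a training set and a test set. How can I do this? I tried the code below, but it does not seem to work.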

create table training_test as 
(
WITH TEMP as 
(
    SELECT ROW_NUMBER() AS ROW_ID , Random() as RANDOM_VALUE,D.* 
     FROM analytics.model_data_discharge_v1 as D 
     ORDER BY RANDOM_VALUE 
) 

SELECT 'Training',T.* FROM TEMP T 
WHERE ROW_ID <= 493896*0.70 
UNION 
SELECT 'Test',T.* FROM TEMP T 
WHERE ROW_ID > 493896*0.70 
) distributed by(hospitalaccountrecord); 

Answers

select t.*, 
    case 
     when random() < 0.7 then 'training' 
     else 'test' 
    end as split 
from analytics.model_data_discharge_v1 t 
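
Note that `random() < 0.7` labels each row independently, so the split is only approximately 70:30. If an exact split is needed, one option is to rank the rows in random order and cut at 70% of the row count. The query in the question most likely fails because `ROW_NUMBER()` is a window function and must be followed by an `OVER (...)` clause. The following is only a sketch under a few assumptions: the CTE name `shuffled` and the columns `row_id`, `total_rows` and `split` are made up for illustration, and the source table is assumed not to already contain columns with those names.

```sql
-- Sketch: exact 70/30 split by numbering rows in random order.
-- shuffled, row_id, total_rows and split are illustrative names.
CREATE TABLE training_test AS
WITH shuffled AS (
    SELECT d.*,
           row_number() OVER (ORDER BY random()) AS row_id,     -- random rank, 1..N
           count(*)     OVER ()                  AS total_rows  -- N, repeated on every row
    FROM analytics.model_data_discharge_v1 AS d
)
SELECT s.*,
       CASE
           WHEN s.row_id <= 0.7 * s.total_rows THEN 'training'
           ELSE 'test'
       END AS split
FROM shuffled AS s;
-- On Greenplum, the DISTRIBUTED BY (hospitalaccountrecord) clause from the
-- question can be added before the final semicolon; plain PostgreSQL does not accept it.
```

Using `count(*) OVER ()` avoids hard-coding the row count (493896 in the question), and a single `CASE` expression replaces the two-branch `UNION`. The helper columns `row_id` and `total_rows` end up in the new table and can be dropped afterwards if they are not wanted.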