2012-02-24 11 views
1

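What is the fastest way to split a SAS dataset for batch processing?

I have a large SAS dataset (1.5m obs, ~250 variables) that I need to split into several smaller SAS datasets of equal size for batch processing. Each dataset must contain all of the variables but only a portion of the observations. What is the fastest way to do this?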

+1

This looks promising: http://support.sas.com/resources/papers/proceedings10/109-2010.pdf – sasfrog 2012-02-25 21:20:05

Answers

2

You can do something like the following:

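%macro splitds(inlib=,inds=,splitnum=,outid=); 
    %local nobs i j; 

    /* Look up the observation count of the input dataset. */
    proc sql noprint; 
    select nobs into :nobs 
    from sashelp.vtable 
    where libname=upcase("&inlib") and memname=upcase("&inds"); 
    quit; 
    %put Number of observations in &inlib..&inds.: &nobs; 

    /* List every output dataset on the DATA statement. */
    data %do i=1 %to &splitnum.; 
     &outid.&i 
     %end;; 
    /* Read from the requested library rather than from WORK. */
    set &inlib..&inds.; 
    /* Route each row to the first slice whose upper bound covers _n_. */
    %do j=1 %to (&splitnum.-1); 
     %if &j.=1 %then %do; 
     if 
     %end; 
     %else %do; 
     else if 
     %end; 
       _n_<=((&nobs./&splitnum.)*&j.) then output &outid.&j.; 
    %end; 
    /* Everything left over goes to the last slice. */
    else output &outid.&splitnum.; 
    run; 
%mend; 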

An example call that splits MYLIB.MYDATA into 10 datasets named NEWDATA1 - NEWDATA10 is:

%splitds(inlib=mylib,inds=mydata,splitnum=10,outid=newdata); 
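To make the macro logic easier to follow, here is a sketch of roughly what the generated DATA step looks like for a smaller call with splitnum=3. It assumes the input is read from mylib.mydata and has 1,500,000 observations; both are illustrative assumptions, not output captured from a real run.

```sas
/* Approximate expansion of a splitds call with splitnum=3, outid=newdata. */
/* The count 1500000 stands in for the value the macro reads into &nobs.   */
data newdata1 newdata2 newdata3;
    set mylib.mydata;
    if _n_ <= ((1500000/3)*1) then output newdata1;
    else if _n_ <= ((1500000/3)*2) then output newdata2;
    else output newdata3;
run;
```

Running with `options mprint;` should write the code that the macro actually generates to the log, which can be compared against this sketch.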
1

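Try this. I haven't tested it, so expect an error somewhere. You will need to edit the BATCH_PROCESS macro call to supply the dataset name, the number of new datasets, and so on.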

%macro nobs (dsn); 
    %local nobs dsid rc; 
    %let nobs=0; 
    %let dsid = %sysfunc(open(&dsn)); 
    %if &dsid %then %do; 
     %let nobs = %sysfunc(attrn(&dsid,NOBS)); 
     /* Close the dataset only if it was opened successfully. */
     %let rc = %sysfunc(close(&dsid)); 
    %end; 
    %else %put Open for dataset &dsn failed - %sysfunc(sysmsg()); 
    &nobs 
%mend nobs; 

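%macro batch_process(dsn_in=,dsn_out_prefix=,number_of_dsns=); 
    %local dsn_obs obs_per_dsn i; 

    /* Row count of the input, returned by the nobs macro defined above. */
    %let dsn_obs = %nobs(&dsn_in); 
    /* Round up so that every observation falls into exactly one range. */
    %let obs_per_dsn = %sysevalf(&dsn_obs/&number_of_dsns, ceil); 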

    data 
    %do i = 1 %to &number_of_dsns; 
     &dsn_out_prefix.&i 
    %end; 
    ; 
    set &dsn_in; 
    drop _count; 
    retain _count 0; 
    _count = _count + 1; 
    %do i = 1 %to &number_of_dsns; 
     if (1 + ((&i - 1) * &obs_per_dsn)) <= _count <= (&i * &obs_per_dsn) then do; 
      output &dsn_out_prefix.&i; 
     end; 
    %end; 
    run; 

%mend batch_process; 

%batch_process(dsn_in=DSN_NAME , dsn_out_prefix = PREFIX_ , number_of_dsns = 5);  
+0

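Thanks! I was thinking of a similar approach, with a couple of differences: using _n_ rather than generating my own row counter, and applying a where clause on _n_ for each output dataset. Any thoughts on this approach? – user667489 2012-02-25 10:26:34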
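On the idea raised in this comment: a WHERE expression cannot reference the automatic variable _n_, so a row-range filter is usually written with the FIRSTOBS= and OBS= dataset options instead. The following is a minimal, untested sketch of that alternative. The macro name split_by_range and its parameter names are hypothetical and do not come from either answer.

```sas
/* Hypothetical macro: one DATA step per slice, selected by row range. */
%macro split_by_range(dsn_in=, dsn_out_prefix=, number_of_dsns=);
    %local dsid nobs rc chunk i first last;

    /* Read the observation count from the dataset header. */
    %let dsid = %sysfunc(open(&dsn_in));
    %if &dsid = 0 %then %do;
        %put ERROR: could not open &dsn_in - %sysfunc(sysmsg());
        %return;
    %end;
    %let nobs = %sysfunc(attrn(&dsid, NOBS));
    %let rc   = %sysfunc(close(&dsid));

    /* Rows per slice, rounded up so that no row is left out. */
    %let chunk = %sysevalf(&nobs / &number_of_dsns, ceil);

    %do i = 1 %to &number_of_dsns;
        %let first = %eval((&i - 1) * &chunk + 1);
        %let last  = %sysfunc(min(%eval(&i * &chunk), &nobs));
        /* Skip slices that would start beyond the last row. */
        %if &first <= &last %then %do;
            data &dsn_out_prefix.&i;
                set &dsn_in (firstobs=&first obs=&last);
            run;
        %end;
    %end;
%mend split_by_range;

/* Example call, reusing the dataset names from the first answer. */
%split_by_range(dsn_in=mylib.mydata, dsn_out_prefix=newdata, number_of_dsns=10);
```

The trade-off is that this runs one DATA step per output dataset instead of a single pass that writes all of them, so it is worth timing both variants on the real data before settling on one.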
