2013-01-23 65 views
2

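Loop to conditionally export tables in SAS (large number of variables)

I want to break up a large data set (70,000 observations with 1,790 variables) based on a particular variable grouping. Excel or CSV would be the ideal output format, but there seems to be a limit on the number of variables (260 or something like that). Any ideas on how I can do this in SAS (or otherwise in R/SQL)?

I know the macro works; I have used it before. The error message says that the variable limit has been reached.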

+0

If you have enough RAM, you could do this in R: 'install.packages("sas7bdat"); library(sas7bdat); write.csv(read.sas7bdat("c:/path/to/file.sas7bdat"), "c:/path/to/outfile.csv")' –

+1

@AnthonyDamico - 'sas7bdat' still has trouble with a lot of SAS files. If you want to escape SAS, I would just export to a 'csv' from SAS and then 'read.csv("file.csv")' in R. 70K * 2K should fit into R's memory easily – thelatemail

Answers

5

Creating an Excel file definitely has limits, but a CSV file does not. Here is an example using a dummy SAS data set:

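data a;
    /* 1,790 numeric variables, matching the width described in the question */
    array x(*) x1-x1790;
    /* write 5 observations of random values */
    do j=1 to 5;
        do i=1 to dim(x);
            x(i) = ranuni(0);
        end;
        output;
    end;
run;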

proc export data=a 
    outfile="c:\temp\tempfile.csv" 
    dbms=CSV 
    replace; 
run; 

Here is the relevant log:

NOTE: The file 'c:\temp\tempfile.csv' is: 
     Filename=c:\temp\tempfile.csv, 
     RECFM=V,LRECL=32767,File Size (bytes)=0, 
     Last Modified=23Jan2013:15:27:13, 
     Create Time=23Jan2013:15:27:13 

NOTE: 6 records were written to the file 'c:\temp\tempfile.csv'. 
     The minimum record length was 9636. 
     The maximum record length was 23087. 
NOTE: There were 5 observations read from the data set WORK.A. 
NOTE: DATA statement used (Total process time): 
     real time   0.26 seconds 
     cpu time   0.09 seconds 


5 records created in c:\temp\tempfile.csv from A. 


NOTE: "c:\temp\tempfile.csv" file was successfully created. 
NOTE: PROCEDURE EXPORT used (Total process time): 
     real time   2.04 seconds 
     cpu time   0.26 seconds 

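Note that the first row contains the column headers, which is why the log reports 6 records written for 5 observations.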
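Since the question asks for one output per group, here is a minimal sketch of how the same CSV export could be looped over a grouping variable. The macro name export_by_group, its parameters, and the data set and variable names in the call are all hypothetical. It also assumes the grouping variable is character and that its values are safe to use in file names.

```sas
/* Sketch only: export_by_group and its parameters are hypothetical names. */
/* Assumes &byvar is a character variable with file-name-safe values.      */
%macro export_by_group(dset=, byvar=, outdir=);
    %local i n;

    /* Store each distinct group value in macro variables val1, val2, ... */
    proc sql noprint;
        select distinct &byvar
            into :val1 - :val9999
            from &dset;
    quit;
    %let n = &sqlobs;   /* number of distinct values found */

    /* One CSV per group, filtered with a WHERE= data set option */
    %do i = 1 %to &n;
        proc export data=&dset(where=(&byvar = "&&val&i"))
            outfile="&outdir.\group_&&val&i...csv"
            dbms=csv
            replace;
        run;
    %end;
%mend export_by_group;

/* Hypothetical call: work.big is the wide data set, grp the grouping variable */
%export_by_group(dset=work.big, byvar=grp, outdir=c:\temp);
```

For a numeric grouping variable the quotes in the WHERE= condition would have to be dropped.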

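UPDATE: If you have a recent version of SAS (9.3 TS1M1 or later), you can create an Office 2010 Excel spreadsheet, which can hold up to 1,048,576 rows by 16,384 columns. In that case, you would use DBMS=XLSX.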
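As a rough sketch of that variant, the earlier export of data set A would change only in the file extension and the DBMS value. The output path and sheet name here are made up for illustration, and this assumes the XLSX engine is available in your installation.

```sas
/* Sketch only: same dummy data set A as above; path and sheet name are illustrative. */
proc export data=a
    outfile="c:\temp\tempfile.xlsx"
    dbms=xlsx
    replace;
    sheet="data";   /* optional: name of the worksheet to write */
run;
```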

+0

This doesn't work for me, because I already have to use DBMS=EXCELCS with SAS 9.4 on a 64-bit architecture. http://blogs.sas.com/content/sasdummy/2012/05/01/64-bit-gotchas/ –

1

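If you are fine with XLSX or CSV, Bob's answer is good. If you want to produce an .xls Excel file (255-column limit), or you don't have 9.3 TS1M1, it is still fairly easy to do; the details depend on how you want to specify the columns that go into each file. Suppose you simply want each block of up to 255 columns in its own output, and each block split in two at the midpoint (records 1-35000 to file A, 35001 to the end to file B, for each variable set). You would do something like this, which writes the pieces as separate sheets of one workbook (when I ran it, the export of sheet4 failed for me; I am not sure whether there is some limit on the total size of an .xls file):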

options mprint symbolgen; 
data test; 
array xs x1-x1700; 
do id = 1 to 70000; 
do _t = 1 to dim(xs); 
    xs[_t]=ranuni(7); 
end; 
output; 
end; 
run; 

%macro export_file(varstart=,varend=,varnumstart=0,varnumend=0,recstart=1,recend=0,keeplist=,dset=, libname=WORK, outfile=,sheet="sheet1"); 
%if &varnumstart ne 0 %then %do; 
    proc sql noprint; 
    select name into :varstart from dictionary.columns 
    where libname=upcase("&libname.") and memname=upcase("&dset.") and varnum=&varnumstart.;  
    select name into :varend from dictionary.columns 
    where libname=upcase("&libname.") and memname=upcase("&dset.") and varnum=&varnumend.; 
    quit; 
%end; 
%if &varstart=%str() or &varend=%str() %then %do; 
    %put "ERROR: MISSING PARAMETERS. PLEASE CHECK YOUR MACRO CALL AND RERUN. MUST HAVE VARSTART AND VAREND OR VARNUMSTART AND VARNUMEND."; 
    %abort; 
%end; 

data _for_Export/view=_for_export; 
set &libname..&dset; 
keep &varstart.--&varend. 
%if &keeplist ne %str() %then %do; 
&keeplist 
%end; 
; 
if _N_ ge &recstart.; 
%if &recend ne 0 %then %do; 
if _N_ le &recend.; 
%end; 
run; 

proc export data=_for_export file=&outfile. dbms=excel replace; 
sheet=&sheet.; 
run; 

proc datasets nolist noprint lib=work; 
delete _for_export/memtype=view; 
quit; 
%mend export_file; 
%export_file(varnumstart=1,varnumend=250, keeplist=id,recstart=1,recend=35000,dset=test,outfile="c:\temp\test.xls",sheet="sheet1"); 
%export_file(varnumstart=1,varnumend=250, keeplist=id,recstart=35001,recend=99999,dset=test,outfile="c:\temp\test.xls",sheet="sheet2"); 
%export_file(varnumstart=251,varnumend=500, keeplist=id,recstart=1,recend=35000,dset=test,outfile="c:\temp\test.xls",sheet="sheet3"); 
%export_file(varnumstart=251,varnumend=500, keeplist=id,recstart=35001,recend=99999,dset=test,outfile="c:\temp\test.xls",sheet="sheet4"); 

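The sheet4 export failed for me, but you can easily modify this to create separate files instead. This will not work if you need to specify particular, non-contiguous variable names for each file, but you could easily modify the SQL code so that, instead of pulling from dictionary.columns, it pulls from a table you create that contains the variable names you want in each file.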
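As a sketch of the "separate files" modification, a small wrapper macro could loop over the column blocks and call %export_file once per half, each time with its own output file. The wrapper name export_chunks, its parameters, and the file naming scheme are hypothetical. It assumes %export_file has already been compiled and that the x variables occupy positions 1 to 1700 in the data set, as they do in the test data above.

```sas
/* Sketch only: export_chunks and its parameters are hypothetical names. */
/* Assumes %export_file (defined above) is already compiled.             */
%macro export_chunks(dset=test, nvars=1700, chunk=250, half=35000);
    %local vstart vend k;
    %let k = 0;
    %do vstart = 1 %to &nvars %by &chunk;
        %let k = %eval(&k + 1);
        /* The last block may be narrower than &chunk columns */
        %let vend = %sysfunc(min(%eval(&vstart + &chunk - 1), &nvars));

        /* First half of the records goes to file "a" for this block */
        %export_file(varnumstart=&vstart, varnumend=&vend, keeplist=id,
                     recstart=1, recend=&half, dset=&dset,
                     outfile="c:\temp\test_&k._a.xls", sheet="sheet1");

        /* Remaining records go to file "b" (recend=0 means no upper bound) */
        %export_file(varnumstart=&vstart, varnumend=&vend, keeplist=id,
                     recstart=%eval(&half + 1), recend=0, dset=&dset,
                     outfile="c:\temp\test_&k._b.xls", sheet="sheet1");
    %end;
%mend export_chunks;

%export_chunks();
```

With the defaults this would make 7 column blocks and 14 files, each holding a single sheet, which may sidestep whatever caused the multi-sheet workbook to fail.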