Filling SAS variables with repeated values

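I have a SAS table with a lot of missing values. This is only a simple example. The real table is much bigger (> 1000 rows) and the numbers are different. What stays the same is that I have one column with no missing numbers, and columns b and c contain sequences that are shorter than column a.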

I want to fill b and c with their repeating sequences until the columns are full. The result should look like this:

I tried to write a macro, but it got messy.

2017-06-23 fossekall

Was all of this originally in the same data set? It looks like a bad merge? – Reeza

Yes, it actually is a "bad" merge. But that doesn't change the goal. I want to know how to get this final table from the vectors a, b and c. – fossekall

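The reason I ask is that if you already have the data in separate data sets, it can be easier to load them into temporary arrays or to set up an SQL step. Taking a step back first may make the problem easier to handle. You already have solutions, though, so feel free to ignore this, since you've reached the "goal". – Reeza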
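As a rough illustration of Reeza's point, here is a minimal sketch assuming the non-missing b values are stored in their own data set. The data set names B_VALS, A_VALS and WANT_B, and the array bound of 1000, are hypothetical choices for this example. The b values are loaded into a temporary array on the first iteration, and each row then picks value ((_n_-1) mod nb)+1.

```sas
/* Hypothetical layout: the non-missing B values live in their own data set */
data b_vals;
  input b $;
  datalines;
1b
2b
3b
;

/* Hypothetical driver data set: one row per value of A */
data a_vals;
  do a = 1 to 7;
    output;
  end;
run;

data want_b;
  /* 1000 is an assumed upper bound on the number of B values */
  array bvals[1000] $ 32 _temporary_;
  /* First iteration only: read every row of B_VALS into the array.
     NB (from NOBS=) is known at compile time, so it is usable here. */
  if _n_ = 1 then do i = 1 to nb;
    set b_vals point=i nobs=nb;
    bvals[i] = b;
  end;
  /* The non-POINT= SET ends the step at the end of A_VALS */
  set a_vals;
  /* Row n takes element ((n-1) mod nb) + 1: 1, 2, ..., nb, 1, 2, ... */
  b = bvals[mod(_n_ - 1, nb) + 1];
run;
```

With three stored values, B in WANT_B comes out as 1b, 2b, 3b, 1b, 2b, 3b, 1b. Because the POINT= and NOBS= variables are not written to the output, WANT_B contains only B and A.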

I suspect a hash-of-hashes solution is the most flexible one here.

data have; 
infile datalines delimiter="|"; 
input a b $ c; 
datalines; 
1|1b|1000 
2|2b|2000 
3|3b|  
4| |  
5| |  
6| |  
7| |  
;;;; 
run; 


%let vars=b c; 

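data want; 
    set have; 
    rownum = _n_; 
    if _n_=1 then do; 
        /* Hash of hashes: one entry per variable name, holding that variable's hash */
        declare hash hoh(ordered:'a'); 
        declare hiter hih('hoh'); 
        hoh.defineKey('varname'); 
        hoh.defineData('varname','hh'); 
        hoh.defineDone(); 

        declare hash hh(); 

        /* Build one hash per variable listed in &vars, keyed by row number */
        do varnum = 1 to countw("&vars."); 
            varname = scan("&vars",varnum); 
            hh = _new_ hash(ordered:'a'); 
            hh.defineKey("rownum"); 
            hh.defineData(varname); 
            hh.defineDone(); 
            hoh.replace(); 
        end; 
    end; 

    /* Visit each variable's hash in turn */
    do rc=hih.next() by 0 while (rc=0); 
        if strip(vvaluex(varname)) in (" ",".") then do; 
            /* Missing: fetch the value stored for row mod(_n_-1,num_items)+1 */
            num_items = hh.num_items; 
            rowmod = mod(_n_-1,num_items)+1; 
            hh.find(key:rowmod); 
        end; 
        else do; 
            /* Populated: store the value under the current row number */
            hh.replace(); 
        end; 
        rc = hih.next(); 
    end; 
    keep a &Vars.; 
run;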

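Basically, a hash is built for each variable you are working with, and they are all added to a hash of hashes. Then we iterate over that and check whether the requested variable is populated on the current row. If it is, we add the value to that variable's hash. If it isn't, we retrieve the appropriate value from the hash, namely the one stored for row mod(_n_-1, num_items)+1.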
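To make the mechanism easier to follow, here is a simplified sketch of the same idea for a single variable, without the hash of hashes. Like the answer above, it assumes that the populated values sit in the first rows of the column. The data set WANT_C and the variables N_ITEMS and ROWMOD are illustrative names.

```sas
data have;
  input a c;
  datalines;
1 1000
2 2000
3 .
4 .
5 .
6 .
7 .
;

data want_c;
  set have;
  rownum = _n_;
  /* Build the hash for C once, keyed by row number */
  if _n_ = 1 then do;
    declare hash hc();
    hc.defineKey('rownum');
    hc.defineData('c');
    hc.defineDone();
  end;
  if not missing(c) then do;
    /* Populated: remember this value under its row number */
    rc = hc.add();
  end;
  else do;
    /* Missing: map the row back onto 1..N_ITEMS and load that value into C */
    n_items = hc.num_items;
    rowmod = mod(_n_ - 1, n_items) + 1;
    rc = hc.find(key: rowmod);
  end;
  drop rownum rc n_items rowmod;
run;
```

With two stored values, rows 3 to 7 receive 1000, 2000, 1000, 2000, 1000. The hash-of-hashes version generalizes this by keeping one such hash per variable and looping over them with the iterator.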

2017-06-23 20:53:13 Joe

Populate a temporary array with the values, then check each row and fill in the appropriate value.

Set up the data

data have; 
infile datalines delimiter="|"; 
input a b $ c; 
datalines; 
1|1b|1000 
2|2b|2000 
3|3b|  
4| |  
5| |  
6| |  
7| |  
;

Get the counts of non-missing values

proc sql noprint; 
select count(*) 
    into :n_b 
    from have 
    where b ^= ""; 

select count(*) 
    into :n_c 
    from have 
    where c ^=.; 
quit;

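Now fill in the missing values by repeating the contents of each array. (As in the example data, this assumes the non-missing values are at the top of each column.)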

data want; 
set have; 
/*Temporary Arrays*/ 
array bvals[&n_b] $ 32 _temporary_; 
array cvals[&n_c] _temporary_; 

if _n_ <= &n_b then do; 
    /*Populate the b array*/ 
    bvals[_n_] = b; 
end; 
else do; 
    /*Fill the missing values*/ 
    b = bvals[mod(_n_+&n_b-1,&n_b)+1]; 
end; 

if _n_ <= &n_c then do; 
    /*populate C values array*/ 
    cvals[_n_] = c; 
end; 
else do; 
    /*fill in the missing C values*/ 
    c = cvals[mod(_n_+&n_c-1,&n_c)+1]; 
end; 
run;
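The wrap-around index used above can be checked by hand. With $k$ non-missing values (`&n_b` or `&n_c`) and row number $n$ (`_n_`), adding $k$ inside the MOD leaves the remainder unchanged, so the expression reduces to the usual cyclic index. The sample data has $k = 3$ for b and $k = 2$ for c:

```latex
\begin{aligned}
\operatorname{mod}(n + k - 1,\ k) + 1 &= \operatorname{mod}(n - 1,\ k) + 1 \\
k = 3:\quad n = 4, 5, 6, 7 &\;\mapsto\; 1, 2, 3, 1 \\
k = 2:\quad n = 3, 4, 5, 6, 7 &\;\mapsto\; 1, 2, 1, 2, 1
\end{aligned}
```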

2017-06-23 16:52:13 DomPazz

This looks good. I'll have to check it, but it seems to solve my problem. – fossekall

data want; 
    set have; 
    /* Position of the row within a 3-row cycle: 1, 2, 0, 1, 2, 0, ... */
    n=mod(_n_,3); 
    if n=0 then b='3b'; 
    else b=cats(n,'b'); 
    if n in (1,0) then c=1000; 
    else c=2000; 
    drop n; 
run;

2017-06-23 16:58:59

This works, but it isn't general. In my real problem I actually have 30 variables. – fossekall

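Assuming you can tell how many rows each variable uses by counting its non-missing values, you can use this code-generation technique to generate a DATA step whose SET statements use the POINT= option to cycle through the first Nx observations of each variable X.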

First, get the list of variable names.

proc transpose data=have(obs=0) out=names ; 
    var _all_; 
run;

Then use them to generate a PROC SQL SELECT statement that counts the non-missing values of each variable.

filename code temp ; 
data _null_; 
    set names end=eof ; 
    file code ; 
    if _n_=1 then put 'create table counts as select ' ; 
    else put ',' @; 
    put 'sum(not missing(' _name_ ')) as ' _name_ ; 
    if eof then put 'from have;' ; 
run; 

proc sql noprint; 
%include code /source2 ; 
quit;
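For the sample data with variables a, b and c, the generated file amounts to a query like the hand-written one below (reconstructed from the PUT statements, so the exact spacing of the generated text may differ). NOT MISSING(x) evaluates to 1 or 0, so SUM counts the non-missing values.

```sas
data have;
  input a b $ c;
  datalines;
1 1b 1000
2 2b 2000
3 3b .
4 . .
5 . .
6 . .
7 . .
;

/* Hand-written equivalent of the generated code for variables a, b, c */
proc sql noprint;
  create table counts as select
       sum(not missing(a)) as a
      ,sum(not missing(b)) as b
      ,sum(not missing(c)) as c
    from have;
quit;
```

This should give a = 7, b = 3 and c = 2, the same counts that appear in the MOD expressions of the generated DATA step in the log.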

Then transpose that, so that you again have one row per variable name, but this time with the count in COL1.

proc transpose data=counts out=names ; 
    var _all_; 
run;

Now use it to generate the SET statements that the DATA step needs to create the output from the input.

filename code temp; 
data _null_; 
    set names ; 
    file code ; 
    length pvar $32 ; 
    pvar = cats('_point',_n_); 
    put pvar '=mod(_n_-1,' col1 ')+1;' ; 
    put 'set have(keep=' _name_ ') point=' pvar ';' ; 
run;

Now use the generated statements.

data want ; 
    set have(drop=_all_); 
    %include code/source2; 
run;

So for the example data, with variables A, B and C and 7 observations in total, the generated DATA step looks like this in the log:

1229 data want ; 
1230 set have(drop=_all_); 
1231 %include code/source2; 
NOTE: %INCLUDE (level 1) file CODE is file .../#LN00026. 
1232 +_point1 =mod(_n_-1,7)+1; 
1233 +set have(keep=a) point=_point1 ; 
1234 +_point2 =mod(_n_-1,3)+1; 
1235 +set have(keep=b) point=_point2 ; 
1236 +_point3 =mod(_n_-1,2)+1; 
1237 +set have(keep=c) point=_point3 ; 
NOTE: %INCLUDE (level 1) ending. 
1238 run; 

NOTE: There were 7 observations read from the data set WORK.HAVE. 
NOTE: The data set WORK.WANT has 7 observations and 3 variables.

2017-06-23 22:24:57 Tom
