Counting variable frequency using indsname

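I have a common column, ID, in multiple datasets. I want to find how many of these datasets each ID appears in. Even if an ID appears several times within one dataset, that dataset counts as only 1 visit.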

So, given m datasets, the value of visit for any ID is between 1 and m.

Ideal output:

ID    # Visit
222   5
233   5
556   3
...
...
667   1

Datasets (they do not share a common prefix; these names are just an example):

Data 1: (the # of visits for 222 is 1, even though it appears twice)

ID col2 col3 ... 
21 
222 
222

...

Data 5: (the # of visits for 222 is 1)

ID col87 col12 ... 
222 
623 
126

I don't know how to start on this. It seems like a dictionary traversal.


2015-07-03 Lovnlust

This is not tested, but something along these lines should work:

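/* Stack up all of your tables, keeping only ID and the source dataset name */
data have (keep=ID dsname);
    set data: indsname=dsn;
    /* dsn is a temporary variable, so copy it into a real one; */
    /* dsname must be in KEEP= or the SQL step cannot see it    */
    dsname = dsn;
run;

/* Proc SQL to get the job done: count distinct source datasets per ID */
proc sql;
    create table want as
    select ID, count(distinct dsname) as visit
    from have
    group by ID
    ;
quit;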
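The question notes that the real datasets do not share a prefix, so the `data:` name-prefix list may not apply as written. A minimal sketch of the same idea with the datasets listed explicitly follows; the names `alpha`, `beta` and `gamma` are hypothetical placeholders, and the `order by` clause is an assumption added only to mimic the descending order of the ideal output.

```sas
/* alpha, beta, gamma are hypothetical dataset names; list your own here */
data have (keep=ID dsname);
    length dsname $41;               /* INDSNAME= variables are 41 bytes long */
    set alpha beta gamma indsname=dsn;
    dsname = dsn;                    /* copy the temporary value so it is kept */
run;

proc sql;
    create table want as
    select ID, count(distinct dsname) as visit
    from have
    group by ID
    order by visit desc, ID;         /* assumption: most-visited IDs first */
quit;
```

Because `count(distinct dsname)` counts each source dataset at most once per ID, repeated IDs inside one dataset still contribute a single visit.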


2015-07-03 02:16:53

Here is a data step approach, assuming all the datasets are sorted by ID.

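Basically, you compare the indsname value of the current row with that of the previous row, and if it has changed, you increment the count by 1. Note that the counter is reset slightly differently from the usual approach: I reset it after the indsname check, and to 1, rather than before the check and to 0. That is because in some cases two consecutive IDs can have different indsname values, so the first row of a new ID may or may not trigger the increment. Since we can't use INDSNAME in a BY statement, we can't rely on SAS to flag the change in value for us (as we do with nested BY variables), so we have to do it in order ourselves.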

data data1; 
    do id = 1 to 20; 
    output; 
    end; 
run; 

data data2; 
    do id = 1 to 30 by 2; 
    output; 
    end; 
run; 

data data3; 
    do id = 1 to 30 by 3; 
    output; 
    end; 
run; 

data want; 
    set data: indsname=dsn; 
    by id; 
    count+ifn(dsn=lag(dsn),0,1); 
    if first.id then count=1; 
    if last.id then output; 
run;


2015-07-03 14:14:42 Joe

There is no need for the IFN() function, since a comparison evaluates to 0 or 1: count + dsn ne lag(dsn); – Tom
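Applied to the data step above, Tom's suggestion might look like the following sketch. The parentheses around the comparison are optional and are added here only for readability; everything else is unchanged from Joe's step, and it still assumes every input dataset is sorted by id.

```sas
data want;
    set data: indsname=dsn;        /* interleave data1, data2, data3 by id */
    by id;
    count + (dsn ne lag(dsn));     /* comparison yields 1 when the source dataset changes */
    if first.id then count = 1;    /* reset after the check, to 1 rather than 0 */
    if last.id then output;        /* one row per id with its dataset count */
run;
```

With the sample data1, data2 and data3 defined above, id 1 occurs in all three datasets and id 2 only in data1, so their counts should come out as 3 and 1 respectively.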
