2017-02-23 89 views
1

Create variables that count the "levels" of other variables

I have a dataset similar to the simplified table below (call it "DS_have"):

SurveyID Participant FavoriteColor FavoriteFood SurveyMonth
S101     G92         Blue          Pizza        Jan
S102     B34         Blue          Cake         Feb
S103     Z28         Green         Cake         Feb
S104     V11         Red           Cake         Feb
S105     P03         Yellow        Pizza        Mar
S106     A71         Red           Pizza        Mar
S107     C48         Green         Cake         Mar
S108     G92         Blue          Cake         Apr
... 

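I want to create a set of numeric variables that identify the discrete categories/levels of each variable in the dataset above. The result should look like the following dataset ("DS_want"):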

SurveyID Participant FavoriteColor FavoriteFood SurveyMonth ColorLevels FoodLevels ParticipantLevels MonthLevels
S101     G92         Blue          Pizza        Jan         1           1          1                 1
S102     B34         Blue          Cake         Feb         1           2          2                 2
S103     Z28         Green         Cake         Feb         2           2          3                 2
S104     V11         Red           Cake         Feb         3           2          4                 2
S105     P03         Yellow        Pizza        Mar         4           1          5                 3
S106     A71         Red           Pizza        Mar         3           1          6                 3
S107     C48         Green         Cake         Mar         2           2          7                 3
S108     G92         Blue          Cake         Apr         1           2          1                 4
... 

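Essentially, I want to know what syntax I should run against DS_have to generate a unique numeric value for each "level" or category of each variable. Note that I cannot use conditional if/then statements to assign the values of the "Levels" variables category by category, because some variables have thousands of levels.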

+0

If you have access to `PROC IML`, I would edit the question and tag it as such; this may be easier in IML than in Base SAS ... – Joe

Answers

2

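One straightforward solution is to use proc tabulate to generate a table of the distinct values, then iterate over that table and build informats that convert the text to a number; after that you just use input to code them.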

*store variables you want to work with in a macro variable to make this easier; 
%let vars=FavoriteColor FavoriteFood SurveyMonth; 

*run a tabulate to get the unique values; 
proc tabulate data=have out=freqs; 
    class &vars.; 
    tables (&vars.),n; 
run; 

*if you prefer to have this in a particular order, sort by that now - otherwise you may have odd results (as this will). Sort by _TYPE_ then your desired order.; 


*Now create a dataset to read in for informat.; 
data for_fmt; 
    if 0 then set freqs; 
    array vars &vars.; 
    retain type 'i'; 
    do label = 1 by 1 until (last._type_); *for each _type_, start with 1 and increment by 1; 
        set freqs; 
        by _type_ notsorted; 
        which_var = find(_type_,'1'); *parses the '100' value from TYPE to see which variable this row is doing something to. May not work if many variables - need another solution to identify which (depends on your data what works); 

        start = coalescec(vars[which_var]); 
        fmtname = cats(vname(vars[which_var]),'I'); 
        output; 
        if first._type_ then do; *set up what to do if you encounter a new value not coded - set it to missing; 
            hlo='o'; *this means OTHER; 
            start=' '; 
            label=.; 
            output; 
            hlo=' '; 
            label=1; 
        end; 
    end; 
run; 

proc format cntlin=for_fmt; *import to format catalog via PROC FORMAT; 
quit; 
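
Before relying on the new informats, it may be worth checking what was actually stored. A minimal sketch, assuming the informats were written to the default WORK.FORMATS catalog and that FavoriteColorI is one of the names generated by the step above:

```sas
/* Print the stored definition of one generated informat.          */
/* In a SELECT statement, informat names take an @ prefix.         */
/* Assumes the default catalog (WORK.FORMATS) was used by CNTLIN.  */
proc format fmtlib;
    select @FavoriteColorI;
run;
```

The listing should show each text value with its assigned number, plus the OTHER range that maps unknown values to missing.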

Then code them like this (you could write a macro that does this by looping over the &vars macro variable).

data want; 
    set have; 
    color_code = input(FavoriteColor,FavoriteColorI.); 
run; 
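
The loop over &vars could be sketched roughly as follows. This is only an illustration, not part of the original answer: the macro name apply_codes, its parameters, and the _code suffix are invented here, and it assumes every informat was created with the I suffix as in the for_fmt step.

```sas
/* Hypothetical helper: apply each generated informat in one data step. */
/* Assumes informats named <variable>I already exist (see for_fmt).     */
%macro apply_codes(data=, out=, vars=);
    %local i var;
    data &out.;
        set &data.;
        %do i = 1 %to %sysfunc(countw(&vars., %str( )));
            %let var = %scan(&vars., &i., %str( ));
            /* e.g. FavoriteColor_code = input(FavoriteColor, FavoriteColorI.) */
            &var._code = input(&var., &var.I.);
        %end;
    run;
%mend apply_codes;

%apply_codes(data=have, out=want, vars=&vars.);
```

Keep the generated names within SAS's 32-character limit for variable names; a long source variable name plus the _code suffix could exceed it.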
+1

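By the way, I feel there is a more direct answer than this one, using a PROC built for this purpose. I can't remember what it is; maybe Rick or data_null_ will stop by and have the answer. – Joe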

+1

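I thought the same at first, but that is normally written to produce 0/1 dummy variables, which is slightly different. Still, there may be a faster proc-based way that creates the order variables automatically and passes them to PROC FORMAT. – Reeza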

+1

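@Reeza I swear I have seen something from `data_null_` that does this directly, or some clever use of `idgroup` in `proc means`, or something along those lines. But I can't remember what it was. – Joe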

0

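Another approach: create hash objects to keep track of the levels encountered for each variable, and read the dataset twice via a double DOW-loop, applying the level numbers on the second pass. This may not be as elegant as Joe's solution, but it should use less memory, and I suspect it will scale to more variables.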

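%macro levels_rename(DATA,OUT,VARS,NEWVARS); 
    %local i NUMVARS VARNAME; 

    data &OUT; 
        if 0 then set &DATA; /* define the input variables without reading any rows */
        length LEVEL 8; 
        /* declare one hash per variable: key = the variable, data = its level number */
        %let i = 1; 
        %let VARNAME = %scan(&VARS,&i); 
        %do %while(&VARNAME ne); 
            declare hash h&i(); 
            rc = h&i..definekey("&VARNAME"); 
            rc = h&i..definedata("LEVEL"); 
            rc = h&i..definedone(); 
            %let i = %eval(&i + 1); 
            %let VARNAME = %scan(&VARS,&i); 
        %end; 
        %let NUMVARS = %eval(&i - 1); 
        /* first pass: add only succeeds for unseen keys, so each new value gets the next number */
        do _n_ = 1 by 1 until(eof); 
            set &DATA end = eof; 
            %do i = 1 %to &NUMVARS; 
                LEVEL = h&i..num_items + 1; 
                rc = h&i..add(); 
            %end; 
        end; 
        /* second pass: look up the level number of each value and write the row */
        do _n_ = 1 to _n_; 
            set &DATA; 
            %do i = 1 %to &NUMVARS; 
                rc = h&i..find(); 
                %scan(&NEWVARS,&i) = LEVEL; 
            %end; 
            output; 
        end; 
        drop LEVEL rc; /* helper variables are not wanted in the output */
    run; 
%mend; 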

%levels_rename(sashelp.class,class_renamed,NAME SEX, NAME_L SEX_L); 
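
Applied to the data in the question, a call might look like the sketch below. It assumes the levels_rename macro above has already been compiled; the variable lengths in the data step are guesses made only to rebuild the eight sample rows shown in the question.

```sas
/* Rebuild the sample rows from the question (lengths are assumptions). */
data DS_have;
    length SurveyID $4 Participant $3 FavoriteColor $6
           FavoriteFood $5 SurveyMonth $3;
    input SurveyID Participant FavoriteColor FavoriteFood SurveyMonth;
    datalines;
S101 G92 Blue Pizza Jan
S102 B34 Blue Cake Feb
S103 Z28 Green Cake Feb
S104 V11 Red Cake Feb
S105 P03 Yellow Pizza Mar
S106 A71 Red Pizza Mar
S107 C48 Green Cake Mar
S108 G92 Blue Cake Apr
;
run;

/* One new level variable per source variable, listed in matching order. */
%levels_rename(DS_have, DS_want,
               FavoriteColor FavoriteFood Participant SurveyMonth,
               ColorLevels FoodLevels ParticipantLevels MonthLevels);
```

Because the hash approach numbers values in order of first appearance, Blue should get 1, Green 2, Red 3 and Yellow 4 here, and the months should follow the order in which they occur in the data rather than alphabetical order.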