2015-02-09
0

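SAS: counting the number of distinct combinations

I have a dataset with information about students' education at an institution. I want to know how many different study programmes each student has followed. I have information at both the master and bachelor level, and I want to count the number of distinct study programmes per education level (master, bachelor).

For example, person1 could have: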

Bachelor: 
- study1 
- study2 
- study3 
- study3 

Master: 
- studyA 
- studyA 

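I then want a count of 3 study programmes at the bachelor level (study3 should not be counted twice) and a count of 1 at the master level. Each study programme has its own row, so person1 has 6 rows in the dataset. I would like one row per person giving the number of study programmes at each education level: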

person number_bachelor  number_master 
person1 3     1 
....etc... 

I have tried this:

proc sql; 
create table new as 
select distinct personid, name, 
count(study) as number_of_bach 
from old 
group by personid, edu_level, study; 
quit; 

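But it does not give me what I want. It gives me two rows for person1, with the values 1 and 2 in the variable "number_of_bach".

How can I edit this code to get the result I want?

Answers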

0

Is this what you want?

DATA old; 
    INPUT personid edu_level $ study $; 
    DATALINES; 
1 bachelor study1 
1 bachelor study2 
1 bachelor study3 
1 bachelor study3 
1 master studyA 
1 master studyA 
1 master studyB 
; 

PROC SQL; 
    CREATE TABLE new AS 
    /* One row per person and level; the count covers both bachelor and master */
    SELECT personid, edu_level, COUNT(DISTINCT study) AS num_studies 
    FROM OLD 
    GROUP BY personid, edu_level; 
QUIT; 

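study is the column being aggregated in your query (COUNT is an aggregate function), so it should not be included in the GROUP BY clause. Otherwise the query also groups by study: each group then holds a single study value, so a distinct count is always 1 and a plain COUNT(study) only counts the duplicate rows of that study.

If you want one row per person, add a PROC TRANSPOSE afterwards: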
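To make the effect of the GROUP BY clause concrete, here is a minimal sketch that runs both groupings side by side. The table name demo and the column names n_rows and n_distinct are made up for illustration; the data mirrors the example above.

```sas
/* Hypothetical demo table; same shape as the example data */
DATA demo;
    INPUT personid edu_level $ study $;
    DATALINES;
1 bachelor study1
1 bachelor study2
1 bachelor study3
1 bachelor study3
1 master studyA
1 master studyA
1 master studyB
;

PROC SQL;
    /* study in GROUP BY: one output row per person, level and study,
       so the count only reports how often each study is repeated */
    SELECT personid, edu_level, study, COUNT(study) AS n_rows
    FROM demo
    GROUP BY personid, edu_level, study;

    /* study left out of GROUP BY: one output row per person and level,
       counting the distinct studies within that level */
    SELECT personid, edu_level, COUNT(DISTINCT study) AS n_distinct
    FROM demo
    GROUP BY personid, edu_level;
QUIT;
```

The first query returns five rows for this person (one per distinct study within each level), the second returns two (one per level).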

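/* PROC TRANSPOSE reads its input through DATA=; _NAME_ is dropped from the output */
PROC TRANSPOSE DATA = new OUT = new2 (DROP = _NAME_); 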
    BY personid; 
    ID edu_level; 
RUN; 

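(You could also write a more complex query using subqueries and joins instead of transposing. Unless you have millions of rows, the overhead of TRANSPOSE does not matter.)

For completeness, here is a SQL-only solution to your question: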

PROC SQL; 
    CREATE TABLE new AS 
    SELECT p.personid, b.num_bachelors, m.num_masters 
      /* Select unique personids */ 
      FROM (SELECT DISTINCT personid 
        FROM old) AS p 
      /* Count number of bachelor-level courses */ 
      LEFT JOIN (SELECT personid, 
           COUNT(DISTINCT study) AS num_bachelors 
         FROM old WHERE edu_level = 'bachelor' 
         GROUP BY personid) AS b on p.personid = b.personid 
      /* Count number of master-level courses */ 
      LEFT JOIN (SELECT personid, 
           COUNT(DISTINCT study) AS num_masters 
         FROM old WHERE edu_level = 'master' 
         GROUP BY personid) AS m on p.personid = m.personid; 

QUIT; 
+0

Thank you so much for your help. This is exactly what I wanted. – user1626092 2015-02-11 10:56:09

2

Code:

data education; 
input person $ level $ program $; 
datalines; 
person1 bachelor study1 
person1 bachelor study2 
person1 bachelor study3 
person1 bachelor study3 
person1 master study1 
person2 bachelor study1 
person2 master study2 
person2 master study1 
; 
run; 

proc sort data = education nodupkey; 
by person level program; 
run; 

proc sql; 
select person, 
sum(case when level eq 'bachelor' then 1 else 0 end) as num_bachelors, 
sum(case when level eq 'master' then 1 else 0 end) as num_masters 
from education 
group by person; 
quit; 

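How it works: the SORT procedure with NODUPKEY removes duplicate records, if there are any. The SQL procedure then counts, for each person, the number of bachelor-level programmes and the number of master-level programmes.

Output: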

person num_bachelors num_masters 
person1    3    1 
person2    1    2
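
The two ideas in this thread (a distinct count and a CASE expression per level) can also be combined into a single query that needs neither a sort nor a transpose. This is a sketch rather than a tested drop-in: the table names demo and counts are hypothetical, and it relies on COUNT(DISTINCT ...) skipping missing values, which is why rows from the other level are mapped to a blank.

```sas
/* Hypothetical demo table with the rows from the question */
DATA demo;
    INPUT personid edu_level $ study $;
    DATALINES;
1 bachelor study1
1 bachelor study2
1 bachelor study3
1 bachelor study3
1 master studyA
1 master studyA
;

PROC SQL;
    CREATE TABLE counts AS
    SELECT personid,
           /* rows from the other level become blank (missing) and are not counted */
           COUNT(DISTINCT CASE WHEN edu_level = 'bachelor' THEN study ELSE ' ' END)
               AS number_bachelor,
           COUNT(DISTINCT CASE WHEN edu_level = 'master' THEN study ELSE ' ' END)
               AS number_master
    FROM demo
    GROUP BY personid;
QUIT;
```

For the question's data this should produce one row for person 1 with number_bachelor = 3 and number_master = 1. The input table is left unchanged, whereas PROC SORT with NODUPKEY overwrites it unless OUT= is given.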