2013-01-24
1

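I want to use proc sql in SAS to determine whether cases or records are missing certain information. I have two datasets. The first is a record of the entire data collection, showing which forms were collected at each visit. The second is a specification of which forms should be collected at each visit. I have tried many approaches, including data steps and SQL code using `not in`, to no avail. How can I find the missing cases using proc sql in SAS?

Sample data is below.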


***** dataset crf is a listing of all forms that have been filled out at each visit ; 
***** cid is an identifier for a study center ; 
***** pid is an identifier for a participant ; 

data crf; 
    input visit cid pid form ; 
cards; 
1 10 101 10 
1 10 101 11 
1 10 101 12 
1 10 102 10 
1 10 102 11 
2 10 101 11 
2 10 101 13 
2 10 102 11 
2 10 102 12 
2 10 102 13 
; 
run; 


***** dataset crfrule is a listing of all forms that should be filled out at each visit ; 
***** so, visit 1 needs to have forms 10, 11, and 12 filled out ; 
***** likewise, visit 2 needs to have forms 11 - 14 filled out ; 

data crfrule; 
    input visit form ; 
cards; 
1 10 
1 11 
1 12 
2 11 
2 12 
2 13 
2 14 
; 
run; 


***** We can see from the two tables that participant 101 has a complete set of records for visit 1 ; 
***** However, participant 102 is missing form 12 for visit 1 ; 
***** For visit 2, 101 is missing forms 12 and 14, whereas 102 is missing form 14 ; 


***** I want to be able to know which forms were **NOT** filled out by each person at each visit (i.e., which forms are missing for each visit) ; 


***** extracting unique cases from crf ; 
proc sql; 
    create table visit_rec as 
    select distinct cid, pid, visit 
     from crf; 
quit; 



***** building the list of expected forms by visit number ; 
proc sql; 
    create table expected as 
    /* list the columns explicitly so that visit appears only once */ 
    select x.cid, 
      x.pid, 
      y.visit, 
      y.form 

    from visit_rec as x right join crfrule as y 
     on x.visit = y.visit 

    order by y.visit, x.cid, x.pid, y.form; 
quit; 


***** so now I have a list of which forms that **SHOULD** have been filled out by each person ; 

***** now, I just need to know if they were filled out or not... ; 

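The strategy I have been working on is to merge `expected` back onto the `crf` table, with some indicator showing which forms are missing at each visit.

Ideally, I would like to produce a table with these columns: visit, cid, pid, missing_form

Any guidance is greatly appreciated.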

+0

I have tried many versions of [this answer](http://stackoverflow.com/questions/8946593/how-can-i-use-proc-sql-to-find-all-the-records) in my attempts so far.

+0

These are all great answers!

Answers

0

You can use a left join, then use a where clause to keep only the rows that have no matching record in the right-hand table.

proc sql; 
    select 
        e.* 
    from 
        expected e left join 
        crf c on 
        e.visit = c.visit and 
        e.cid = c.cid and 
        e.pid = c.pid and 
        e.form = c.form 
    /* no match in crf means the expected form was never filled out */ 
    where c.visit is missing 
    ; 
quit; 
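
The same left-join idea can also be written as a single step that builds the expected list inline and returns the layout requested in the question (visit, cid, pid, missing_form). This is only a sketch under a few assumptions: the table name `missing_forms` and the aliases `v`, `r`, `e`, and `c` are made up for illustration, and a participant is only expected to have forms for visits where at least one of their forms appears in `crf`.

```sas
proc sql;
    create table missing_forms as
    select e.visit,
           e.cid,
           e.pid,
           e.form as missing_form
    from (
        /* every form each participant should have at each visit they attended */
        select distinct v.cid, v.pid, r.visit, r.form
        from crf as v inner join crfrule as r
          on v.visit = r.visit
    ) as e
    left join crf as c
      on  e.visit = c.visit
      and e.cid   = c.cid
      and e.pid   = c.pid
      and e.form  = c.form
    /* keep only the expected forms that have no matching row in crf */
    where c.form is missing
    order by e.visit, e.cid, e.pid, e.form;
quit;
```

With the sample data from the question, this should leave four rows: form 12 for participant 102 at visit 1, forms 12 and 14 for participant 101 at visit 2, and form 14 for participant 102 at visit 2.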
2

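EXCEPT will do what you want. I don't necessarily know that it is the most efficient solution in general (and if you are doing this in SAS, it almost certainly isn't), but given what you have done so far, it does work: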

proc sql; 
    create table want as 
        select cid,pid,visit,form from expected 
        except select cid,pid,visit,form from crf 
    ; 
quit; 

Just be careful with EXCEPT: it is very picky (note that select * does not work, because your tables have their columns in a different order).
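
One possible way around the column-order problem is the `CORR` (`CORRESPONDING`) keyword that PROC SQL accepts on set operators, which lines columns up by name instead of by position and drops any column that is not present in both tables. The sketch below assumes `expected` and `crf` both contain columns named cid, pid, visit, and form; the table name `want_corr` is made up for illustration.

```sas
proc sql;
    /* CORR matches columns by name, so the column order of the two tables no longer matters */
    create table want_corr as
        select * from expected
        except corr
        select * from crf
    ;
quit;
```

Listing the columns explicitly, as in the answer above, remains the more transparent choice when the two tables share only some of their column names.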

2

I would suggest a nested query, or this could be done in two steps. How about this:

proc sql; 
    create table temp as 
    select distinct c.* 
     , (d.visit is null and d.form is null and d.pid is null) as missing_form 
    from (
     select distinct a.pid, b.* from 
     crf a, crfrule b 
    ) c 
    left join crf d 
    on  c.pid = d.pid 
     and c.form = d.form 
     and c.visit = d.visit 
    order by c.pid, c.visit, c.form 
    ; 
quit; 

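It gives you a list of all possible (i.e., expected) combinations of pid, form, and visit, plus a boolean, missing_form, that is 1 when that combination has no matching row in crf.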
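
If only the missing forms are of interest, the flag can be used as a filter afterwards. A minimal sketch, assuming the `temp` table created by the query above; the table name `missing_only` and the alias `t` are made up for illustration. Note that `temp` carries no cid column, so cid would have to be added to the inner select if it is needed in the result.

```sas
proc sql;
    create table missing_only as
    select t.visit,
           t.pid,
           t.form as missing_form   /* the form number replaces the 0/1 flag in the output */
    from temp as t
    where t.missing_form = 1        /* 1 means the expected form was not found in crf */
    order by t.visit, t.pid, t.form;
quit;
```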

+0

+1 I made a slight edit to add aliases to the `ORDER BY` columns, but otherwise a very nice answer! – BellevueBob