Removing duplicate records in SAS

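I have a table that will be loaded into an Oracle database. I need to remove duplicate values without changing the order of the data. Each group has 5 possible records.

1. Blank rows need to be removed.
2. Duplicate names need to be removed, so that only distinct names appear.
3. The data must not be reordered.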

1 Commingled Data 
2 Social Security 
3 
4 
5 SSA 1996 
1 Commingled Data 
2 Social Security 
3 
4 
5 SSA 1997 
1 Commingled Data 
2 Social Security 
3 
4 
5 SSA -1998 
1 Commingled Data 
2 Statistical Administrative 
3 
4 
5 StARS 2000 
1 Federal 
2 Treasury 
3 Internal 
4 1099 
5 Master File - TY 1997 (1099/IRMF) 
1 Federal 
2 Treasury 
3 Internal 
4 1099 
5 Master File - TY 1998 (1099/IRMF) 
1 State 
2 Wage 
3 Indiana 
4 
5 Indiana - 1990Q1-2005Q2 
1 Federal 
2 Treasury 
3 Internal 
4 1040 
5 TY 2003 (1040/IMF) 1% File 
1 Federal 
2 Treasury 
3 Internal 
4 1040 
5 TY 2003 (1040/IMF) Cycles 1-39

Source

2016-12-13 user601828

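What counts as a "duplicate row"? What output do you expect? What approach do you have in mind? – Joe

The rows that repeat are usually row 1 (for example "Commingled Data" or "Federal"); row 2 repeats as well, and sometimes row 3 too. I have tried FIRST./LAST. processing, COALESCEC, and a self-join with an offset. – user601828

You could do it in two steps: first, run a SELECT DISTINCT to keep only the distinct values; second, delete any row in which all variables are missing? –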
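
The two-step suggestion in the comment above could be sketched with PROC SQL roughly as follows. This is a minimal sketch, not the commenter's code: it assumes a dataset named have with a single character variable named line (as built in the answer below), and the names have_numbered, want_sql and rownum are hypothetical. Because SELECT DISTINCT on its own does not guarantee the original order, the sketch carries a row number along and sorts on it.

```sas
/* Assumption: HAVE holds one character variable, LINE. */
/* Step 0: record the original position of every row.   */
data have_numbered;
    set have;
    rownum = _n_;
run;

proc sql;
    /* Keep one row per distinct LINE, remembering where it first */
    /* appeared, and drop rows that are blank after the index.    */
    create table want_sql as
    select line, min(rownum) as rownum
    from have_numbered
    where not missing(substr(line, 3))
    group by line
    order by rownum;
quit;
```

Grouping by line and taking min(rownum) plays the role of SELECT DISTINCT while keeping the first position of each value, so the final ORDER BY restores the input order.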

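This is a good use case for the hash object. If you use multidata:'n' and the ref method, it checks whether the record already exists in the hash table and adds it if it does not, so duplicates are never added.

Here I add rownum so that the original sort order can be restored afterwards, because the hash table is a binary tree and has no natural order unless you impose one.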

data have; 
input @1 line $50.; 
datalines; 
1 Commingled Data 
2 Social Security 
3 
4 
5 SSA 1996 
1 Commingled Data 
2 Social Security 
3 
4 
5 SSA 1997 
1 Commingled Data 
2 Social Security 
3 
4 
5 SSA -1998 
1 Commingled Data 
2 Statistical Administrative 
3 
4 
5 StARS 2000 
1 Federal 
2 Treasury 
3 Internal 
4 1099 
5 Master File - TY 1997 (1099/IRMF) 
1 Federal 
2 Treasury 
3 Internal 
4 1099 
5 Master File - TY 1998 (1099/IRMF) 
1 State 
2 Wage 
3 Indiana 
4 
5 Indiana - 1990Q1-2005Q2 
1 Federal 
2 Treasury 
3 Internal 
4 1040 
5 TY 2003 (1040/IMF) 1% File 
1 Federal 
2 Treasury 
3 Internal 
4 1040 
5 TY 2003 (1040/IMF) Cycles 1-39 
;;;; 
run; 

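data _null_; 
    set have end=eof; 
    rownum = _n_;  /* remember the original position of each row */
    if _n_ = 1 then do; 
        /* one entry per distinct line; multidata:'n' disallows duplicate keys */
        declare hash h(ordered:'n', multidata:'n'); 
        h.defineKey('line'); 
        h.defineData('line', 'rownum'); 
        h.defineDone(); 
    end; 
    /* skip rows that are blank after the leading index; */
    /* ref() adds the row only if its key is not already stored */
    if not missing(substr(line, 3)) then rc = h.ref(); 
    /* after the last row, write the hash contents to WANT */
    if eof then h.output(dataset:'want'); 
run; 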

proc sort data=want; 
    by rownum; 
run;
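
A possible variation, offered as a sketch rather than as part of the approach above: if the hash is used only as a "seen" set and each row is written out the first time its key is encountered, the output is already in input order and the final PROC SORT is not needed. The dataset name want_inorder and the hash name seen are hypothetical; the sketch assumes the same have dataset with the character variable line.

```sas
data want_inorder;
    set have;
    if _n_ = 1 then do;
        /* Hash used purely as a set of lines already written out. */
        declare hash seen();
        seen.defineKey('line');
        seen.defineData('line');
        seen.defineDone();
    end;

    /* Drop rows that have nothing after the leading index. */
    if missing(substr(line, 3)) then delete;

    /* check() returns 0 when the key is already in the hash. */
    if seen.check() = 0 then delete;

    /* First occurrence: remember it, then let the row be output. */
    rc = seen.add();
    drop rc;
run;
```

Since rows are emitted as they are read, no rownum variable is required; the trade-off is that the result is written by the DATA step itself rather than by the hash object's output method.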

Source

2016-12-13 19:43:56 Joe

Joe, thank you! Nobody at work could solve this. You're awesome. Thanks again. – user601828
