Keith:
Your solution does not work correctly. Take a look at the following dataset:
DATA TEMP3;
INPUT VAR2 VAR1 $ DATE;
DUMMY=1;
CARDS;
1 A 1
1 A 2
1 B 3
2 C 4
3 D 5
4 E 6
5 F 7
6 B 8
7 B 9
1 D 10
1 X 11
7 G 14
6 Y 15
6 D 16
6 I 18
8 D 20
9 Z 21
9 X 22
;
RUN;
The result of your program is:
VAR2 VAR1 DATE DUMMY idperson
1 A 1 1 1
1 A 2 1 1
1 B 3 1 1
2 C 4 1 2
3 D 5 1 1
4 E 6 1 3
5 F 7 1 4
6 B 8 1 1
7 B 9 1 1
1 D 10 1 1
1 X 11 1 1
7 G 14 1 6
6 Y 15 1 5
6 D 16 1 1
6 I 18 1 5
8 D 20 1 1
9 Z 21 1 7
9 X 22 1 1
This is not right, because the records with VAR2 = 6 end up with two different IDs.
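The inconsistency can also be checked mechanically. The following is a minimal sketch in Python (not SAS, purely to keep the check short); the `rows` list is copied by hand from the output above, and the names `rows`, `ids_per_var2` and `conflicts` are illustrative only. It collects the set of idperson values seen for each VAR2 value and reports every VAR2 that maps to more than one ID.

```python
from collections import defaultdict

# (VAR2, VAR1, idperson) copied from the program output above, in DATE order.
rows = [
    (1, "A", 1), (1, "A", 1), (1, "B", 1), (2, "C", 2), (3, "D", 1),
    (4, "E", 3), (5, "F", 4), (6, "B", 1), (7, "B", 1), (1, "D", 1),
    (1, "X", 1), (7, "G", 6), (6, "Y", 5), (6, "D", 1), (6, "I", 5),
    (8, "D", 1), (9, "Z", 7), (9, "X", 1),
]

# Collect every idperson assigned to each VAR2 value.
ids_per_var2 = defaultdict(set)
for var2, _var1, idperson in rows:
    ids_per_var2[var2].add(idperson)

# A VAR2 value is inconsistent if it received more than one ID.
conflicts = {v: sorted(ids) for v, ids in ids_per_var2.items() if len(ids) > 1}
print(conflicts)
```

On this data the check flags VAR2 = 6 and also VAR2 = 7 and VAR2 = 9.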
This is what I did. The full program (not posted here) is more complex (and less elegant) because it also handles missing data in VAR1 and VAR2.
PROC SQL;
CREATE TABLE WORK.TEMP4 AS SELECT DISTINCT VAR1, VAR2 FROM WORK.TEMP3 WHERE DUMMY=1 AND VAR2^=. ORDER BY VAR1, VAR2;
CREATE TABLE WORK.TEMP5 AS SELECT DISTINCT VAR2, VAR1 FROM WORK.TEMP3 WHERE DUMMY=1 AND VAR2^=. ORDER BY VAR2, VAR1;
CREATE TABLE WORK.TEMP6 AS SELECT TEMP4.*, TEMP5.VAR1 AS CIP2 FROM WORK.TEMP4 INNER JOIN WORK.TEMP5 ON (TEMP4.VAR2=TEMP5.VAR2);
CREATE TABLE WORK.TEMP7 AS SELECT TEMP6.*, TEMP4.VAR2 AS IDHH2 FROM WORK.TEMP6 INNER JOIN WORK.TEMP4 ON (TEMP6.VAR1=TEMP4.VAR1);
CREATE TABLE WORK.TEMP8 AS SELECT DISTINCT IDHH2, CIP2 FROM WORK.TEMP7;
CREATE TABLE WORK.TEMP9 AS SELECT TEMP7.*, TEMP8.CIP2 AS CIP3 FROM WORK.TEMP7 INNER JOIN WORK.TEMP8 ON (TEMP7.IDHH2=TEMP8.IDHH2);
CREATE TABLE WORK.TEMP10 AS SELECT TEMP9.*, TEMP8.IDHH2 AS IDHH3 FROM WORK.TEMP9 INNER JOIN WORK.TEMP8 ON (TEMP9.CIP3=TEMP8.CIP2);
CREATE TABLE WORK.TEMP11 AS SELECT DISTINCT VAR1, IDHH3 AS VAR2 FROM WORK.TEMP10 ORDER BY VAR1, IDHH3;
CREATE TABLE WORK.TEMP12 AS SELECT VAR1, MAX(VAR2) AS VAR2 FROM WORK.TEMP11 GROUP BY VAR1;
CREATE TABLE WORK.TEMP13 AS SELECT TEMP11.* FROM WORK.TEMP11 INNER JOIN WORK.TEMP12 ON (TEMP11.VAR1=TEMP12.VAR1 AND TEMP11.VAR2=TEMP12.VAR2);
CREATE TABLE WORK.TEMP14 AS SELECT TEMP3.*, TEMP13.VAR2 AS IDPERSONA FROM WORK.TEMP3 LEFT JOIN WORK.TEMP13 ON (TEMP3.VAR1=TEMP13.VAR1);
CREATE TABLE WORK.TEMP15 AS SELECT DISTINCT VAR2, IDPERSONA FROM WORK.TEMP14 WHERE VAR2^=. AND IDPERSONA^=.;
CREATE TABLE WORK.TEMP16 AS SELECT TEMP14.*, TEMP15.IDPERSONA AS IDPERSONA2 FROM WORK.TEMP14 LEFT JOIN WORK.TEMP15 ON (TEMP14.VAR2=TEMP15.VAR2) ORDER BY DATE;
QUIT;
DATA TEMP16;
SET TEMP16;
IF IDPERSONA=. THEN IDPERSONA=IDPERSONA2;
DROP IDPERSONA2;
RUN;
And the correct result:
VAR2 VAR1 DATE DUMMY IDPERSONA
1 A 1 1 9
1 A 2 1 9
1 B 3 1 9
2 C 4 1 2
3 D 5 1 9
4 E 6 1 4
5 F 7 1 5
6 B 8 1 9
7 B 9 1 9
1 D 10 1 9
1 X 11 1 9
7 G 14 1 9
6 Y 15 1 9
6 D 16 1 9
6 I 18 1 9
8 D 20 1 9
9 Z 21 1 9
9 X 22 1 9
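Going by the expected result, records seem to belong to the same person whenever they are linked through a shared VAR1 or VAR2 value, directly or through a chain of such links, and IDPERSONA appears to be the largest VAR2 in each linked group. Under that assumption the task is a connected-components problem. The following is a minimal union-find sketch in Python (not SAS, and not the SQL approach above); the helper names `find` and `union` are illustrative, and it does not handle missing values.

```python
# (VAR2, VAR1) pairs from TEMP3, in DATE order.
rows = [
    (1, "A"), (1, "A"), (1, "B"), (2, "C"), (3, "D"), (4, "E"),
    (5, "F"), (6, "B"), (7, "B"), (1, "D"), (1, "X"), (7, "G"),
    (6, "Y"), (6, "D"), (6, "I"), (8, "D"), (9, "Z"), (9, "X"),
]

parent = {}  # union-find forest: node -> parent node


def find(node):
    """Return the root of node's component, compressing the path."""
    parent.setdefault(node, node)
    root = node
    while parent[root] != root:
        root = parent[root]
    while parent[node] != root:
        nxt = parent[node]
        parent[node] = root
        node = nxt
    return root


def union(a, b):
    """Merge the components containing a and b."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[root_a] = root_b


# Each record links its VAR2 value to its VAR1 value.
# Nodes are tagged so a VAR2 value can never collide with a VAR1 value.
for var2, var1 in rows:
    union(("VAR2", var2), ("VAR1", var1))

# Assumption: label every component with its largest VAR2.
label = {}
for var2, _var1 in rows:
    root = find(("VAR2", var2))
    label[root] = max(label.get(root, var2), var2)

idpersona = [label[find(("VAR2", var2))] for var2, _var1 in rows]
print(idpersona)
```

Because union-find follows links of any length, it does not depend on a fixed number of join rounds. On the sample data it gives the same IDPERSONA column as the table above.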
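Can you align your data? It is hard to read right now. – 2014-09-24 09:01:20
You can't do this with a single direct query, because there can be an arbitrary number of "hops" between any two rows. – Eevee 2014-09-24 09:03:45
There is a kind of circular effect here, which makes this hard to do in PSQL even if you compare only one column. – masum7 2014-09-24 09:31:04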