Imputing values in R and Stata

I have a data frame, which I call sam.data, as follows:
dput(sam.data)
structure(list(idn = c(1L, 2L, 3L, 4L, 5L, 6L, 66L, 62L, 7L,
81L, 68L, 72L), n1 = c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 6L, 7L, 7L,
7L, 7L), x = c(9.95228, 11.4186, 10.3735, 10.5453, 10.7364, 9.85219,
9.73307, 9.86304, 9.74097, 9.57359, 9.70899, 9.75185)), .Names = c("idn",
"n1", "x"), row.names = c(NA, 12L), class = "data.frame")
sam.data
idn n1 x
1 1 1 9.95228
2 2 2 11.41860
3 3 3 10.37350
4 4 4 10.54530
5 5 5 10.73640
6 6 6 9.85219
7 66 6 9.73307
8 62 6 9.86304
9 7 7 9.74097
10 81 7 9.57359
11 68 7 9.70899
12 72 7 9.75185
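For rows where idn is not equal to n1, I want to create a new variable y that takes the value of x from the row whose idn equals that n1. Otherwise, y should be missing. I was able to produce a close solution in R, but I would prefer a more elegant one. I am also looking for a solution in Stata.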
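A minimal base R sketch without plyr, assuming every n1 value also appears as some row's idn (true in this sample; otherwise match() returns NA and y stays NA). match() finds the "anchor" row for each n1, and ifelse() copies its x only into the other rows:

```r
# Rebuild the example data (same values as the dput() output above)
sam.data <- data.frame(
  idn = c(1L, 2L, 3L, 4L, 5L, 6L, 66L, 62L, 7L, 81L, 68L, 72L),
  n1  = c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 7L),
  x   = c(9.95228, 11.4186, 10.3735, 10.5453, 10.7364, 9.85219,
          9.73307, 9.86304, 9.74097, 9.57359, 9.70899, 9.75185)
)

# For each row, match(n1, idn) is the position of the row whose idn equals
# this row's n1 (the anchor row); x at that position is the value to copy.
anchor_x <- sam.data$x[match(sam.data$n1, sam.data$idn)]

# Copy it only where idn != n1; NA_real_ keeps y a numeric column.
sam.data$y <- ifelse(sam.data$idn != sam.data$n1, anchor_x, NA_real_)

sam.data
```

Because the lookup is vectorized, this does not depend on the rows being sorted by n1.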
My solution in R:
library(plyr)
# Within each n1 group, copy x from the row with the smallest idn
sam.data2 <- ddply(sam.data, .(n1), transform, y = x[which.min(idn)])
sam.data2
idn n1 x y
1 1 1 9.95228 9.95228
2 2 2 11.41860 11.41860
3 3 3 10.37350 10.37350
4 4 4 10.54530 10.54530
5 5 5 10.73640 10.73640
6 6 6 9.85219 9.85219
7 66 6 9.73307 9.85219
8 62 6 9.86304 9.85219
9 7 7 9.74097 9.74097
10 81 7 9.57359 9.74097
11 68 7 9.70899 9.74097
12 72 7 9.75185 9.74097
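The ddply() result fills y for every row, including the anchor rows where idn equals n1. It also relies on which.min(idn) landing on the anchor, which holds in this sample only because the anchor has the smallest idn in each group. A group-wise sketch that keeps the same split-apply idea but selects the anchor by idn == n1 and leaves it NA could look like this; base R split()/unsplit() is swapped in for plyr here so the example has no package dependency:

```r
sam.data <- data.frame(
  idn = c(1L, 2L, 3L, 4L, 5L, 6L, 66L, 62L, 7L, 81L, 68L, 72L),
  n1  = c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 7L),
  x   = c(9.95228, 11.4186, 10.3735, 10.5453, 10.7364, 9.85219,
          9.73307, 9.86304, 9.74097, 9.57359, 9.70899, 9.75185)
)

y_by_group <- lapply(split(sam.data, sam.data$n1), function(g) {
  # x of the row in this group whose idn equals the group's n1
  ref <- g$x[g$idn == g$n1[1]]
  # The anchor row gets NA; every other row in the group gets the anchor's x
  ifelse(g$idn == g$n1, NA_real_, ref)
})

# unsplit() puts the per-group results back in the original row order
sam.data$y <- unsplit(y_by_group, sam.data$n1)
sam.data
```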
Expected output:
idn n1 x y
1 1 1 9.95228
2 2 2 11.41860
3 3 3 10.37350
4 4 4 10.54530
5 5 5 10.73640
6 6 6 9.85219
7 66 6 9.73307 9.85219
8 62 6 9.86304 9.85219
9 7 7 9.74097
10 81 7 9.57359 9.74097
11 68 7 9.70899 9.74097
12 72 7 9.75185 9.74097
Thanks for the R solutions. I prefer to use NA because I want the column to stay numeric. – Metrics
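A small illustration of why NA rather than a blank keeps the column numeric: ifelse() takes a common type for its result, so mixing "" with numbers yields a character vector, while NA_real_ stays numeric.

```r
test <- c(TRUE, FALSE)
y_blank <- ifelse(test, "", 9.85219)        # "" forces the result to character
y_na    <- ifelse(test, NA_real_, 9.85219)  # NA_real_ keeps the result numeric
class(y_blank)  # "character"
class(y_na)     # "numeric"
```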