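Merging large datasets in R and flagging non-matches

I am trying to successively join several datasets and flag the observations from the first dataset that have no match in the subsequent datasets. Below is an example in which I simulate the original dataset plus three additional joins. The current code does what I want, but it is very inefficient: on large datasets it can take days. Can this be done with apply() or some other function?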
#Toy datasets: x, y, z and w
#dataset X
id <- c(1:10, 1:100)
X1 <- rnorm(110, mean = 0, sd = 1)
year <- c("2004","2005","2006","2001","2002")
year <- rep(year, 22)
month = c("Jul","Aug","Sep","Oct","Nov","Dec","Jan","Feb","Mar","Apr")
month <- rep(month, 11)
x <- data.frame(id, X1, month, year)
#dataset Y
id2 <- c(1:10, 41:110)
Y1 <- rnorm(80, mean = 0 , sd = 1)
year <- c("2004","2005","2006","2001")
year <- rep(year, 20)
month = c("Jul","Aug","Sep","Oct","Nov","Dec","Jan","Feb","Mar","Apr")
month <- rep(month, 8)
y <- data.frame(id2,Y1, year,month)
#dataset z
id3 = c(1:60, 401:10000)
Z1 = rpois(9660, 10)
year = c('2004','2005','2006','2002')
year = rep(year, 2415)
month = c("Jul","Aug","Sep","Oct","Nov","Dec","Jan","Feb","Mar","Apr")
month <- rep(month, 966)
z = data.frame(id3,Z1,year,month)
#dataset w
id4 = c(1:300, 20:29)
W1 = rnorm(310, 20, 36)
year = c('2004','2005','2006','2000','2002')
year = rep(year, 62)
month = c("Jul","Aug","Sep","Oct","Nov","Dec","Jan","Feb","Mar","Apr")
month <- rep(month, 31)
w = data.frame(id4, W1, year, month)
x$id2 = x$yflag = x$zflag = x$wflag = rep(NA, nrow(x))
y.index = rep(NA, nrow(x))
z.index = rep(NA, nrow(x))
w.index = rep(NA, nrow(x))
for (i in 1:nrow(x)) {
  ## compare to dataset y: yflag = 1 if the same id, month and year appear in y, otherwise 0
  y.index = which(as.character(y$id2) == as.character(x$id[i])
                  & as.character(y$year) == as.character(x$year[i])
                  & as.character(y$month) == as.character(x$month[i]))
  x$yflag[i] = ifelse(length(y.index) > 0, 1, 0)
  x$id2[i] = ifelse(length(y.index) == 1, y$id2[y.index], x$id[i])
  ## compare to dataset z: zflag = 1 if the same id, month and year appear in z, otherwise 0
  z.index <- which(as.character(z$id3) == as.character(x$id[i])
                   & as.character(z$month) == as.character(x$month[i])
                   & as.character(z$year) == as.character(x$year[i]))
  x$zflag[i] <- ifelse(length(z.index) > 0, 1, 0)
  ## compare to dataset w: wflag = 1 if the same id, month and year appear in w, otherwise 0
  w.index <- which(as.character(w$id4) == as.character(x$id[i])
                   & as.character(w$month) == as.character(x$month[i])
                   & as.character(w$year) == as.character(x$year[i]))
  x$wflag[i] <- ifelse(length(w.index) > 0, 1, 0)
}
print(x)
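A vectorised sketch of the same three flags, assuming a row of x counts as matched when any row of the other dataset shares its id, year and month (which is what the loop above computes). The small data frames and the helper make_key() are hypothetical stand-ins, not part of the original code. Instead of scanning y, z and w once per row of x, each flag is computed with a single %in% call on composite keys.

```r
# Hypothetical toy data standing in for x, y, z and w from the question
x <- data.frame(id = c(1, 2, 3, 4),
                month = c("Jul", "Aug", "Sep", "Oct"),
                year = c("2004", "2005", "2006", "2001"))
y <- data.frame(id2 = c(1, 3, 9),
                year = c("2004", "2005", "2001"),
                month = c("Jul", "Sep", "Oct"))
z <- data.frame(id3 = c(2, 4), Z1 = c(5, 7),
                year = c("2005", "2001"),
                month = c("Aug", "Oct"))
w <- data.frame(id4 = c(1, 1, 4), W1 = c(0.5, 1.5, 2.5),
                year = c("2004", "2004", "2002"),
                month = c("Jul", "Jul", "Oct"))

# Hypothetical helper: one character key per row built from id, year and month
# (as.character() also handles factor columns)
make_key <- function(id, year, month) {
  paste(as.character(id), as.character(year), as.character(month), sep = "_")
}

x_key <- make_key(x$id, x$year, x$month)

# %in% compares whole vectors in one call, so each flag is one pass
# instead of one scan of the other dataset per row of x
x$yflag <- as.integer(x_key %in% make_key(y$id2, y$year, y$month))
x$zflag <- as.integer(x_key %in% make_key(z$id3, z$year, z$month))
x$wflag <- as.integer(x_key %in% make_key(w$id4, w$year, w$month))

print(x)
```

The "_" separator assumes no id, year or month value itself contains "_"; otherwise pick a separator that cannot occur in the data.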
Have you tried 'merge()'? – Andrie
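merge does not flag the observations correctly; it throws away information, because I cannot see an equivalent of the flags. For example, 'test.merge = merge(x, y, by.x = 'id', by.y = 'id2')' does not solve the problem. Of course, I may not be implementing it correctly –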
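On the merge() attempt above: by default merge() uses all = FALSE, an inner join, so unmatched rows of x are silently dropped, which is likely why the flags seemed to disappear. A minimal sketch, assuming small hypothetical data frames and matching on all three key columns: attach a flag column to the distinct keys of y, left-join with all.x = TRUE, and turn the resulting NAs into 0.

```r
# Hypothetical toy data standing in for x and y from the question
x <- data.frame(id = c(1, 2, 3, 4),
                month = c("Jul", "Aug", "Sep", "Oct"),
                year = c("2004", "2005", "2006", "2001"))
y <- data.frame(id2 = c(1, 3, 1), Y1 = c(0.1, 0.2, 0.3),
                year = c("2004", "2006", "2004"),
                month = c("Jul", "Sep", "Jul"))

# Keep only the distinct keys of y and mark each of them with a flag of 1;
# de-duplicating first stops merge() from repeating rows of x
y_keys <- unique(y[, c("id2", "year", "month")])
y_keys$yflag <- 1L

# all.x = TRUE is a left join: every row of x is kept, and rows with no
# partner in y get NA in yflag
res <- merge(x, y_keys,
             by.x = c("id", "year", "month"),
             by.y = c("id2", "year", "month"),
             all.x = TRUE)
res$yflag[is.na(res$yflag)] <- 0L
print(res)
```

The same pattern can be repeated for z and w. Note that merge() may return the rows in a different order from x.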
Have you tried the 'match()' function? –
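A sketch of the match() suggestion, again on hypothetical toy data and composite paste() keys. match() returns, for each key of x, the position of its first match in y (or NA), so the same vector gives both the flag and a way to pull columns from y, much like the loop copies y$id2.

```r
# Hypothetical toy data standing in for x and y from the question
x <- data.frame(id = c(1, 2, 3),
                month = c("Jul", "Aug", "Sep"),
                year = c("2004", "2005", "2006"))
y <- data.frame(id2 = c(3, 1), Y1 = c(0.7, -1.2),
                year = c("2006", "2004"),
                month = c("Sep", "Jul"))

# Composite keys from id, year and month
x_key <- paste(x$id, x$year, x$month, sep = "_")
y_key <- paste(y$id2, y$year, y$month, sep = "_")

# Position of the first matching row of y for each row of x, NA if none
pos <- match(x_key, y_key)

x$yflag <- as.integer(!is.na(pos))  # 1 = matched, 0 = no match
x$Y1 <- y$Y1[pos]                   # indexing with NA yields NA
print(x)
```

Because match() only reports the first match, duplicate keys in y do not multiply rows of x, unlike a plain merge().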