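Merge long-form data that has NAs with wide-form complete data to overwrite the NAs

So I have three data sets that I need to merge. They contain school data and reading/math scores for grades 4 and 5. One of them is a long-form data set with a lot of missingness in some variables (yes, I do need the data in long form); the other two are in wide form and contain the complete data for what is missing. All of these data frames contain a column with a unique ID number for every person in the database.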
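Here is a small, fully reproducible example that produces the type of data.frames I am working with. I have the following three data frames: school_lf, school4 and school5. school_lf is the long-form data with NAs, and school4 and school5 are the wide-form data that I need to use to fill in the NAs in the long-form data (matching by id and grade).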
library(reshape2)  # provides melt()
set.seed(890)
school <- NULL
school$id <- sample(102938:999999, 100)
school$selected <- sample(0:1, 100, replace = TRUE)
school$math4 <- sample(400:500, 100)
school$math5 <- sample(400:500, 100)
school$read4 <- sample(400:500, 100)
school$read5 <- sample(400:500, 100)
school <- as.data.frame(school)
# Delete observations at random from the school df
indm4 <- which(school$math4 %in% sample(school$math4, 25))
school$math4[indm4] <- NA
indm5 <- which(school$math5 %in% sample(school$math5, 50))
school$math5[indm5] <- NA
indr4 <- which(school$read4 %in% sample(school$read4, 70))
school$read4[indr4] <- NA
indr5 <- which(school$read5 %in% sample(school$read5, 81))
school$read5[indr5] <- NA
# Separate Read and Math
read <- as.data.frame(subset(school, select = -c(math4, math5)))
math <- as.data.frame(subset(school, select = -c(read4, read5)))
# Now turn this into long form data...
clr <- melt(read, id.vars = c("id", "selected"), variable.name = "variable", value.name = "readscore")
clm <- melt(math, id.vars = c("id", "selected"), value.name = "mathscore")
# Clean up the grades for each of these...
clr$grade <- ifelse(clr$variable == "read4", 4,
ifelse(clr$variable == "read5", 5, NA))
clm$grade <- ifelse(clm$variable == "math4", 4,
ifelse(clm$variable == "math5", 5, NA))
# Put all these in one df
school_lf <- clm                      # id, selected, variable, mathscore, grade
school_lf$readscore <- clr$readscore  # clm and clr rows are in the same order
school_lf$variable <- NULL            # drop the melt() variable column
###############
# Generate the 2 data frames with IDs that have the full data
set.seed(890)  # same seed as above, so id and selected reproduce those in school
school4 <- NULL
school4$id <- sample(102938:999999, 100)
school4$selected <- sample(0:1, 100, replace = TRUE)
school4$math4 <- sample(400:500, 100)
school4$read4 <- sample(400:500, 100)
school4$grade <- 4
school4 <- as.data.frame(school4)
set.seed(890)  # same seed again, so id and selected match school and school4
school5 <- NULL
school5$id <- sample(102938:999999, 100)
school5$selected <- sample(0:1, 100, replace = TRUE)
school5$math5 <- sample(400:500, 100)
school5$read5 <- sample(400:500, 100)
school5$grade <- 5
school5 <- as.data.frame(school5)
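Before school4 and school5 can fill gaps in school_lf, they probably need the same shape: one row per id and grade, with the scores in columns named mathscore and readscore. The sketch below shows one base-R way to do that on small made-up frames; toy4, toy5 and complete_lf are hypothetical names standing in for school4, school5 and their stacked result.

```r
# Toy stand-ins for school4 and school5 (hypothetical values, same column layout)
toy4 <- data.frame(id = c(101, 102), selected = c(0, 1),
                   math4 = c(410, 455), read4 = c(430, 470), grade = 4)
toy5 <- data.frame(id = c(101, 102), selected = c(0, 1),
                   math5 = c(420, 465), read5 = c(440, 480), grade = 5)

# Rename the grade-specific score columns to the names used in school_lf
names(toy4)[match(c("math4", "read4"), names(toy4))] <- c("mathscore", "readscore")
names(toy5)[match(c("math5", "read5"), names(toy5))] <- c("mathscore", "readscore")

# rbind() on data frames matches columns by name, giving one row per id and grade
complete_lf <- rbind(toy4, toy5)
complete_lf
```

With the real data, the same renaming would be applied to school4 and school5 themselves, and the stacked result could then be joined to school_lf by id and grade.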
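I need to merge the wide-form data frames into the long-form data so that the NAs are replaced with the actual values. I have tried the code below, but instead of filling the NAs in the read and math scores it adds several columns. I only need one column of reading scores and one column of math scores, not six separate columns (read.x, read.y, math.x, math.y, and readscore).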
sch <- merge(school_lf, school4, by = c("id", "grade", "selected"), all = T)
sch <- merge(sch, school5, by = c("id", "grade", "selected"), all = T)
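The extra columns are standard behaviour of merge(): it lines rows up on the by columns but never fills one column from another. Non-key columns from both inputs are kept side by side, and when the same name appears in both, the default suffixes = c(".x", ".y") are appended; differently named columns (for example mathscore in school_lf versus math4 in school4) are simply kept as separate columns. A minimal illustration with made-up values:

```r
# Two tiny frames sharing the key "id" and a non-key column "read"
a <- data.frame(id = 1:3, read = c(NA, 410, 420))
b <- data.frame(id = 1:3, read = c(400, 410, 420))

# merge() matches rows on the key but never combines same-named non-key
# columns: it keeps both and appends the default suffixes ".x" and ".y"
m <- merge(a, b, by = "id", all = TRUE)
names(m)  # "id" "read.x" "read.y"
```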
Any help is much appreciated! I have been trying to solve this for hours now and haven't made any progress (so I thought I'd ask).
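One way to end up with a single math column and a single read column is a left join on id and grade, followed by filling each NA from the complete value. This is a sketch in base R on small made-up frames, not the accepted answer: school_lf_toy stands in for school_lf, complete_toy is a hypothetical long-form stack of school4 and school5 with their score columns renamed to mathscore and readscore, and the ".full" suffix is an arbitrary choice.

```r
# Toy long-form data with gaps, standing in for school_lf (values are made up)
school_lf_toy <- data.frame(id = c(101, 101, 102, 102), selected = c(0, 0, 1, 1),
                            mathscore = c(NA, 420, 455, NA),
                            grade = c(4, 5, 4, 5),
                            readscore = c(430, NA, NA, 480))
# Hypothetical complete scores in long form (school4 + school5 stacked)
complete_toy <- data.frame(id = c(101, 101, 102, 102), grade = c(4, 5, 4, 5),
                           mathscore = c(410, 420, 455, 465),
                           readscore = c(430, 440, 470, 480))

# Left join on id and grade; the complete scores get a ".full" suffix
filled <- merge(school_lf_toy, complete_toy, by = c("id", "grade"),
                all.x = TRUE, suffixes = c("", ".full"))

# Keep the existing value where present, otherwise take the complete one
filled$mathscore <- ifelse(is.na(filled$mathscore), filled$mathscore.full, filled$mathscore)
filled$readscore <- ifelse(is.na(filled$readscore), filled$readscore.full, filled$readscore)

# Drop the helper columns so only one math and one read column remain
filled$mathscore.full <- NULL
filled$readscore.full <- NULL
filled
```

all.x = TRUE keeps every row of the long-form data even when no complete record matches; those rows simply keep their NA.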
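Thanks for the reply! The function looks neat, but I can't figure out what it is doing... I tried running it on the sample data and it gave me this error: Error in mutate_impl(.data, dots) : object 'math4' not found – rowbust
Then you are probably running it on a different data set. The message is not saying that math4 is missing from the data set as such, but that it cannot be found where the function is applied. – Edwin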
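The answer's function is not reproduced here, so the following is only an analogy for Edwin's point. The error above comes from dplyr's mutate(), but base R's transform() behaves the same way: an expression referring to a column that is absent from the data it receives fails with "object ... not found". A long-form frame like school_lf has no math4 column (its scores are in mathscore), so code written for the wide layout may fail like this. long_df is a hypothetical example frame.

```r
# A long-form frame like school_lf: scores live in mathscore, there is no math4
long_df <- data.frame(id = c(101, 101), grade = c(4, 5), mathscore = c(410, 420))

# Referring to a missing column fails when the expression is evaluated
msg <- tryCatch(transform(long_df, total = math4 + 1),
                error = function(e) conditionMessage(e))
msg  # "object 'math4' not found"

# Checking for the required columns up front makes the cause explicit
setdiff(c("id", "grade", "math4"), names(long_df))  # "math4"
```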