2016-07-28

Last observation carried forward conditional on multiple columns

I have a dataset with this structure:

ID = c(1,1,1,1,2,2,2,3,3,3,3) 
L40 = c(1, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA) 
K50 = c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, 1) 
df = data.frame(ID, L40, K50) 

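When missing values occur in columns L40 and K50, I want to carry forward the last non-missing value in that column, provided that the ID is the same as the previous row's ID and that both L40 and K50 are empty in the current row. I tried the following code: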

library(dplyr)  # provides %>% and group_by()
library(tidyr)  # provides fill()
df2 <- df %>% group_by(ID) %>% fill(L40:K50)

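This does not do what I am looking for. I only want the previous non-missing values carried into the next row when all the other columns in that row (apart from ID) are empty. This is what I want: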

ID = c(1,1,1,1,2,2,2,3,3,3,3) 
L40 = c(1, 1, 1, 1, 1, NA, NA, NA, 1, 1, NA) 
K50 = c(NA, NA, NA, NA, NA, 1, 1, NA, NA, NA, 1) 
df3 = data.frame(ID, L40, K50) 
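
One way to read the rule described above: a row is filled only when every diagnosis column in it is NA, and it then takes all of its values from the most recent row of the same ID that had at least one diagnosis. Below is a minimal base R sketch under that reading. The helper name `carry_forward_empty_rows` and its arguments are made up for illustration and do not come from any package, and the sketch assumes the rows are already in the desired order within each ID.

```r
# Hypothetical helper (not from any package): fill only the rows in which
# every column in `value_cols` is NA, copying from the latest row of the
# same ID that has at least one non-missing value.
carry_forward_empty_rows <- function(df, id_col, value_cols) {
  # TRUE where the row has at least one non-missing value
  observed <- rowSums(!is.na(df[value_cols])) > 0
  # Row number for observed rows, 0 for completely empty rows
  marker <- ifelse(observed, seq_len(nrow(df)), 0L)
  # Running maximum within each ID = index of the latest observed row so far
  src <- ave(marker, df[[id_col]], FUN = cummax)
  src[src == 0L] <- NA  # no observed row yet within this ID
  # Observed rows point at themselves, so they are left unchanged
  df[value_cols] <- lapply(df[value_cols], function(col) col[src])
  df
}

df <- data.frame(
  ID  = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  L40 = c(1, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA),
  K50 = c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, 1)
)
result <- carry_forward_empty_rows(df, "ID", c("L40", "K50"))
```

On the example data, `result` has the same L40 and K50 columns as df3. Because a row with any observed value points at itself, the L40 value in row 5 is not carried into row 6.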

Answers

We can use na.locf from the zoo package:

library(data.table) 
library(zoo) 
setDT(df)[, if(any(is.na(K50[-1]))) lapply(.SD, na.locf) else .SD , by = ID] 
# ID L40 K50 
#1: 1 1 NA 
#2: 1 1 NA 
#3: 1 1 NA 
#4: 1 1 NA 
#5: 2 1 NA 
#6: 2 NA 1 
#7: 3 NA 1 
#8: 3 NA 1 
#9: 3 NA 1 

An option using dplyr is:

library(dplyr) 
df %>% 
    mutate(ind = rowSums(is.na(.))) %>% 
    group_by(ID) %>% 
    mutate_each(funs(if(any(ind>1)) na.locf(., na.rm=FALSE) else .), L40:K50) %>% 
    select(-ind) 
#  ID L40 K50 
# <dbl> <dbl> <dbl> 
#1  1  1 NA 
#2  1  1 NA 
#3  1  1 NA 
#4  1  1 NA 
#5  2  1 NA 
#6  2 NA  1 
#7  3 NA  1 
#8  3 NA  1 
#9  3 NA  1 

No, this produces exactly the result I wanted to avoid. I don't want the value of L40 in row 5 to be carried into row 6. – udden2903

@udden2903 How is ID 2 different from ID 3? As in ID = 2, ID is … – akrun

ID 2. Sorry about that. – udden2903

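I played around with this problem for a while and, given my limited knowledge of R, came up with the following workaround. I added a date column to the original data frame for illustration: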

ID = c(1,1,1,1,2,2,2,3,3,3,3) 
date = c(1,2,3,4,1,2,3,1,2,3,4) 
L40 = c(1, 1, NA, NA, 1, NA, NA, NA, 1, NA, NA) 
K50 = c(NA, 1, 1, NA, NA, 1, NA, NA, NA, NA, 1) 
df = data.frame(ID, date, L40, K50) 

Here is what I did:

library(dplyr)
library(tidyr)
library(data.table)

# Gather the diagnosis columns into rows and keep only the rows where the patient has the associated diagnosis.
df1 <- df %>% gather(diagnos, dummy, L40:K50) %>% filter(dummy==1) %>% arrange(ID, date) 

#concatenate across rows by ID and date to collect all diagnoses of an ID at a particular date. 
df2 <- df1 %>% group_by(ID, date) %>% mutate(diag = paste(diagnos, collapse=" ")) %>% select(-diagnos, -dummy) 

#convert into data tables in preparation for join 
Dt1 <- data.table(df) 
Dt2 <- data.table(df2) 

setkey(Dt1, ID, date) 
setkey(Dt2, ID, date) 

# Each observation in Dt1 is matched with the observation in Dt2 that has the same date or, if that particular date is not present,
# with the nearest previous date:
final <- Dt2[Dt1, roll=TRUE] %>% distinct() 

This carries forward the name (or names) of the diagnosis until the next observed diagnosis.
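
The carrying forward comes from the rolling join. As a smaller illustration of what `roll = TRUE` does in data.table, here is a sketch with two made-up tables; `diagnoses`, `visits` and their contents are invented for this example and are not the data from the question.

```r
library(data.table)

# Hypothetical lookup table: the dates on which a diagnosis was recorded
diagnoses <- data.table(
  ID   = c(1, 1, 2),
  date = c(1, 3, 2),
  diag = c("L40", "K50", "K50"),
  key  = c("ID", "date")
)

# Hypothetical table of all observation dates per ID
visits <- data.table(
  ID   = c(1, 1, 1, 1, 2, 2, 2),
  date = c(1, 2, 3, 4, 1, 2, 3),
  key  = c("ID", "date")
)

# For every row of `visits`, match ID exactly and take the diagnosis from
# the same date or, failing that, from the nearest earlier date.
rolled <- diagnoses[visits, roll = TRUE]
```

In `rolled`, ID 1 gets "L40" on dates 1 and 2 and "K50" on dates 3 and 4, while ID 2 has NA on date 1 (nothing earlier to roll from) and "K50" on dates 2 and 3. The roll applies only to the last key column, so values never cross from one ID to another.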