Find the element following a sequence across rows in a data frame

I have a dataset structured as shown below.

# example data set 

a <- "a" 
b <- "b" 
d <- "d" 

id1 <- c(a,a,a,a,b,b,d,d,a,a,d) 
id2 <- c(b,d,d,d,a,a,a,a,b,b,d) 
id3 <- c(b,d,d,a,a,a,a,d,b,d,d) 

dat <- rbind(id1,id2,id3) 
dat <- data.frame(dat)

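In each row, I need to find the first sequence of the repeated element "a" and identify the element that immediately follows that sequence.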

# desired results 

dat$s3 <- c("b","b","d") 
dat

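I can break this problem down into three steps and have solved the first one, but my programming skills are quite limited, and I'd appreciate advice on how to handle steps 2 and 3. If you have an idea for solving the problem in a different way, that would also be very helpful.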

This is what I have so far:

# Step 1: find the first occurrence of "a" in the first sequence 
dat$s1 <- apply(dat, 1, function(x) match(a,x)) 

# Step 2: find the last occurrence in the first sequence 

# Step 3: find the element following the last occurrence in the first sequence

Thanks in advance!
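Steps 2 and 3 can also be handled together with base R's `rle()`, which collapses a row into runs of equal values and their lengths. A minimal sketch, assuming that "repeated" means a run of at least two "a" values; the helper name `after_first_run` is hypothetical and not part of the question:

```r
# Example data from the question (rows id1..id3, columns X1..X11)
dat <- data.frame(rbind(
  id1 = c("a", "a", "a", "a", "b", "b", "d", "d", "a", "a", "d"),
  id2 = c("b", "d", "d", "d", "a", "a", "a", "a", "b", "b", "d"),
  id3 = c("b", "d", "d", "a", "a", "a", "a", "d", "b", "d", "d")
), stringsAsFactors = FALSE)

# Hypothetical helper: element right after the first run of >= 2 copies of `value`
after_first_run <- function(x, value = "a") {
  x <- as.character(x)
  r <- rle(x)                                         # runs: r$values, r$lengths
  ends <- cumsum(r$lengths)                           # Step 2: last position of every run
  k <- which(r$values == value & r$lengths >= 2)[1]   # first run of repeated `value`
  if (is.na(k) || ends[k] == length(x)) return(NA_character_)
  x[ends[k] + 1]                                      # Step 3: element following that run
}

s3 <- unname(apply(dat, 1, after_first_run))
s3
# [1] "b" "b" "d"
```

The helper returns `NA` when a row has no repeated run of "a" or when that run reaches the last column.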


2016-11-10 ZMacarozzi

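You could try a double `max.col`: briefly, `a1 = max.col(dat == "a", "first")` gives the first occurrence of "a" in each row. Setting the `cbind(rep(seq_along(a1), a1), sequence(a1))` indices of `dat != "a"` to `FALSE` and calling `max.col` again (with `"first"`) should return the wanted column indices. –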
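The double `max.col` idea can be spelled out as a short sketch. It follows one reading of the comment, assuming the blanked-out positions are set to `FALSE` and that every row contains an "a" followed later by a non-"a" element (on an all-`FALSE` row, `max.col(..., "first")` returns column 1, which would be wrong). It targets the first run of "a" of any length, including a single "a".

```r
# Example data from the question (rows id1..id3, columns X1..X11)
dat <- data.frame(rbind(
  id1 = c("a", "a", "a", "a", "b", "b", "d", "d", "a", "a", "d"),
  id2 = c("b", "d", "d", "d", "a", "a", "a", "a", "b", "b", "d"),
  id3 = c("b", "d", "d", "a", "a", "a", "a", "d", "b", "d", "d")
), stringsAsFactors = FALSE)

m <- as.matrix(dat)
a1 <- max.col(m == "a", "first")   # column of the first "a" in each row

not_a <- m != "a"                  # TRUE wherever the element is not "a"
# Set columns 1..a1[i] of row i to FALSE so only positions after the first "a" remain
not_a[cbind(rep(seq_along(a1), a1), sequence(a1))] <- FALSE

after <- max.col(not_a, "first")   # first non-"a" column after the first "a"
s3 <- m[cbind(seq_len(nrow(m)), after)]
s3
# [1] "b" "b" "d"
```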

I would use `filter`:

fun <- function(x) { 
    x <- as.character(x) 
    isa <- (x == "a") #find "a" values 

    #find sequences with two TRUE values and the last value FALSE 
    ids <- stats::filter(isa, c(1,1,1), sides = 1) == 2L & !isa 

    na.omit(x[ids])[1] #subset  
} 

apply(dat, 1, fun) 
#id1 id2 id3 
#"b" "b" "d"
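To see why `== 2L & !isa` marks the element after a run, it helps to print the intermediate filter output for one row. With `sides = 1` and weights `c(1, 1, 1)`, `stats::filter` returns at position i the number of "a" values among positions i-2, i-1 and i (the first two positions are `NA`). A value of 2 at a non-"a" position therefore means the two preceding elements are both "a", so a lone "a" is never matched. A small sketch using row `id2` from the question:

```r
x <- c("b", "d", "d", "d", "a", "a", "a", "a", "b", "b", "d")   # row id2
isa <- x == "a"

# Number of "a" in the trailing window (i-2, i-1, i); NA for i = 1, 2
win <- as.vector(stats::filter(isa, c(1, 1, 1), sides = 1))
win
# [1] NA NA  0  0  1  2  3  3  2  1  0

# Two "a" immediately before a non-"a" element
ids <- win == 2L & !isa
which(ids)
# [1] 9
x[which(ids)[1]]
# [1] "b"
```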


2016-11-10 11:40:48 Roland

Many thanks; tested on the test data and on the large-scale data. – ZMacarozzi

Well, here is an attempt; it's a bit messy:

l1 <- lapply(apply(dat, 1, function(i) as.integer(which(i == a))), 
          function(j) j[cumsum(c(1, diff(j) != 1)) == 1]) 

ind <- unname(sapply(l1, function(i) tail(i, 1) + 1)) 

dat$s3 <- diag(as.matrix(dat[ind])) 

dat$s3 
#[1] "b" "b" "d"

Or wrap it in a function:

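fun1 <- function(df, a = "a"){ 
    # positions of `a` in each row, keeping only the first consecutive run
    l1 <- lapply(apply(df, 1, function(i) as.integer(which(i == a))), 
       function(j) j[cumsum(c(1, diff(j) != 1)) == 1]) 
    # column just after the end of that run
    ind <- unname(sapply(l1, function(i) tail(i, 1) + 1)) 
    # element [i, ind[i]] for each row i
    return(diag(as.matrix(df[ind]))) 
} 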

fun1(dat) 
#[1] "b" "b" "d"
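`diag(as.matrix(dat[ind]))` builds an n-by-n matrix just to read its diagonal, which may become costly on a data set with many rows. One possible alternative (a sketch, not part of the original answer; `fun2` is a hypothetical name) indexes the matrix directly with (row, column) pairs. It keeps the answer's logic for locating the first run, so it takes the first run of "a" of any length. It loops over row indices instead of using `apply`, because `apply` returns a matrix rather than a list when every row has the same number of "a" values.

```r
# Example data from the question (rows id1..id3, columns X1..X11)
dat <- data.frame(rbind(
  id1 = c("a", "a", "a", "a", "b", "b", "d", "d", "a", "a", "d"),
  id2 = c("b", "d", "d", "d", "a", "a", "a", "a", "b", "b", "d"),
  id3 = c("b", "d", "d", "a", "a", "a", "a", "d", "b", "d", "d")
), stringsAsFactors = FALSE)

# Hypothetical helper: element after the first consecutive run of `value` per row
fun2 <- function(df, value = "a") {
  m <- as.matrix(df)
  ind <- vapply(seq_len(nrow(m)), function(r) {
    pos <- which(m[r, ] == value)                    # positions of `value` in row r
    if (length(pos) == 0) return(NA_integer_)        # no `value` in this row
    run <- pos[cumsum(c(1, diff(pos) != 1)) == 1]    # first consecutive run only
    nxt <- max(run) + 1L                             # column just after that run
    if (nxt > ncol(m)) NA_integer_ else nxt          # run ends at the last column
  }, integer(1))
  m[cbind(seq_len(nrow(m)), ind)]                    # one element per row
}

fun2(dat)
# [1] "b" "b" "d"
```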


2016-11-10 11:29:24 Sotos

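Thanks, this works on the test dataset. I will now check this and the previous solution on the actual large-scale dataset, and I hope I can solve the problem this way. I really appreciate the help. – ZMacarozzi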

Try this (assuming every row contains a repeated "a"):

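library(stringr) 
# paste each row into one string (e.g. "aaaabbddaad") and capture the first non-"a" character after "aa"
dat$s3 <- apply(dat, 1, function(x) str_match(paste(x, collapse = ''), 'aa([^a])')[, 2]) 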

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 s3
id1  a  a  a  a  b  b  d  d  a   a   d  b
id2  b  d  d  d  a  a  a  a  b   b   d  b
id3  b  d  d  a  a  a  a  d  b   d   d  d


2016-11-10 11:31:37

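Thanks, this works on the test dataset. I don't understand every part of the code, but I will now try applying it to the large data and see how it works. Thanks a lot, I really appreciate the help. – ZMacarozzi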

Or maybe just vectorize the whole thing? `str_match(do.call(paste0, dat), 'aa([^a])')[,2]` –
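The vectorized version can also be written without stringr. A sketch that swaps in base R's `regexec()` and `regmatches()` for `stringr::str_match()`: `do.call(paste0, dat)` pastes the columns element-wise into one string per row, and the capture group `([^a])` picks the first non-"a" character after "aa". Like the `str_match` answer, it assumes every element is a single character, because the pasted string has no separators.

```r
# Example data from the question (rows id1..id3, columns X1..X11)
dat <- data.frame(rbind(
  id1 = c("a", "a", "a", "a", "b", "b", "d", "d", "a", "a", "d"),
  id2 = c("b", "d", "d", "d", "a", "a", "a", "a", "b", "b", "d"),
  id3 = c("b", "d", "d", "a", "a", "a", "a", "d", "b", "d", "d")
), stringsAsFactors = FALSE)

# One string per row, e.g. "aaaabbddaad" for id1
rows <- do.call(paste0, dat)

# Full match plus capture group for the first "aa" followed by a non-"a" character
hits <- regmatches(rows, regexec("aa([^a])", rows))

# Element 2 of each match is the capture group; NA when a row has no match
s3 <- vapply(hits, function(h) if (length(h) >= 2) h[2] else NA_character_, character(1))
s3
# [1] "b" "b" "d"
```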
