2016-11-16

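Count occurrences of a letter before a specific letter

I want to count the occurrences of "I" before the first "C" within each Id. I have tried the code below, but it counts every "I" in the column.

Code I tried: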

library(plyr) 
Impres = ddply(df, .(Id), summarize, No_of_I_before_First_C = length(which(Character == "I"))) 

Sample data:

Id Character 
1  I 
1  I 
1  C 
1  I 
2  I 
2  C 

The output should look like this:

Id Count_Of_I_before_First_C 
1  2 
2  1 

Answers


Here is one idea:

first1 <- function(x, letter){ 
      which(x == letter)[1]-1 
      } 

aggregate(Character ~ Id, df, first1, 'C') 
# Id Character 
#1 1   2 
#2 2   1 

To generalise it a bit more:

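first1 <- function(x, letter, letter_count){ 
    # position of the first `letter` (NA if `letter` never occurs)
    ind <- which(x == letter)[1] 
    # count the elements matching `letter_count` up to that position
    sum(grepl(letter_count, x[1:ind])) 
    } 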

aggregate(Character ~ Id, df, first1, 'C', 'I') 
# Id Character 
#1 1   2 
#2 2   1 
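
A side note on the two versions above: the first one returns the position of the first "C" minus one, which equals the number of "I" rows only if nothing but "I" appears before that "C". The sketch below uses made-up data (`df2`, with an extra letter "X") and a hypothetical helper `count_I` to show where the two approaches diverge. It is an illustration under those assumptions, not part of the original answer.

```r
# Hypothetical data: Id 1 has an "X" before its first "C"
df2 <- data.frame(Id = c(1, 1, 1, 1, 2, 2),
                  Character = c("I", "X", "I", "C", "I", "C"),
                  stringsAsFactors = FALSE)

# Position-based version from the answer above
first1 <- function(x, letter) {
  which(x == letter)[1] - 1
}

# Counts every row before the first "C", whatever the letter
pos_based <- aggregate(Character ~ Id, df2, first1, "C")

# Hypothetical helper: counts only the "I" rows seen before the first "C"
count_I <- function(x) sum(x == "I" & cumsum(x == "C") == 0)
explicit <- aggregate(Character ~ Id, df2, count_I)

print(pos_based$Character)  # 3 1
print(explicit$Character)   # 2 1
```

With the sample data from the question both give the same result, since only "I" precedes each first "C".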

This will be quite slow on large datasets – Bulat

@Bulat I just followed the question's 'aggregate' tag (i.e. no packages). I know both 'dplyr' and 'data.table' have more efficient methods – Sotos

require(dplyr) 
require(magrittr) 
df <- data.frame(Id = c(1,1,1,1,2,2), Character = c('I', 'I', 'C', 'I', 'I', 'C')) 

This function gives you the number of I's before the first C:

foo <- function (character) { 

    is_before_C <- (character == 'C') %>% cummax() %>% not() 
    # is_before_C <- !cummax(character == 'C') # the same 
    is_I <- character == 'I' 
    is_I_before_C <- is_I & is_before_C 

    return(sum(is_I_before_C)) 
} 

Then you can use this function to summarise the data:

df %>% 
    group_by(Id) %>% 
    summarise(Count_Of_I_before_First_C = foo(Character)) 

Result:

# A tibble: 2 × 2 
    Id Count_Of_I_before_First_C 
    <dbl>      <int> 
1  1       2 
2  2       1 
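
To see how the cumulative-maximum idea behaves on edge cases, here is a small base R sketch. The helper name `count_i_before_c` and the test vectors are made up for illustration; the logic mirrors the commented `!cummax(...)` variant in the function above.

```r
# Hypothetical base R helper mirroring the cummax() approach
count_i_before_c <- function(character) {
  # cummax() turns FALSE, FALSE, TRUE, FALSE into 0, 0, 1, 1,
  # so negating it flags only the positions before the first "C"
  is_before_C <- !cummax(character == "C")
  sum(character == "I" & is_before_C)
}

count_i_before_c(c("I", "I", "C", "I"))  # 2: the "I" after the "C" is ignored
count_i_before_c(c("C", "I", "I"))       # 0: the group starts with "C"
count_i_before_c(c("I", "I"))            # 2: no "C", so every "I" is counted
```

Whether a group without any "C" should count all of its "I" rows is not specified in the question, so treat the last case as a behaviour to be aware of rather than a requirement.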

Here is a data.table solution:

library(data.table) 
dt <- data.table(Id = c(1,1,1,1,2,2), Character = c('I', 'I', 'C', 'I', 'I', 'C')) 
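# running count of "C" within each Id; 0 means the first "C" has not been reached yet
dt[, cnt.c := cumsum(Character == "C"), by = Id] 
# among the rows before the first "C", count only the "I" rows
res <- dt[cnt.c == 0, .(Count_Of_I_before_First_C = sum(Character == "I")), by = Id] 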
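
One thing to watch with the filter-then-aggregate pattern: an Id whose first row is already "C" has no rows with `cnt.c == 0`, so it drops out of the result entirely. A possible single-step variant that keeps such groups with a count of 0 is sketched below. The data (`dt2`, including a made-up Id 3) and the result name `res2` are assumptions for illustration only.

```r
library(data.table)

# Hypothetical data: Id 3 starts with "C"
dt2 <- data.table(Id = c(1, 1, 1, 1, 2, 2, 3, 3),
                  Character = c("I", "I", "C", "I", "I", "C", "C", "I"))

# Count "I" rows for which no "C" has been seen yet, per Id, in one grouped step
res2 <- dt2[, .(Count_Of_I_before_First_C =
                  sum(Character == "I" & cumsum(Character == "C") == 0)),
            by = Id]

print(res2)
# Id 1 -> 2, Id 2 -> 1, Id 3 -> 0
```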

Maybe:

library(dplyr) 

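rlei <- function(x) { 
    # run-length encode the letters (as.character() in case the column is a factor)
    r <- rle(as.character(x)) 
    I <- which(r$values=="I") 
    C <- which(r$values=="C") 
    # total length of the "I" runs that come before the first "C" run
    sum(r$lengths[I[I < C[1]]]) 
} 

# pass the grouped column itself; `.$Character` would refer to the whole data frame
group_by(df, Id) %>% 
    summarise(Count_Of_I_before_First_C=rlei(Character))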