2017-01-16

How can I count repeated sets of characters in a vector? Consider the following vector made up of "A" and "B":

x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A") 

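In this example, the first set is the "A"s and "B"s from index 1 to 5, the second set is the "A"s and "B" from index 6 to 8, and the third set is the final single "A":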

x <- c("A", "A", "A", "B", "B", # set 1 
     "A", "A", "B",   # set 2 
     "A")      # set 3 

How can I create a counter variable that numbers each set? I need a vector like this:

c(1, 1, 1, 1, 1, 2, 2, 2, 3) 

Thanks!

Answers


Alternative 1.

cumsum(c(TRUE, diff(match(x, c("A", "B"))) == -1)) 
# [1] 1 1 1 1 1 2 2 2 3 

Step by step:

match(x, c("A", "B")) 
# [1] 1 1 1 2 2 1 1 2 1 

diff(match(x, c("A", "B"))) 
# [1] 0 0 1 0 -1 0 1 -1 

diff(match(x, c("A", "B"))) == -1 
# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE 

c(TRUE, diff(match(x, c("A", "B"))) == -1) 
# [1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE 
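A possible generalization of Alternative 1, sketched under the assumption that every set lists its labels in one fixed order (for example "A" before "B" before "C"): a new set starts wherever the position in that order drops. `set_id()` is a hypothetical helper written for this sketch, not part of any package. With only two labels, `diff(pos) < 0` is the same test as `== -1` above.

```r
# Hypothetical helper: number the sets, assuming each set lists its labels
# in the order given by `levels` (default: order of first appearance in x).
set_id <- function(x, levels = unique(x)) {
  pos <- match(x, levels)          # position of each element in that order
  cumsum(c(TRUE, diff(pos) < 0))   # a drop in position starts a new set
}

x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A")
set_id(x, c("A", "B"))
# [1] 1 1 1 1 1 2 2 2 3

y <- c("A", "B", "C", "C", "A", "C", "B")
set_id(y, c("A", "B", "C"))
# [1] 1 1 1 1 2 2 3
```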

Alternative 2.

Using data.table::rleid:

library(data.table)
cumsum(c(TRUE, diff(rleid(x) %% 2) == 1))
# [1] 1 1 1 1 1 2 2 2 3

Step by step:

rleid(x)
# [1] 1 1 1 2 2 3 3 4 5

rleid(x) %% 2
# [1] 1 1 1 0 0 1 1 0 1

diff(rleid(x) %% 2)
# [1] 0 0 -1 0 1 0 -1 1

diff(rleid(x) %% 2) == 1
# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE

c(TRUE, diff(rleid(x) %% 2) == 1)
# [1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
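If data.table is not available, the run id that `rleid()` returns can be reproduced in base R by flagging every element that differs from its predecessor. A minimal sketch, assuming `x` contains no `NA`s (the comparison would propagate them) and, as with the parity trick, that `x` starts with an "A" run.

```r
x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A")

# Base R stand-in for data.table::rleid(): a new run starts wherever
# a value differs from the one before it.
run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))
run_id
# [1] 1 1 1 2 2 3 3 4 5

# Same parity trick: every odd-numbered run after the first opens a new set.
cumsum(c(TRUE, diff(run_id %% 2) == 1))
# [1] 1 1 1 1 1 2 2 2 3
```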


We can also do this using only base R:

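# split x into runs: a new run starts wherever a value differs from the previous one
x1 <- split(x, cumsum(c(TRUE, x[-1] != x[-length(x)])))
# the value of each run: "A" "B" "A" "B" "A"
x2 <- sapply(x1, `[`, 1)
# number the runs of each value (1, 2, 3 for "A"; 1, 2 for "B") and repeat
# each number over its run; ave() returns character here, hence as.numeric()
as.numeric(rep(ave(x2, x2, FUN = seq_along), lengths(x1)))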
#[1] 1 1 1 1 1 2 2 2 3 
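A shorter base R variant, sketched under the assumption that every set begins with "A": a set starts at each "A" that is either the first element or follows a non-"A" element. Under this sketch, any "B"s before the first "A" would be numbered 0.

```r
x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A")

# TRUE where the previous element is not "A" (the first element counts as TRUE)
prev_not_a <- c(TRUE, x[-length(x)] != "A")
# a set starts at an "A" whose predecessor is not an "A"
starts <- x == "A" & prev_not_a
cumsum(starts)
# [1] 1 1 1 1 1 2 2 2 3
```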

Using rle:

x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A") 
tmp <- rle(x) 
#Run Length Encoding 
# lengths: int [1:5] 3 2 2 1 1 
# values : chr [1:5] "A" "B" "A" "B" "A" 

Now replace the run values with a counter that numbers the runs of each letter:

tmp$values <- ave(rep(1L, length(tmp$values)), tmp$values, FUN = cumsum) 

and invert the run-length encoding:

y <- inverse.rle(tmp) 
#[1] 1 1 1 1 1 2 2 2 3
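The `rle()` steps can be wrapped into one function; `set_counter()` is a hypothetical name used only for this sketch. Because the runs of each distinct value are numbered independently, the code does not hard-code "A" as the first letter of a set.

```r
# Hypothetical wrapper around the rle() / inverse.rle() steps above
set_counter <- function(x) {
  runs <- rle(x)
  # number the runs of each distinct value 1, 2, 3, ... in order of appearance
  runs$values <- ave(rep(1L, length(runs$values)), runs$values, FUN = cumsum)
  # expand the run numbers back to one entry per element of x
  inverse.rle(runs)
}

set_counter(c("A", "A", "A", "B", "B", "A", "A", "B", "A"))
# [1] 1 1 1 1 1 2 2 2 3
```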