How do I find a sequence within a vector in R?

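I've written a function that essentially creates a vector of 1000 binary values. I have been able to use rle to count the longest streak of consecutive 1s.

I was wondering how I could find a specific vector (say c(1,0,0,1)) within this larger vector. I would like it to return the number of occurrences of that vector, so c(1,0,0,1,1,0,0,1) should return 2, while c(1,0,0,0,1) should return 0.

Most solutions I have found only check whether a sequence occurs at all and return TRUE or FALSE, or they give results for the individual values rather than for the specific vector that was specified.

Here is my code so far: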

# creates a function where 1000 people each choose either up or down.
updown <- function(){
    n = 1000
    X = rep(0,n)
    Y = rbinom(n, 1, 1/2)
    X[Y == 1] = "up"
    X[Y == 0] = "down"

    # calculate the length of the longest streak of ups:
    Y1 <- rle(Y)
    streaks <- Y1$lengths[Y1$values == 1]
    max(streaks, na.rm=TRUE)
}

# repeat this process 1000 times to find the average outcome.
longeststring <- replicate(1000, updown())
mean(longeststring)


2016-10-24 TheCurlyManLives

Since Y only contains 0s and 1s, we can paste it into a string and use a regular expression, specifically gregexpr. Simplified a bit:

set.seed(47) # for reproducibility 

Y <- rbinom(1000, 1, 1/2) 

count_pattern <- function(pattern, x){ 
    sum(gregexpr(paste(pattern, collapse = ''), 
       paste(x, collapse = ''))[[1]] > 0) 
} 

count_pattern(c(1, 0, 0, 1), Y) 
## [1] 59

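paste collapses both the pattern and Y into strings, e.g. "1001" for the pattern here and a 1000-character string for Y. gregexpr searches Y for all occurrences of the pattern and returns the indices of the matches (along with a little more information so they can be extracted, if desired). Because gregexpr returns -1 when there is no match, testing for numbers greater than 0 lets us simply sum the TRUE values to get the number of matches; in this case, 59.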
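To make the return value described above concrete, here is a small sketch of what gregexpr produces for a match and for a non-match. The input strings are illustrative and are not taken from the simulated Y.

```r
# Illustrative inputs (assumed for this sketch)
hit  <- gregexpr("1001", "10011001")[[1]]
miss <- gregexpr("1001", "10001")[[1]]

as.integer(hit)            # start positions of the matches: 1 5
attr(hit, "match.length")  # length of each match: 4 4
as.integer(miss)           # -1 signals "no match"

sum(hit > 0)   # 2
sum(miss > 0)  # 0
```

Note that the matches found this way do not overlap: after a match, the search resumes after its last character.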

The other sample cases mentioned in the question:

count_pattern(c(1,0,0,1), c(1,0,0,1,1,0,0,1)) 
## [1] 2 

count_pattern(c(1,0,0,1), c(1,0,0,0,1)) 
## [1] 0


2016-10-24 05:19:42 alistaire

This will also work:

library(stringr)
x <- c(1,0,0,1)
pattern <- paste(x, collapse='')  # "1001"
y <- c(1,0,0,1,1,0,0,1)
length(unlist(str_match_all(paste(y, collapse=''), pattern)))
## [1] 2
y <- c(1,0,0,0,1)
length(unlist(str_match_all(paste(y, collapse=''), pattern)))
## [1] 0

If you want to count overlapping occurrences as well, use a lookahead assertion:

y <- c(1,0,0,1,0,0,1) # overlapped
# gregexpr returns -1 when nothing matches, so count positions > 0
sum(gregexpr("(?=1001)", paste(y, collapse=''), perl=TRUE)[[1]] > 0)
## [1] 2
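As a regex-free alternative, overlapping occurrences can also be counted by comparing every window of the larger vector against the pattern. This is only a minimal sketch: count_overlapping is a hypothetical helper name, not something from the answers above, and it assumes both arguments are plain atomic vectors without NA values.

```r
# count_overlapping() is a hypothetical helper introduced for illustration
count_overlapping <- function(pattern, x) {
  k <- length(pattern)
  n <- length(x)
  if (k == 0 || n < k) return(0L)
  starts <- seq_len(n - k + 1)
  # TRUE wherever the window starting at position i equals the pattern
  hits <- vapply(starts,
                 function(i) all(x[i:(i + k - 1)] == pattern),
                 logical(1))
  sum(hits)
}

count_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 1, 0, 0, 1))     # 2 (the matches overlap)
count_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 1, 1, 0, 0, 1))  # 2
count_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 0, 1))           # 0
```

Because it never pastes the values into a string, this version should also work when the vector holds values longer than one character, where collapsing to a string would be ambiguous.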


2016-10-24 06:12:02

@馮天 Actually, we need to use a lookahead assertion. I've updated the code; let me know if it doesn't work.

I see. You are right.
