Counting consecutive patterns in a string using R

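I am trying to write a function that counts the number of consecutive instances of a pattern. As an example, I would like to transform the string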

string<-"A>A>A>B>C>C>C>A>A"

into

"3 A > 1 B > 3 C > 2 A"

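I have a function that counts the instances in each string, see below. But it does not preserve the ordering I want. Any ideas or pointers?

Thanks,

Existing function: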

fnc_gen_PathName <- function(string) {
  p <- strsplit(as.character(string), ";")
  p1 <- lapply(p, table)
  p2 <- lapply(p1, function(x) {
    sapply(1:length(x), function(i) {
      if (x[i] == 25) {
        paste0(x[i], "+ ", names(x)[i])
      } else {
        paste0(x[i], "x ", names(x)[i])
      }
    })
  })
  p3 <- lapply(p2, function(x) paste(x, collapse = "; "))
  p3 <- do.call(rbind, p3)
  return(p3)
}
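A minimal sketch of why a `table()`-based count loses the ordering. It assumes the `>` delimiter from the example string (the function above splits on `;`), and the variable names `parts` and `totals` are only illustrative.

```r
# Split the example string on ">" (the delimiter used in the example).
string <- "A>A>A>B>C>C>C>A>A"
parts <- strsplit(string, ">", fixed = TRUE)[[1]]

# table() tallies every occurrence of each value across the whole vector,
# so the two separate runs of "A" are merged into one total and the
# position of each run is lost.
totals <- table(parts)
print(totals)
```

Here the two runs of "A" (3 and 2) collapse into a single count of 5, which is why the desired "3 A > 1 B > 3 C > 2 A" cannot be recovered from the table alone.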

Source

2015-12-01 Robin Sheridan

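Have you looked at the `rle()` function? If you split your string into a vector, it should work nicely. – MrFlick

Perfect, thank you very much. I had never heard of this function, but it is very easy to use. –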

As @MrFlick said in the comments, you could try the following, using rle and strsplit:

with(rle(strsplit(string, ">")[[1]]), paste(lengths, values, collapse = " > ")) 
## [1] "3 A > 1 B > 3 C > 2 A"
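To make the one-liner above easier to follow, here is the same idea broken into steps. This is only a sketch; the intermediate names `tokens`, `runs`, and `result` are illustrative and not part of the original answer.

```r
string <- "A>A>A>B>C>C>C>A>A"

# Step 1: split the string into a character vector of single tokens.
tokens <- strsplit(string, ">", fixed = TRUE)[[1]]

# Step 2: run-length encode the vector. rle() returns a list with
# `lengths` (size of each run) and `values` (the repeated element).
runs <- rle(tokens)
runs$lengths  # 3 1 3 2
runs$values   # "A" "B" "C" "A"

# Step 3: pair each length with its value and join the pairs with " > ".
result <- paste(runs$lengths, runs$values, collapse = " > ")
result        # "3 A > 1 B > 3 C > 2 A"
```

Because `rle()` only merges adjacent equal elements, the second run of "A" stays separate from the first, which is the ordering behavior the question asks for.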

Source

2015-12-01 15:16:44

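Here are two dplyr solutions: a regular one and one based on rle. The advantages: you can pass in several strings as a vector, and a tidy intermediate dataset is built before the (ugh) re-collapsing into a single string.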

library(dplyr) 
library(tidyr) 
library(stringi) 

strings = "A>A>A>B>C>C>C>A>A" 


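# Regular dplyr version: one row per token, then number the runs.
# tibble() replaces the deprecated data_frame().
tibble(string = strings) %>%
    mutate(string_split =
      string %>%
      stri_split_fixed(">")) %>%
    unnest(string_split) %>%
    # ID goes up by one every time a token differs from the previous one
    mutate(ID =
      string_split %>%
      lag %>%
      `!=`(string_split) %>%
      plyr::mapvalues(NA, TRUE) %>%
      cumsum) %>%
    # one row per run, with n = length of the run
    count(string, ID, string_split) %>%
    group_by(string) %>%
    summarize(new_string =
       paste(n,
        string_split,
        collapse = " > "))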

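# rle version: run-length encode each string's tokens inside do().
# tibble() replaces the deprecated data_frame().
tibble(string = strings) %>%
    group_by(string) %>%
    do(.$string %>%
     first %>%
     stri_split_fixed(">") %>%
     first %>%
     rle %>%
     unclass %>%
     as.data.frame) %>%
    summarize(new_string =
       paste(lengths, values, collapse = " > "))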
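For comparison, the "several strings as a vector" case can also be sketched in base R alone. The helper name `count_runs` and its `sep` argument are hypothetical, introduced here only for illustration. It applies the rle approach to each string in turn and skips the tidy intermediate dataset that the dplyr versions build.

```r
# Hypothetical helper: run-length summary for each element of a character vector.
count_runs <- function(strings, sep = ">") {
  # strsplit() returns one token vector per input string
  token_list <- strsplit(as.character(strings), sep, fixed = TRUE)
  vapply(token_list, function(tokens) {
    runs <- rle(tokens)
    # rebuild "<length> <value>" pairs, joined by the padded separator
    paste(runs$lengths, runs$values, collapse = paste0(" ", sep, " "))
  }, character(1))
}

count_runs(c("A>A>A>B>C>C>C>A>A", "B>B>A"))
# "3 A > 1 B > 3 C > 2 A" "2 B > 1 A"
```

Using `vapply()` with `character(1)` keeps the result a plain character vector of the same length as the input, so it could be assigned directly to a data frame column.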

Source

2015-12-01 15:53:26 bramtayl
