2014-04-27 48 views
1

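For example, I have the element "computer" in a vector. I need to get a vector made up of "c", "o", "m", "p", "u", "t", "e", "r". How can I split an element of a vector into its individual letters?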

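The second part of my question is optional. How can I create a vector containing combinations of the letters of that element, such that the letters in each combination appear only in the order they have in the original word? For example, I want "puter" or "mpu" in this vector, but not something like "tumpo".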

Answers

1

The first part of the question is easy:

splits <- unlist(strsplit("computer",split="")) 

> splits 
[1] "c" "o" "m" "p" "u" "t" "e" "r" 
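
Since the question is about an element inside a vector, it may help to note that `strsplit` returns a list with one entry per input string, which is why `unlist` (or `[[1]]`) is needed above. A minimal sketch, assuming a hypothetical input vector `words` that is not part of the original question:

```r
# `words` is a hypothetical example vector, used only for illustration
words <- c("computer", "mouse")

# strsplit() is vectorised: it returns a list with one character vector per word
letters_list <- strsplit(words, split = "")

# Pick the letters of a single element with [[ ]]
first_word_letters <- letters_list[[1]]
```

Calling `unlist` on the whole list would merge the letters of all words into one vector, so `[[ ]]` is the safer choice when the vector holds more than one word.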

For the second part, you can use the following code:

subseqs <- unlist(
    # x is the substring length, y is the start position
    lapply(seq_along(splits), FUN = function(x) {
        lapply(seq_len(length(splits) + 1 - x), FUN = function(y) {
            paste(splits[y:(y + x - 1)], collapse = "")
        })
    })
)
> subseqs 
[1] "c"  "o"  "m"  "p"  "u"  "t"  "e"  
[8] "r"  "co"  "om"  "mp"  "pu"  "ut"  "te"  
[15] "er"  "com"  "omp"  "mpu"  "put"  "ute"  "ter"  
[22] "comp"  "ompu"  "mput"  "pute"  "uter"  "compu" "omput" 
[29] "mpute" "puter" "comput" "ompute" "mputer" "compute" "omputer" 
[36] "computer" 
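
As an alternative to pasting the letters back together, the same contiguous substrings can be cut straight from the original string with the vectorised `substring` function. This is only a sketch of a different technique, not the code from the answer above; the names `word`, `starts`, `lens` and `subseqs2` are chosen here for illustration:

```r
word <- "computer"
n <- nchar(word)

# For every length 1..n, list all valid start positions and the matching length
starts <- unlist(lapply(seq_len(n), function(len) seq_len(n - len + 1)))
lens   <- unlist(lapply(seq_len(n), function(len) rep(len, n - len + 1)))

# substring() is vectorised over the start and end positions
subseqs2 <- substring(word, starts, starts + lens - 1)
```

For an 8-letter word this produces 8 + 7 + ... + 1 = 36 substrings, ordered by length and then by start position.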
3

You can use

strsplit("computer", "\\b") 

and

library("RWeka") 
gsub(" ", "",
     NGramTokenizer(paste(strsplit("computer", "\\b")[[1]], collapse = " "),
                    Weka_control(min = 2, max = 5)),
     fixed = TRUE)
# [1] "compu" "omput" "mpute" "puter" "comp" 
# [6] "ompu" "mput" "pute" "uter" "com" 
# [11] "omp" "mpu" "put" "ute" "ter" 
# [16] "co"  "om" "mp" "pu" "ut" 
# [21] "te" "er" 

to create n-grams, where 2 <= n <= 5.

0

Combinations of three consecutive letters:

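x <- strsplit("computer", "\\b")[[1]]  # extract the character vector from the list
y <- combn(seq_along(x), 3)            # all index triples, in lexicographic order
m <- match(1:6, y[1, ])                # first triple starting at each of positions 1 to 6
combn(x, 3)[, m]                       # each column is a run of three consecutive letters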
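
The same idea can be written so that the result comes back as strings rather than a matrix of letters. The following is a minimal sketch, assuming the letters are obtained with `split = ""` as in the first answer; the names `idx`, `consecutive` and `trigrams` are illustrative:

```r
x <- strsplit("computer", "")[[1]]

# All ways to pick 3 positions while keeping the original order
idx <- combn(seq_along(x), 3)

# Keep only the columns whose positions are consecutive: i, i + 1, i + 2
is_run <- idx[2, ] == idx[1, ] + 1 & idx[3, ] == idx[2, ] + 1
consecutive <- idx[, is_run, drop = FALSE]

# Paste the letters of each column into a single string
trigrams <- apply(consecutive, 2, function(i) paste(x[i], collapse = ""))
# trigrams: "com" "omp" "mpu" "put" "ute" "ter"
```

Dropping the `is_run` filter would keep all 56 order-preserving three-letter selections, including non-adjacent ones such as "cmp", which may or may not be what the question intends.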

