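This might look a little convoluted, but the logic is pretty simple. After splitting the strings with cSplit from my "splitstackshape" package, it uses rle within "data.table" to collapse consecutive repeats. I've also loaded "dplyr" so that the chained steps are a little easier on the eyes: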
library(data.table)   ## attached explicitly; newer splitstackshape versions only import it
library(splitstackshape)
library(dplyr)
data.table(ID = seq_along(arr), arr = arr) %>%   ## create a data.table of arr
  cSplit("arr", ">", "long") %>%                 ## Split into a long form
  .[, rle(as.character(arr)), by = .(ID)] %>%    ## Calculate the run lengths
  .[, paste(values,                              ## Paste values and lengths
            sprintf(" (x%s)", lengths),          ## ... after formatting lengths
            collapse = " > ", sep = ""),         ## ... collapsed by >
    by = .(ID)] %>%                              ## ... and grouped by ID
  .[, gsub(" (x1)", "", V1, fixed = TRUE)]       ## Remove the (x1) values
# [1] "Organic Search > Direct (x2)"
# [2] "Direct (x3)"
# [3] "Referral"
# [4] "Organic Search > Direct (x2) > Referral > Direct (x3)"
# [5] "Organic Search (x2) > Direct > Organic Search (x2)"
Here's the same concept, but done using base R:
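arrSplit <- strsplit(arr, " > ", TRUE)   ## split each path on the literal " > "
sapply(arrSplit, function(x) {
  A <- rle(x)                                ## runs of consecutive repeats
  A$lengths <- sprintf("(x%s)", A$lengths)   ## format the counts
  temp <- paste(A$values, A$lengths, collapse = " > ", sep = " ")
  gsub(" (x1)", "", temp, fixed = TRUE)      ## remove the (x1) values
})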
# [1] "Organic Search > Direct (x2)"
# [2] "Direct (x3)"
# [3] "Referral"
# [4] "Organic Search > Direct (x2) > Referral > Direct (x3)"
# [5] "Organic Search (x2) > Direct > Organic Search (x2)"
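To make the rle step less opaque, here is a minimal sketch that walks a single path from the sample data through the same logic. The variable names (x, runs, labelled, result) are only illustrative and are not part of the code above.

```r
# One path from the sample data, split on the same " > " delimiter
x <- strsplit("Organic Search > Direct > Direct", " > ", fixed = TRUE)[[1]]

# rle() collapses consecutive repeats into values plus run lengths
runs <- rle(x)
runs$values   # "Organic Search" "Direct"
runs$lengths  # 1 2

# Label every run with its count, then drop the " (x1)" labels
labelled <- paste0(runs$values, " (x", runs$lengths, ")")
result <- gsub(" (x1)", "", paste(labelled, collapse = " > "), fixed = TRUE)
result
# [1] "Organic Search > Direct (x2)"
```

Because rle only merges adjacent duplicates, a channel that reappears later in a path (as in the fourth and fifth sample strings) starts a new run instead of being added to the earlier count.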
Sample data:
arr <- c("Organic Search > Direct > Direct",
         "Direct > Direct > Direct",
         "Referral",
         "Organic Search > Direct > Direct > Referral > Direct > Direct > Direct",
         "Organic Search > Organic Search > Direct > Organic Search > Organic Search")
arr
# [1] "Organic Search > Direct > Direct"
# [2] "Direct > Direct > Direct"
# [3] "Referral"
# [4] "Organic Search > Direct > Direct > Referral > Direct > Direct > Direct"
# [5] "Organic Search > Organic Search > Direct > Organic Search > Organic Search"
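Really clean and easy-to-understand solution. I got arr from a data.frame as `arr <- ga.mcf[, "channelPath"]`, although for the second approach I had to adjust the code to `arrSplit <- strsplit(as.character(arr), " > ", TRUE)` for it to work. I went with the splitstackshape option because it seemed to give faster results in my code, since I have to run the process for more than 100 different sites (that's a completely subjective judgment, though, as I have to call the API each time). But anyway, really nice logic! – agustin 2015-03-13 12:13:29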
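The as.character() adjustment mentioned in the comment is presumably needed because the data.frame column was a factor, and strsplit() refuses non-character input. A rough sketch of the base R approach wrapped so it tolerates factors is below. Note that collapse_runs is a hypothetical helper name, and this ga.mcf is a made-up stand-in for the commenter's data rather than their actual object.

```r
# Hypothetical helper (not from the answer): the base R approach, with the
# input coerced up front so factor columns work as well
collapse_runs <- function(paths, sep = " > ") {
  paths <- as.character(paths)  # strsplit() errors on factors
  vapply(strsplit(paths, sep, fixed = TRUE), function(x) {
    runs <- rle(x)
    labelled <- paste0(runs$values, " (x", runs$lengths, ")")
    gsub(" (x1)", "", paste(labelled, collapse = sep), fixed = TRUE)
  }, character(1))
}

# Made-up stand-in for the commenter's data.frame; stringsAsFactors = TRUE
# reproduces the default behaviour of R versions before 4.0
ga.mcf <- data.frame(channelPath = c("Direct > Direct > Direct", "Referral"),
                     stringsAsFactors = TRUE)
collapse_runs(ga.mcf[, "channelPath"])
# [1] "Direct (x3)" "Referral"
```

Using vapply with character(1) instead of sapply is a small design choice here: it guarantees a character vector back, even when the input has length zero.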