Counting word combination frequency

I have a vector of sentences, say:

x = c("I like donut", "I like pizza", "I like donut and pizza")

I want to count combinations of two words. The ideal output is a data frame with 3 columns (word1, word2 and frequency), and would look like this:

I     like   3
I     donut  2
I     pizza  2
like  donut  2
like  pizza  2
donut pizza  1
donut and    1
pizza and    1

In the first record of the output, freq = 3 because "I" and "like" occur together 3 times: in x[1], x[2] and x[3].

Any suggestions are appreciated :)

2014-12-20 nurandi

Did you use Google or the search bar before posting this question? Try [this](http://stackoverflow.com/questions/11403196/r-count-times-word-appears-in-element-of-list) or [this](http://stackoverflow.com/questions/18864612) or [any of these](http://stackoverflow.com/search?q=R+word+combinations).

What about "I I" and "like like", etc.? Presumably you only want combinations of *distinct* words? `gtools::permutations` might be useful to you.

@OliverKeyes: Yes, of course. – nurandi

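`split` each sentence into words, `sort` the words so that each pair is identified consistently, get all pairs with `combn`, `paste` each pair into a space-separated string, use `table` to get the frequencies, then put it all together.

Here is an example: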

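f <- function(x) {
    # Split each sentence into words, sort them so a pair always appears in
    # the same order, and paste every 2-word combination into one string.
    pr <- unlist(
        lapply(
            strsplit(x, ' '),
            function(i) combn(sort(i), 2, paste, collapse=' ')
        )
    )

    # Count how often each pair string occurs.
    tbl <- table(pr)

    # Split the pair strings back into two columns and attach the counts.
    d <- do.call(rbind.data.frame, strsplit(names(tbl), ' '))
    names(d) <- c('word1', 'word2')
    d$Freq <- as.vector(tbl)  # store the counts as a plain integer column

    d
}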

With your data, for example:

> f(x)
   word1 word2 Freq
1    and donut    1
2    and     I    1
3    and  like    1
4    and pizza    1
5  donut     I    2
6  donut  like    2
7  donut pizza    1
8      I  like    3
9      I pizza    2
10  like pizza    2
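
One caveat, following up on the comment about "I I" and "like like": if a sentence repeats a word, `f` as written would also count a pair of that word with itself. The sketch below is one possible way to handle that, not part of the original answer. The name `count_pairs_unique` and the sample vector `y` are made up for illustration, and it assumes words are separated by single spaces.

```r
# Hypothetical variant of f(): each word counts at most once per sentence.
count_pairs_unique <- function(x) {
  pairs <- unlist(lapply(strsplit(x, " ", fixed = TRUE), function(words) {
    words <- sort(unique(words))                  # drop repeats within a sentence
    if (length(words) < 2) return(character(0))   # combn() needs at least 2 items
    combn(words, 2, paste, collapse = " ")
  }))

  tbl <- table(pairs)
  parts <- strsplit(names(tbl), " ", fixed = TRUE)

  data.frame(
    word1 = vapply(parts, `[`, character(1), 1),  # first word of each pair
    word2 = vapply(parts, `[`, character(1), 2),  # second word of each pair
    Freq  = as.vector(tbl),
    stringsAsFactors = FALSE
  )
}

# Illustrative input with a repeated word in the first sentence.
y <- c("I like like donut", "I like pizza")
count_pairs_unique(y)
```

Here "like like" never shows up as a pair, and "I"/"like" is counted once per sentence, so its frequency is 2. The order of the rows (and of the words within a pair) depends on how `sort` collates in your locale.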

2014-12-20 02:05:39

Great. Using `combn`, I can also count the occurrences of combinations of 3 or more words. Thanks :) – nurandi
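
For anyone who wants to try the 3-or-more-word case, here is a rough sketch of how the same idea might be generalised by passing the combination size to `combn`. The function name `count_combos` and its `m` argument are hypothetical, not from the answer above. It also assumes single-space separators and counts each word at most once per sentence.

```r
# Hypothetical helper: count co-occurring sets of m words per sentence.
count_combos <- function(x, m = 2) {
  combos <- unlist(lapply(strsplit(x, " ", fixed = TRUE), function(words) {
    words <- sort(unique(words))
    if (length(words) < m) return(character(0))  # sentence too short for an m-set
    combn(words, m, paste, collapse = " ")
  }))

  tbl <- table(combos)

  data.frame(
    combo = names(tbl),        # the m words, space-separated
    Freq  = as.vector(tbl),
    stringsAsFactors = FALSE
  )
}

x <- c("I like donut", "I like pizza", "I like donut and pizza")
res <- count_combos(x, m = 3)
res[res$Freq > 1, ]            # triples that occur in more than one sentence
```

With the data from the question, the only triples that occur in two sentences are the one made of "I", "like", "donut" and the one made of "I", "like", "pizza". The combination is kept as a single string here to avoid a variable number of columns; splitting it with `strsplit` as in `f` would work too.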
