Extracting n-grams with R
I am trying to extract 3-grams from the nirvana text, and for this I am using the ngramrr package.
require(ngramrr)
require(tm)
require(magrittr)
nirvana <- c("hello hello hello how low", "hello hello hello how low",
"hello hello hello how low", "hello hello hello",
"with the lights out", "it's less dangerous", "here we are now", "entertain us",
"i feel stupid", "and contagious", "here we are now", "entertain us",
"a mulatto", "an albino", "a mosquito", "my libido", "yeah", "hey yay")
ngramrr(nirvana[1], ngmax = 3)
Corpus(VectorSource(nirvana))
This is the result I get:
[1] "hello" "hello" "hello" "how" "low" "hello hello" "hello hello"
[8] "hello how" "how low" "hello hello hello" "hello hello how" "hello how low"
I would like to know how I can build a TermDocumentMatrix in which the terms are the list of trigrams.
Thanks
I would use `quanteda` and convert to `tm` format: `nirvana %>% tokens(ngrams = 1:3) %>% dfm %>% convert(to = "tm")`
@amatsuo_net Thank you, could you help me with an R example?
@Cath Thanks ;)
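The quanteda route suggested in the comments could be sketched roughly as follows. This is a minimal sketch, not necessarily what the commenter had in mind: it assumes a quanteda version in which n-grams are built with `tokens_ngrams()` (rather than the `ngrams =` argument shown in the comment), it assumes `tm` is installed so that `convert(to = "tm")` can work, and it uses a shortened version of the `nirvana` vector for brevity.

```r
library(quanteda)  # tokens(), tokens_ngrams(), dfm(), convert()
library(tm)        # supplies the TermDocumentMatrix class and inspect()

# Shortened sample of the lyrics from the question (assumption: for brevity only)
nirvana <- c("hello hello hello how low", "with the lights out",
             "here we are now", "entertain us")

# Split each line into word tokens
toks <- tokens(nirvana)

# Keep only 3-grams; use n = 1:3 to also keep unigrams and bigrams.
# concatenator = " " joins words with a space instead of the default "_"
tri <- tokens_ngrams(toks, n = 3, concatenator = " ")

# Build a document-feature matrix and convert it to tm's DocumentTermMatrix
dtm <- convert(dfm(tri), to = "tm")

# Transpose to get terms in rows, i.e. a TermDocumentMatrix
tdm <- t(dtm)
inspect(tdm)
```

Lines with fewer than three words (such as "entertain us") contribute no trigrams, so their columns stay empty.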