n-gram error in R: invalid 'times' argument

I'm trying to follow this example, but I run into an error:

> library("RWeka") 
> library("tm") 
Loading required package: NLP 
> data("crude") 
> BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2)) 
> tdm <- TermDocumentMatrix(crude, control = list(tokenize = BigramTokenizer)) 
Error in rep(seq_along(x), sapply(tflist, length)) : 
    invalid 'times' argument 
In addition: Warning message: 
In mclapply(unname(content(x)), termFreq, control) : 
    scheduled core 1 encountered error in user code, all values of the job will be affected

Any ideas?

2016-07-27 geotheory

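Just use a better, more modern package. I can suggest a few options:

Use text2vec instead of tm. See the vignettes for examples. (I am the author.)
quanteda is worth checking out.
If for some reason you prefer tm, try replacing the RWeka n-gram tokenizer with the tokenizers package.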
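
For the last option, a minimal sketch of what the swap might look like, assuming the tokenizers package is installed. It keeps the BigramTokenizer name from the question; the as.character() conversion and the n / n_min settings are my assumptions about how to plug tokenize_ngrams() into tm, not something taken from the linked example.

```r
library("tm")
library("tokenizers")

data("crude")

# Bigram tokenizer built on tokenizers::tokenize_ngrams() instead of RWeka.
# as.character() pulls the text out of the tm document; n = 2 and n_min = 2
# restrict the output to bigrams only.
BigramTokenizer <- function(x) {
  unlist(tokenize_ngrams(as.character(x), n = 2L, n_min = 2L),
         use.names = FALSE)
}

tdm <- TermDocumentMatrix(crude, control = list(tokenize = BigramTokenizer))
```

Note that tokenize_ngrams() lowercases the text and strips punctuation by default, so the resulting terms may differ slightly from what the RWeka tokenizer would produce.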

2016-08-02 10:00:14

This is exactly what I was after, and the C++ speed is really impressive! – geotheory
