term-document-matrix

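Popularity: 2

Answers: 1

I am doing text classification. Are there cases where TF-IDF performs worse than plain term-frequency vectors? How can that be explained? Thanks.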
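
One case where this can happen, sketched under assumptions: with the unsmoothed weight idf = log(N / df), a term that occurs in every document gets weight zero, even if its raw count differs sharply between classes. The toy corpus and the function names `term_frequencies` and `tfidf` below are made up for illustration and use only the standard library; library implementations often smooth the idf term, so the exact collapse shown here may not occur with them.

```python
import math
from collections import Counter


def term_frequencies(docs):
    """Return one raw term-count mapping per whitespace-tokenized document."""
    return [Counter(doc.split()) for doc in docs]


def tfidf(docs):
    """Weight raw counts by the unsmoothed idf = log(N / df)."""
    tfs = term_frequencies(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tf in tfs for term in tf)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        for tf in tfs
    ]


# Hypothetical two-document corpus: both terms occur in both documents,
# but with very different counts.
docs = ["good good good plot", "good plot plot plot"]
tf_vectors = term_frequencies(docs)
tfidf_vectors = tfidf(docs)

print(tf_vectors)
print(tfidf_vectors)
```

Here the term-frequency vectors still tell the two documents apart, while the TF-IDF vectors carry no signal because every term has df equal to N.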

Popularity: 11

Answers: 3

I want to create a term-document matrix with NLTK and pandas. I wrote the following function: def fnDTM_Corpus(xCorpus): import pandas as pd '''to create a Term Document Matrix from a NLTK Corpus''' fd_list = [] for x in range(0, len(xCorpus
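
The function in the excerpt is cut off, so it cannot be completed here. As a separate, minimal sketch of the general idea (not the asker's code), the block below builds a term-document matrix from already tokenized documents. It assumes `collections.Counter` in place of an NLTK frequency distribution and plain nested lists in place of a pandas DataFrame; `build_term_document_matrix` is a hypothetical name.

```python
from collections import Counter


def build_term_document_matrix(tokenized_docs):
    """Return (terms, matrix) where matrix[i][j] is the count of
    terms[i] in document j."""
    counts = [Counter(tokens) for tokens in tokenized_docs]
    # Vocabulary: every term seen in any document, in sorted order.
    terms = sorted(set().union(*counts))
    # Counter returns 0 for missing terms, so no explicit fill is needed.
    matrix = [[doc_counts[term] for doc_counts in counts] for term in terms]
    return terms, matrix


# Hypothetical toy corpus of two tokenized documents.
docs = [["the", "cat", "sat"], ["the", "dog", "sat", "sat"]]
terms, matrix = build_term_document_matrix(docs)

for term, row in zip(terms, matrix):
    print(term, row)
```

If pandas is available, the result could be wrapped as `pandas.DataFrame(matrix, index=terms)` to get labeled rows, one column per document.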

Popularity: 47

Answers: 4

Error converting text to lowercase with tm_map(..., tolower)

I tried using tm_map. It gave the following error. How can I fix this? require(tm) byword<-tm_map(byword, tolower) Error in UseMethod("tm_map", x) : no applicable method for 'tm_map' applied to an object of class "character"

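Popularity: 4

Answers: 2

Frequency Per Term - R TM DocumentTermMatrix

I am very new to R and can't wrap my head around DocumentTermMatrix objects. I have a DocumentTermMatrix created with the TM package; it contains the terms and their frequencies, but I can't figure out how to access them. Ideally, I would like: Term # "the" 200 "is" 400 "a" 200 Currently my code is: library(tm) common.words <- c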

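Popularity: 7

Answers: 1

R tm package: create a matrix of the N most frequent terms

I am using the tm package in R. I have a termDocumentMatrix created with tm, and I am trying to build a matrix/data frame of the 50 most frequently occurring terms. When I try to convert it to a matrix I get this error: > ap.m <- as.matrix(mydata.dtm) Error: cannot allocate vector of size 2.0 Gb So I tried converting it to a sparse matrix with the Matrix package: > A <- as(mydata.dtm, "sp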