Counting word occurrences in R

Is there a function for counting the number of times a particular keyword appears in a dataset?

For example, if dataset <- c("corn", "cornmeal", "corn on the cob", "meal"), the count would be 3.

2011-10-16 LNA

Let's assume for the moment that you want the number of elements containing "corn":

length(grep("corn", dataset)) 
[1] 3

Once you have a better grasp of the basics of R, you may want to look at the "tm" package.

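Edit: I realize that this time you wanted any "corn", but in the future you might want only the whole word "corn". Over on R-help, Bill Dunlap pointed out a more compact grep pattern for matching whole words: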

grep("\\<corn\\>", dataset)
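
To make the difference between the two patterns concrete, here is a minimal sketch that runs both on the example vector from the question. It uses only base R; the grepl() variant at the end is an addition for illustration and not part of the original answer.

```r
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# Substring match: any element containing "corn"
length(grep("corn", dataset))
# [1] 3

# Whole-word match: "cornmeal" no longer counts
length(grep("\\<corn\\>", dataset))
# [1] 2

# Same whole-word count, summing a logical vector instead of counting indices
sum(grepl("\\<corn\\>", dataset))
# [1] 2
```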

2011-10-16 03:41:35

You could split the vector on " ", do unique, and run table on the whole thing. :) –

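Right. This highlights the ambiguity of the original question. I couldn't figure out why 4 was the correct number. Your method would return 2 for "corn", 1 for "meal", and 1 for "cornmeal". One possible way to count the space-separated word "corn" might be: length(grep("^corn$|^corn | corn$", dataset)) –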

That was a typo, sorry. The count would be 3. – LNA
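
The splitting idea from the first comment can be sketched roughly as follows. This is one reading of that comment, using base R's strsplit() and table(); the variable names words and word_counts are illustrative and do not come from the thread.

```r
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# Split every element on single spaces and flatten into one vector of words
words <- unlist(strsplit(dataset, " ", fixed = TRUE))

# Tabulate how often each distinct word occurs
word_counts <- table(words)

# Look up the count for the whole word "corn"
word_counts[["corn"]]
# [1] 2
```

This matches the counts quoted in the reply: 2 for "corn", 1 for "meal", and 1 for "cornmeal".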

Another quite convenient and intuitive way to do this is to use the str_count function from the stringr package:

library(stringr) 
dataset <- c("corn", "cornmeal", "corn on the cob", "meal") 

# for mere occurrences of the pattern:
str_count(dataset, "corn") 
# [1] 1 1 1 0 

# for occurrences of the word alone:
str_count(dataset, "\\bcorn\\b") 
# [1] 1 0 1 0 

# summing it up 
sum(str_count(dataset, "corn")) 
# [1] 3
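
For readers who would rather not load stringr, a rough base R equivalent is sketched below. It is not part of the original answer: count_matches is a hypothetical helper name, and the string "corn and more corn" is made-up data added only to show that repeated matches within one element are counted.

```r
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# Hypothetical helper: number of matches of `pattern` in each element of `x`
count_matches <- function(x, pattern) {
  lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

count_matches(dataset, "corn")
# [1] 1 1 1 0

count_matches(dataset, "\\bcorn\\b")
# [1] 1 0 1 0

# Unlike length(grep(...)), repeated matches in one element are all counted
count_matches("corn and more corn", "corn")
# [1] 2
```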

2013-03-12 08:43:39 petermeissner

You can also do something like the following:

length(dataset[which(dataset=="corn")])
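
Note that == compares whole elements for exact equality, so on the example vector this approach counts only the elements that are exactly "corn", not those that merely contain it. A short sketch of what it returns, with a shorter form using sum() that should give the same result:

```r
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# Only the first element equals "corn" exactly
length(dataset[which(dataset == "corn")])
# [1] 1

# Same count, summing the logical vector directly
sum(dataset == "corn")
# [1] 1
```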

2017-12-02 12:48:34 Junaid
