Split strings and generate a frequency table in R

I have a column of firm names in an R data frame that looks like this:

"ABC Industries" 
"ABC Enterprises" 
"123 and 456 Corporation" 
"XYZ Company"

etc. I'm trying to produce a frequency table of every word that appears in this column, so, for example, something like this:

Industries 10 
Corporation 31 
Enterprise 40 
ABC   30 
XYZ   40

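I'm fairly new to R, so I'm wondering what a good way to approach this would be. Should I split the strings and put each distinct word into a new column? Is there a way to split a multi-word row into multiple rows with one word each?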

2011-12-30 aesir

If you like, you can do this in a one-liner:

R> text <- c("ABC Industries", "ABC Enterprises", 
+   "123 and 456 Corporation", "XYZ Company") 
R> table(do.call(c, lapply(text, function(x) unlist(strsplit(x, " "))))) 

        123         456         ABC         and     Company 
          1           1           2           1           1 
Corporation Enterprises  Industries         XYZ 
          1           1           1           1 
R>

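Here I use strsplit() to break each entry into its components; this returns a list (within a list). I use do.call() to simply concatenate all of the resulting lists into one vector, which table() then summarizes.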
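To make the three steps easier to follow, here is a minimal sketch that unrolls the one-liner into named intermediate values. The names `pieces`, `words`, and `freq` are only illustrative and do not appear in the original answer.

```r
# Firm names from the question
text <- c("ABC Industries", "ABC Enterprises",
          "123 and 456 Corporation", "XYZ Company")

# Step 1: split each entry on single spaces; lapply() returns a list
# holding one character vector per entry
pieces <- lapply(text, function(x) unlist(strsplit(x, " ")))

# Step 2: do.call(c, pieces) behaves like c(pieces[[1]], pieces[[2]], ...),
# so the list is flattened into a single character vector
words <- do.call(c, pieces)

# Step 3: count how often each distinct word occurs
freq <- table(words)
print(freq)
```

Inspecting `pieces` with str() before flattening is a quick way to see what strsplit() produced for each entry.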

2011-12-30 04:38:35

Thanks a lot. I've been fiddling with the original code and found that I get the same result with: table(unlist(strsplit(text, " "))). What is the purpose of lapply() and do.call()? – aesir 2012-01-03 22:31:08
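On the comment's point: strsplit() is already vectorized over its input and returns a list with one element per string, so for this data the two forms should produce the same word vector. A small check, with `words_long` and `words_short` as illustrative names only:

```r
text <- c("ABC Industries", "ABC Enterprises",
          "123 and 456 Corporation", "XYZ Company")

# Long form from the answer: split entry by entry, then concatenate
words_long <- do.call(c, lapply(text, function(x) unlist(strsplit(x, " "))))

# Short form from the comment: strsplit() handles the whole vector at once
words_short <- unlist(strsplit(text, " "))

# Compare the two word vectors element by element
same_words <- identical(words_long, words_short)
print(same_words)

# Either vector can be passed to table()
print(table(words_short))
```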

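Here is another one-liner. It uses paste() to combine all of the column entries into one long text string, which it then splits apart and tabulates: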

text <- c("ABC Industries", "ABC Enterprises", 
     "123 and 456 Corporation", "XYZ Company") 

table(strsplit(paste(text, collapse=" "), " "))

2011-12-30 07:00:19

+1 Very nice. I would only add split = "\\s{1,}" to make it more robust. – 2011-12-31 12:38:49

@WojciechSobala Yes, I had the same thought, and it is probably better/closer to what the OP wants. 'split = "\\s+"' or 'split = "[[:space:]]+"' are two other options that do exactly the same thing. – 2011-12-31 15:11:35
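A short sketch of why the whitespace pattern suggested in these comments helps. The vector `messy` is made-up input with doubled spaces and a tab, not data from the question.

```r
# Hypothetical input with irregular whitespace
messy <- c("ABC  Industries", "ABC\tEnterprises", "XYZ   Company")

# Splitting on one literal space leaves empty strings behind and
# does not split on the tab at all
naive <- table(unlist(strsplit(messy, " ")))
print(naive)

# Splitting on runs of whitespace yields clean tokens
robust <- table(unlist(strsplit(messy, "\\s+")))
print(robust)
```

Note that a string with leading whitespace would still yield an empty first token, so running trimws() on the input first may be worthwhile.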

You can use the packages tidytext and dplyr:

set.seed(42) 

text <- c("ABC Industries", "ABC Enterprises", 
     "123 and 456 Corporation", "XYZ Company") 

data <- data.frame(category = sample(text, 100, replace = TRUE), 
        stringsAsFactors = FALSE) 

library(tidytext) 
library(dplyr) 

data %>% 
    unnest_tokens(word, category) %>% 
    group_by(word) %>% 
    count() 

#> # A tibble: 9 x 2
#> # Groups:   word [9]
#>   word            n
#>   <chr>       <int>
#> 1 123            29
#> 2 456            29
#> 3 abc            45
#> 4 and            29
#> 5 company        26
#> 6 corporation    29
#> 7 enterprises    21
#> 8 industries     24
#> 9 xyz            26
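The words come out in lowercase because unnest_tokens() lowercases tokens by default. If installing extra packages is not an option, a rough base R equivalent could look like the sketch below. The data frame `firms` and the result `freq_df` are illustrative names, and this version uses the four original strings rather than the random sample, so the counts differ from the output above.

```r
# Small stand-in for the data frame column from the question
firms <- data.frame(category = c("ABC Industries", "ABC Enterprises",
                                 "123 and 456 Corporation", "XYZ Company"),
                    stringsAsFactors = FALSE)

# Split on runs of whitespace and lowercase, mimicking the tidytext default
words <- tolower(unlist(strsplit(firms$category, "\\s+")))

# Count the words and sort from most to least frequent
freq <- sort(table(words), decreasing = TRUE)

# One word per row, with its count alongside
freq_df <- data.frame(word = names(freq), n = as.vector(freq),
                      stringsAsFactors = FALSE)
print(freq_df)
```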

2018-02-02 14:03:27 FilipW
