2014-06-24 32 views
-5

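Counting the number of words in open-ended responses in R

I have a dataset of 600 responses that includes a "Free_Text" variable containing feedback/comments from respondents. I now want to count the number of words in each respondent's comment. How should I do this? I am new to R and am currently working in RStudio.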

+4

Please do not ask for help without a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas

Answers

1

Splitting the string and counting the resulting elements is a simple way to get started.

str = "This is a string." 

str_length = length(strsplit(str," ")[[1]]) 

> str_length 
[1] 4 
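
To apply the same idea to every response at once, the split can be run over the whole column. This is only a sketch under stated assumptions: the data frame `responses` and its contents are made up for illustration, and only the column name `Free_Text` comes from the question. It splits on runs of whitespace so that double spaces do not produce empty "words", and it treats missing comments as NA.

```r
# Hypothetical stand-in for the real data; only the column name Free_Text
# comes from the question.
responses <- data.frame(
  Free_Text = c("Great service overall", "Too slow", "", NA),
  stringsAsFactors = FALSE
)

# Trim leading/trailing whitespace, then split on runs of whitespace.
tokens <- strsplit(trimws(responses$Free_Text), "[[:space:]]+")

# lengths() returns the number of pieces for each response.
responses$word_count <- lengths(tokens)

# strsplit() yields a single NA for a missing comment, which would count
# as one word, so mark those rows as NA instead.
responses$word_count[is.na(responses$Free_Text)] <- NA

responses$word_count
```

With this approach an empty comment gets a count of 0, and punctuation attached to a word (such as "string.") stays part of that word, which does not affect the count.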
1

This may help:

str1 <- c("How many words are in this sentence","How many words") 
sapply(gregexpr("\\W+", gsub("[[:punct:]]+","",str1)), length) + 1 
#[1] 7 3 
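
One caveat with counting separators and adding 1: when a string contains no separator, `gregexpr()` returns a single -1, so a one-word string such as "Hello" is reported as 2 words, and an empty string as 2 as well. A possible alternative, sketched here in base R, is to count the word matches themselves. The pattern `[[:alnum:]']+` is my own assumption about what should count as a word (letters, digits, and apostrophes); adjust it to suit your data.

```r
str1 <- c("How many words are in this sentence", "How many words", "Hello", "")

# Extract each run of letters, digits, or apostrophes. regmatches() returns
# character(0) when nothing matches, so no "+ 1" correction is needed.
words <- regmatches(str1, gregexpr("[[:alnum:]']+", str1))

# Number of extracted words per string.
n_words <- lengths(words)
n_words
```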

Alternatively,

library(qdap) 
word_count(str1) 
#[1] 7 3 

str2 <- "How many words?." 
word_count(str2) 
#[1] 3 
2

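Consider using stri_extract_words from the stringi package, especially if you have non-English text. It uses ICU's BreakIterator to perform this task and includes a sophisticated set of word-breaking rules.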

library(stringi) 
str <- c("How many words are there?", "R — язык программирования для статистической обработки данных и работы с графикой, а также свободная программная среда вычислений с открытым исходным кодом в рамках проекта GNU.") 
stri_extract_words(str) 
## [[1]] 
## [1] "How" "many" "words" "are" "there" 
## 
## [[2]] 
## [1] "R"    "язык"    "программирования" "для"    "статистической" 
## [6] "обработки"  "данных"   "и"    "работы"   "с"    
## [11] "графикой"   "а"    "также"   "свободная"  "программная"  
## [16] "среда"   "вычислений"  "с"    "открытым"   "исходным"   
## [21] "кодом"   "в"    "рамках"   "проекта"   "GNU" 
sapply(stri_extract_words(str), length) # how many words are there in each character string? 
## [1] 5 25 
0

And one more method, using the stringr package, to list the individual words:

library(stringr)
str1 <- c("How many words are in this sentence","How many words") 
# "\\S+" matches each run of non-whitespace characters, i.e. each word; unlist() pools the matches from all strings, so length() returns the total number of words (10 here), not a count per string
length(unlist(str_match_all(str1, "\\S+")))