Average number of words in a character vector in R

I am trying to get the average number of words in my character vector in R.

one <- c(9, 23, 43) 
two <- c("this is a new york times article.", "short article.", "he went outside to smoke a cigarette.") 

mydf <- data.frame(one, two) 
mydf 

#   one                                   two
# 1   9     this is a new york times article.
# 2  23                        short article.
# 3  43 he went outside to smoke a cigarette.

I am looking for a function that gives me the average number of words in the character vector 'two'.

The output here should be 5.3333 (= (7 + 2 + 7) / 3).


2014-03-12 cptn

Or with gregexpr():

mean(sapply(mydf$two,function(x)length(unlist(gregexpr(" ",x)))+1)) 
[1] 5.333333
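One caveat with counting spaces and adding 1: gregexpr() returns -1 when it finds no match, and length(-1) is 1, so a one-word string is reported as 2 words. The example data is unaffected, but the sketch below shows the issue and one alternative that counts runs of non-whitespace characters instead. naive_count and token_count are hypothetical helper names, not functions from any package.

```r
# Counting spaces + 1, as in the answer above (hypothetical helper name)
naive_count <- function(x) length(unlist(gregexpr(" ", x))) + 1
print(naive_count("two words"))  # 2
print(naive_count("single"))     # also 2: gregexpr() returned -1 (no match)

# Alternative (hypothetical helper): count runs of non-whitespace directly.
# regmatches() gives character(0) for strings with no match, so "" counts as 0.
token_count <- function(x) lengths(regmatches(x, gregexpr("\\S+", x)))
print(token_count(c("two words", "single", "")))  # 2 1 0
print(mean(token_count(c("this is a new york times article.", "short article.",
                         "he went outside to smoke a cigarette."))))  # 5.333333
```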


2014-03-12 11:09:56 Troy

'mean(sapply(gregexpr(" ", mydf$two), length) + 1)' is the same concept, but a bit more concise.... – A5C1D2H2I1M1N2O1R2T1

@AnandaMahto Good point, not sure why I didn't do it that way in the first place – Troy

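My *guess* is that you would get a speed boost with my suggestion, because it reduces the number of calls to 'gregexpr'. I'd also suggest that a real-world solution should (1) first trim any leading and trailing whitespace that might be present, and (2) use a search pattern like '"\\s+"'. – A5C1D2H2I1M1N2O1R2T1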
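A minimal sketch of the two suggestions in that comment, assuming base R >= 3.2.0 for trimws(). count_words is a hypothetical helper: it trims the strings, counts runs of whitespace with "\\s+", and guards against the -1 that gregexpr() returns for strings with no match.

```r
# Hypothetical helper following the comment's two suggestions:
# (1) trim leading/trailing whitespace, (2) match runs of whitespace ("\\s+").
count_words <- function(x) {
  x <- trimws(as.character(x))
  gaps <- gregexpr("\\s+", x)
  # gregexpr() returns -1 when nothing matches, so count only positive positions
  n_gaps <- vapply(gaps, function(m) sum(m > 0), integer(1))
  # words = gaps + 1, except that an empty (or all-whitespace) string has no words
  ifelse(nzchar(x), n_gaps + 1L, 0L)
}

two <- c("this is a new york times article.", "short article.",
         "he went outside to smoke a cigarette.")
print(count_words(two))        # 7 2 7
print(mean(count_words(two)))  # 5.333333
```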

Hadley Wickham's stringr package probably offers the easiest way to do this:

library(stringr) 
foo <- str_split(two, " ") # split each element of your vector on the space character 
sapply(foo, length) # just a quick test: how many words does each element have? 
sum(sapply(foo, length))/length(foo) # calculate the sum and divide it by the length of your original object 
[1] 5.333333


2014-03-12 10:31:27 Max

The stringr way looks very similar to the base way. The only difference seems to be the underscore. ;) – sgibb

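I'm sure there are more elaborate methods available, but you can use strsplit to split each string on whitespace into a character vector and count its elements.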

mean(sapply(strsplit(as.character(mydf$two), "[[:space:]]+"), length)) 
# [1] 5.333333
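The same strsplit() idea can be written with lengths() (base R >= 3.2.0) in place of sapply(..., length). The sketch also trims the strings first, because strsplit() returns an empty first piece when a string starts with a match, which would inflate the count; the padded string is a made-up example.

```r
two <- c("this is a new york times article.", "short article.",
         "he went outside to smoke a cigarette.")

# Split each (trimmed) string on runs of whitespace, then count the pieces
words <- strsplit(trimws(two), "[[:space:]]+")
print(lengths(words))        # 7 2 7
print(mean(lengths(words)))  # 5.333333

# Without trimws(), a leading match produces an empty first piece ("")
padded <- "  padded text"
print(lengths(strsplit(padded, "[[:space:]]+")))          # 3
print(lengths(strsplit(trimws(padded), "[[:space:]]+")))  # 2
```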


2014-03-12 10:31:55 sgibb

Here is one possibility with the qdap package:

library(qdap) 
wc(mydf$two, FALSE)/nrow(mydf) 

## [1] 5.333333

This is overkill, but you could also do:

word_stats(mydf$two) 

##   all n.sent n.words n.char n.syl n.poly   wps    cps   sps psps   cpw   spw pspw n.state proDF2 n.hapax n.dis grow.rate prop.dis
## 1 all      3      16     68    23      3 5.333 22.667 7.667    1 4.250 1.438 .188       3      1      12     2      .750     .125

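The wps column is words per sentence; since each element of mydf$two is a single sentence (n.sent is 3), it equals the average number of words per element.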


2014-03-12 13:03:02

After creating a word_stats object and assigning it to an object with that class, why doesn't plot.word_stats(obj) work? – lawyeR

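The generic 'plot' works on the class, so if you have changed the class, or the new class has its own plot method, then the generic 'plot' will no longer work. In any case, 'plot' for 'word_stats' is just a wrapper for 'qheat', so you can still use 'qheat' directly. –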

@lawyeR If that doesn't answer the question, please open a new question with data and an example. –
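The class-based dispatch described in this comment thread is ordinary S3 behaviour. The sketch below uses a hypothetical generic describe() and a made-up class "word_stats_demo" (not qdap's actual classes or methods) to show that changing an object's class attribute changes which method the generic calls.

```r
# Hypothetical generic and class, standing in for plot() and a word_stats object
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"
describe.word_stats_demo <- function(x, ...) "word_stats_demo method"

obj <- structure(list(wps = 16 / 3), class = "word_stats_demo")
print(describe(obj))   # dispatches to describe.word_stats_demo

class(obj) <- "something_else"
print(describe(obj))   # no method for the new class, so describe.default is used
```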
