0
我試圖計算一個數據幀中某一行中每個字出現在給定時間的次數。這是我的數據框:將熔化的表格對象返回原始數據框?
library(stringr)
df <- data.frame("Corpus" = c("this is some text",
"here is some more text text",
"more food for everyone",
"less for no one",
"something text here is some more text",
"everyone should go home",
"more random text",
"random text more more more",
"plenty of random text",
"the final piece of random everyone text"),
"Class" = c("X", "Y", "Y", "Y", "Y",
"Y", "Y", "Z",
"Z", "Z"),
"OpenTime" = c("12/01/2016 10:45:00", "11/07/2016 10:32:00",
"11/15/2015 01:45:00", "08/23/2012 1:23:00",
"12/17/2016 11:45:00", "12/16/2016 9:47:00",
"04/11/2015 04:23:00", "11/27/2016 12:12:00",
"08/25/2015 10:46:00", "09/27/2016 10:46:00"))
我試圖得到這樣的結果:
Class OpenTime Word Frequency
X 12/01/2016 10:45:00 this 1
X 12/01/2016 10:45:00 is 1
X 12/01/2016 10:45:00 some 1
X 12/01/2016 10:45:00 text 1
Y 11/07/2016 10:32:00 here 1
Y 11/07/2016 10:32:00 is 1
Y 11/07/2016 10:32:00 some 1
Y 11/07/2016 10:32:00 more 1
Y 11/07/2016 10:32:00 text 2
...
我很願意做這一切與dplyr
groupby
,但我還沒有拿到,要工作。相反,這是我已經試過:
splits <- strsplit(as.character(df$Corpus), split = " ")
counts <- lapply(splits, table)
counts.melted <- lapply(counts, melt)
這讓我換位觀點我想:
> counts.melted
[[1]]
Var1 value
1 is 1
2 some 1
3 text 1
4 this 1
[[2]]
Var1 value
1 here 1
2 is 1
3 more 1
4 some 1
5 text 1
...
但我怎麼能扎熔化的載體該名單的是原始數據,產生上面所需的輸出?我嘗試使用rep
爲每個行中的單詞重複Class
值,但幾乎沒有成功。在for
循環中完成所有這一切將很容易,但我會很多而是使用矢量化方法(如lapply
)來執行此操作。
out.df <- data.frame("RRN" = NULL, "OpenTime" = NULL,
"Word" = NULL, "Frequency" = NULL)