2016-12-12

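Attaching melted table objects back to the original data frame?

I'm trying to count the number of times each word in a row of a data frame appears at a given time. This is my data frame: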

library(stringr) 
library(reshape2)  # provides melt(), which is used below

df <- data.frame("Corpus" = c("this is some text", 
           "here is some more text text", 
           "more food for everyone", 
           "less for no one", 
           "something text here is some more text", 
           "everyone should go home", 
           "more random text", 
           "random text more more more", 
           "plenty of random text", 
           "the final piece of random everyone text"), 

       "Class" = c("X", "Y", "Y", "Y", "Y", 
          "Y", "Y", "Z", 
          "Z", "Z"), 

       "OpenTime" = c("12/01/2016 10:45:00", "11/07/2016 10:32:00", 
           "11/15/2015 01:45:00", "08/23/2012 1:23:00", 
           "12/17/2016 11:45:00", "12/16/2016 9:47:00", 
           "04/11/2015 04:23:00", "11/27/2016 12:12:00", 
           "08/25/2015 10:46:00", "09/27/2016 10:46:00")) 

I'm trying to get a result like this:

Class OpenTime            Word Frequency
X     12/01/2016 10:45:00 this 1
X     12/01/2016 10:45:00 is   1
X     12/01/2016 10:45:00 some 1
X     12/01/2016 10:45:00 text 1
Y     11/07/2016 10:32:00 here 1
Y     11/07/2016 10:32:00 is   1
Y     11/07/2016 10:32:00 some 1
Y     11/07/2016 10:32:00 more 1
Y     11/07/2016 10:32:00 text 2
...

I'd be happy to do all of this with dplyr's group_by, but I haven't managed to get that working. Instead, this is what I've tried:

splits <- strsplit(as.character(df$Corpus), split = " ") 

counts <- lapply(splits, table) 

counts.melted <- lapply(counts, melt) 

This gives me the transposed (long-format) view I want:

> counts.melted
[[1]]
  Var1 value
1   is     1
2 some     1
3 text     1
4 this     1

[[2]]
  Var1 value
1 here     1
2   is     1
3 more     1
4 some     1
5 text     2
...

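But how can I attach this list of melted tables back to the original data to produce the desired output above? I tried using rep to repeat the Class value for each word in a row, but had little success. Doing all of this in a for loop would be easy, but I would much rather use a vectorized approach (such as lapply).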

out.df <- data.frame("Class" = NULL, "OpenTime" = NULL, 
       "Word" = NULL, "Frequency" = NULL) 

Answers


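For future readers: I was able to vectorize most of the solution to my problem. Unfortunately, I'm still looking for a way to use lapply instead of the for loop, but this does exactly what I wanted: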

# split each row in the corpus column on spaces 
splits <- strsplit(as.character(df$Corpus), split = " ") 

# count the number of times each word in a row appears in that row 
counts <- lapply(splits, table) 

# melt each table into a long data frame (melt() comes from the reshape2 package) 
counts.melted <- lapply(counts, melt) 

# the result data frame to which we'll append our results 
out.df <- data.frame("Class" = c(), "OpenTime" = c(), 
        "Word" = c(), "Frequency" = c()) 

# it would be better to vectorize this, using something like lapply 
for (idx in seq_along(counts.melted)) { 

    # extract the melted table at that index ([[ ]] returns the element itself, not a one-element list) 
    count.df <- as.data.frame(counts.melted[[idx]]) 

    # change the column names 
    names(count.df) <- c("Word", "Frequency") 

    # repeat the Class and time for that row to fill in those columns 
    count.df[, 'Class'] <- rep(as.character(df[idx, "Class"]), nrow(count.df)) 
    count.df[, 'OpenTime'] <- rep(as.character(df[idx, "OpenTime"]), nrow(count.df)) 

    # append the results 
    out.df <- rbind(out.df, count.df) 
}
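
One possible way to drop the for loop is sketched below. This is a base-R alternative, not the approach used above: it swaps melt() for a small helper and uses Map() to walk the per-row tables together with the matching Class and OpenTime values, then binds everything with a single do.call(rbind, ...). The helper name row_to_long and the two-row stand-in data frame are only illustrative, and the sketch assumes every Corpus entry contains at least one word.

```r
# Small stand-in for the question's data frame (same columns, fewer rows).
df <- data.frame(
  Corpus   = c("this is some text", "here is some more text text"),
  Class    = c("X", "Y"),
  OpenTime = c("12/01/2016 10:45:00", "11/07/2016 10:32:00"),
  stringsAsFactors = FALSE
)

# Split each row on spaces and count the words within that row.
splits <- strsplit(as.character(df$Corpus), split = " ", fixed = TRUE)
counts <- lapply(splits, table)

# Hypothetical helper: turn one row's table into a long data frame.
# data.frame() recycles the single Class and OpenTime values so they
# match the number of distinct words in the row.
row_to_long <- function(tab, cls, opened) {
  data.frame(
    Class     = cls,
    OpenTime  = opened,
    Word      = names(tab),
    Frequency = as.vector(tab),
    stringsAsFactors = FALSE
  )
}

# Map() passes the i-th table, Class and OpenTime to the helper together.
pieces <- Map(row_to_long, counts,
              as.character(df$Class), as.character(df$OpenTime))

# Bind all pieces in one call instead of growing out.df inside a loop.
out.df <- do.call(rbind, pieces)
rownames(out.df) <- NULL
```

Binding once at the end also avoids repeatedly copying out.df, which is what rbind inside a loop does on every iteration. Note that words within each row come out in the alphabetical order produced by table(), not in their original order.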