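Cross-tabulating a column that contains multiple values in R

I want to know how many Low, Medium, and High Drama titles I have, and how many Low, Medium, and High Crime titles, in my data frame.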

Here is a sample of my data frame:

                             genres class_rentabilite
                       Crime, Drama            Medium
     Action, Crime, Drama, Thriller              High
Action, Adventure, Sci-Fi, Thriller            Medium
                              Drama               Low
                       Crime, Drama              High
                      Comedy, Drama              High

I used table() on another column of my data, and it worked:

table(df$language, df$class_rentabilite)

The code above gives this:

             Low Medium High NA
               1      1    0  3
  Aboriginal   0      0    2  0
  Arabic       0      0    1  3
  Aramaic      1      0    0  0
  Bosnian      1      0    0  0
  Cantonese    5      2    1  3

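I would like to use the same approach on the sample data, but table() does not work there because each row of genres holds several values. How can I handle this?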


2016-12-12 Y.P

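Here is one way to do it. Use separate_rows() to split genres into one row per genre and store the result in a temporary data frame. Then use table() as you did before.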

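library(dplyr)
library(tidyr)

# Split each comma-separated genres entry into one row per genre
foo <- mydf %>%
  separate_rows(genres, sep = ", ")

# Cross-tabulate genre against profitability class
table(foo$genres, foo$class_rentabilite)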

#           High Low Medium
# Action       1   0      1
# Adventure    0   0      1
# Comedy       1   0      0
# Crime        2   0      1
# Drama        3   1      1
# Sci-Fi       0   0      1
# Thriller     1   0      1

DATA

mydf <- structure(list(genres = c("Crime, Drama", "Action, Crime, Drama, Thriller", 
"Action, Adventure, Sci-Fi, Thriller", "Drama", "Crime, Drama", 
"Comedy, Drama"), class_rentabilite = c("Medium", "High", "Medium", 
"Low", "High", "High")), .Names = c("genres", "class_rentabilite" 
), row.names = c(NA, -6L), class = "data.frame")
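
For readers who would rather not load tidyr, the same split-then-tabulate idea can be sketched in base R. This is only an illustration and not part of the original answer: the names genre_list, long, and tab are made up here, and the factor() step that orders the columns Low, Medium, High is an optional extra.

```r
# Sample data copied from the DATA block above
mydf <- data.frame(
  genres = c("Crime, Drama", "Action, Crime, Drama, Thriller",
             "Action, Adventure, Sci-Fi, Thriller", "Drama",
             "Crime, Drama", "Comedy, Drama"),
  class_rentabilite = c("Medium", "High", "Medium", "Low", "High", "High"),
  stringsAsFactors = FALSE
)

# Split every genres string on the literal separator ", "
genre_list <- strsplit(mydf$genres, ", ", fixed = TRUE)

# Repeat each row's class once per genre found in that row
long <- data.frame(
  genres = unlist(genre_list),
  class_rentabilite = rep(mydf$class_rentabilite, lengths(genre_list)),
  stringsAsFactors = FALSE
)

# Optional: order the columns Low, Medium, High instead of alphabetically
long$class_rentabilite <- factor(long$class_rentabilite,
                                 levels = c("Low", "Medium", "High"))

tab <- table(long$genres, long$class_rentabilite)
print(tab)
```

The counts should match the table() output shown above; only the column order differs because of the explicit factor levels.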


2016-12-12 01:01:28 jazzurro

Thank you very much. Even with a few bad splits, such as Sci and Fi ending up in different groups, this helps a lot.

@Y.P I revised the code. I think this is what you want. :) – jazzurro

Nice use of 'separate_rows' – akrun
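
The bad Sci / Fi split mentioned in the comments is what one would expect when the separator also matches a hyphen. The thread does not show the earlier version of the code, so the following is a guess at the cause rather than a reconstruction: tidyr documents the default sep of separate_rows() as the pattern "[^[:alnum:].]+", and the base R sketch below applies that pattern next to the literal ", " separator. The names x, loose, and strict are illustrative.

```r
x <- "Action, Adventure, Sci-Fi, Thriller"

# Splitting on any run of characters that is neither alphanumeric nor a dot
# also breaks at the hyphen, so "Sci-Fi" turns into "Sci" and "Fi"
loose <- strsplit(x, "[^[:alnum:].]+")[[1]]
print(loose)

# Splitting on the literal string ", " leaves "Sci-Fi" intact
strict <- strsplit(x, ", ", fixed = TRUE)[[1]]
print(strict)
```

Passing sep = ", " explicitly, as the answer does, keeps hyphenated genre names in a single group.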
