2017-12-18 107 views
1

How to cross-tabulate (xtabs) multiple variables with the same breakdown

I have a data frame that looks like this:

  SubjectID Activity        V1          V2          V3
1         2        S 0.2571778 -0.02328523 -0.01465376
2         2        W 0.2860267 -0.01316336 -0.11908252
3         3        R 0.2754848 -0.02605042 -0.11815167
4         3        W 0.2702982 -0.03261387 -0.11752018
5         4        A 0.2748330 -0.02784779 -0.12952716
6         4        S 0.2792199 -0.01862040 -0.11390197
...

(There are actually more Vn variables, but this is enough to illustrate the problem.)

I want to use xtabs() to look at all of the Vn variables while keeping the SubjectID and Activity breakdown constant, something like

xtabs(c(V1, V2, V3) ~ SubjectID + Activity, data = DF) 

lapply(c(V1, V2, V3), function(x) xtabs(x ~ SubjectID + Activity, data = DF)) 

But of course these don't work. What is the right approach here?


Edit: what I want is the result of

xtabs(V1 ~ SubjectID + Activity, data = DF)
xtabs(V2 ~ SubjectID + Activity, data = DF)
xtabs(V3 ~ SubjectID + Activity, data = DF)
... 
+1

One way is to use 'reshape' instead of 'xtabs': 'lapply(paste0("V", 1:3), function(x) reshape(df[c(x, "SubjectID", "Activity")], idvar = "SubjectID", timevar = "Activity", direction = "wide"))' –

+0

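@RonakShah This is great, except that it doesn't sum the values the way xtabs would (what I really want is the mean, but if I can get the sums I can work out the mean from them) – Conrad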

Answers

1

You should be able to just use get after supplying a character vector of the columns of interest.

lapply(c("V1", "V2", "V3"), function(x) xtabs(get(x) ~ SubjectID + Activity, data = DF)) 

Try it with the "airquality" dataset:

setNames(lapply(names(airquality)[1:4], 
       function(x) xtabs(get(x) ~ Month + Day, airquality)), 
     names(airquality)[1:4]) 
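As a rough sketch of how this could be applied to the sample rows from the question, and how the means asked about in the comments might be derived from the sums: the data frame below is typed in from the six rows shown in the question, and the names `vars`, `sums`, `counts` and `means` are chosen here for illustration only. Dividing each table of sums by a table of cell counts is one possible way to get means, not necessarily the intended one.

```r
# Sample rows copied from the question (only the six rows shown there)
DF <- data.frame(
  SubjectID = c(2, 2, 3, 3, 4, 4),
  Activity  = c("S", "W", "R", "W", "A", "S"),
  V1 = c(0.2571778, 0.2860267, 0.2754848, 0.2702982, 0.2748330, 0.2792199),
  V2 = c(-0.02328523, -0.01316336, -0.02605042, -0.03261387, -0.02784779, -0.01862040),
  V3 = c(-0.01465376, -0.11908252, -0.11815167, -0.11752018, -0.12952716, -0.11390197)
)

vars <- c("V1", "V2", "V3")

# One table of sums per variable, named after the variable
sums <- setNames(
  lapply(vars, function(x) xtabs(get(x) ~ SubjectID + Activity, data = DF)),
  vars
)

# Number of rows in each SubjectID x Activity cell
counts <- xtabs(~ SubjectID + Activity, data = DF)

# Cell means as sum / count; cells with no observations become NaN (0 / 0)
means <- lapply(sums, function(s) s / counts)
```

With only one row per cell, as in this sample, the means equal the sums wherever a cell is populated; the division only matters once a SubjectID and Activity combination occurs more than once.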

Based on your comments, I'd suggest looking into "data.table" and dcast if what you need is a wide dataset.

Here's an example:

set.seed(1) 
DF <- cbind(warpbreaks, V2 = sample(100, nrow(warpbreaks)), V3 = sample(100, nrow(warpbreaks))) 
library(data.table) 
setDT(DF) 
lapply(c("breaks", "V2", "V3"), function(x) { 
    dcast(DF[, lapply(.SD, mean), .(wool, tension)], wool ~ tension, value.var = x) 
}) 
# [[1]]
#    wool        L        M        H
# 1:    A 44.55556 24.00000 24.55556
# 2:    B 28.22222 28.77778 18.77778
#
# [[2]]
#    wool        L        M        H
# 1:    A 59.22222 46.33333 33.22222
# 2:    B 49.44444 44.77778 43.22222
#
# [[3]]
#    wool  L        M        H
# 1:    A 40 68.11111 74.22222
# 2:    B 48 40.11111 37.77778

Alternatively, you could have a fully wide "data.table", like this:

dcast(DF[, lapply(.SD, mean), .(wool, tension)], wool ~ tension, 
     value.var = c("breaks", "V2", "V3")) 
#    wool breaks_L breaks_M breaks_H     V2_L     V2_M     V2_H V3_L     V3_M     V3_H
# 1:    A 44.55556 24.00000 24.55556 59.22222 46.33333 33.22222   40 68.11111 74.22222
# 2:    B 28.22222 28.77778 18.77778 49.44444 44.77778 43.22222   48 40.11111 37.77778
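If "data.table" isn't available, a similar table of cell means can probably be had in base R with tapply. This is only a sketch: it swaps tapply in for dcast, and the `V2` column below is a made-up deterministic column (twice `breaks`) added purely for illustration, not the random `V2` used above.

```r
# warpbreaks ships with R; V2 is a hypothetical extra column for illustration
DF2 <- transform(warpbreaks, V2 = breaks * 2)

value_cols <- c("breaks", "V2")

# One wool x tension matrix of means per value column, named by column
cell_means <- sapply(
  value_cols,
  function(x) tapply(DF2[[x]], DF2[c("wool", "tension")], mean),
  simplify = FALSE
)

cell_means$breaks
```

Unlike xtabs, tapply returns NA rather than 0 for combinations that never occur, which can make empty cells easier to tell apart from true zeros.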
1

Using a tidyverse approach, this is how I would solve the problem:

library(tidyr) 
library(dplyr) 
library(purrr) 

df <- tribble(
  ~SubjectID, ~Activity,       ~V1,         ~V2,         ~V3,
           2,       "S", 0.2571778, -0.02328523, -0.01465376,
           2,       "W", 0.2860267, -0.01316336, -0.11908252,
           3,       "R", 0.2754848, -0.02605042, -0.11815167,
           3,       "W", 0.2702982, -0.03261387, -0.11752018,
           4,       "A", 0.2748330, -0.02784779, -0.12952716,
           4,       "S", 0.2792199, -0.01862040, -0.11390197
)

# Cross-tabulate each V column, stack the results, then spread Activity into columns
df %>%
  select(starts_with("V")) %>%
  map(~ as_tibble(xtabs(.x ~ SubjectID + Activity, data = df))) %>%
  bind_rows(.id = "var") %>%
  spread(Activity, n)

# # A tibble: 9 x 6
#   var   SubjectID           A           R           S           W
# * <chr> <chr>           <dbl>       <dbl>       <dbl>       <dbl>
# 1 V1    2          0.00000000  0.00000000  0.25717780  0.28602670
# 2 V1    3          0.00000000  0.27548480  0.00000000  0.27029820
# 3 V1    4          0.27483300  0.00000000  0.27921990  0.00000000
# 4 V2    2          0.00000000  0.00000000 -0.02328523 -0.01316336
# 5 V2    3          0.00000000 -0.02605042  0.00000000 -0.03261387
# 6 V2    4         -0.02784779  0.00000000 -0.01862040  0.00000000
# 7 V3    2          0.00000000  0.00000000 -0.01465376 -0.11908252
# 8 V3    3          0.00000000 -0.11815167  0.00000000 -0.11752018
# 9 V3    4         -0.12952716  0.00000000 -0.11390197  0.00000000
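A comparable stacked layout can likely be produced in base R without the tidyverse, by stacking the Vn columns into long form and making the variable name a third dimension of a single xtabs call. The sketch below assumes the six sample rows from the question; the names `vars`, `long` and `tab` are illustrative.

```r
# Sample rows copied from the question
DF <- data.frame(
  SubjectID = c(2, 2, 3, 3, 4, 4),
  Activity  = c("S", "W", "R", "W", "A", "S"),
  V1 = c(0.2571778, 0.2860267, 0.2754848, 0.2702982, 0.2748330, 0.2792199),
  V2 = c(-0.02328523, -0.01316336, -0.02605042, -0.03261387, -0.02784779, -0.01862040),
  V3 = c(-0.01465376, -0.11908252, -0.11815167, -0.11752018, -0.12952716, -0.11390197)
)

vars <- c("V1", "V2", "V3")

# stack() returns the columns "values" and "ind" (the source column name),
# with all V1 rows first, then V2, then V3, so the keys are repeated to match
long <- data.frame(
  SubjectID = rep(DF$SubjectID, times = length(vars)),
  Activity  = rep(DF$Activity, times = length(vars)),
  stack(DF[vars])
)

# One three-way table: variable x SubjectID x Activity
tab <- xtabs(values ~ ind + SubjectID + Activity, data = long)

# Flatten for printing: one row per variable/SubjectID pair, Activity as columns
ftable(tab, row.vars = c("ind", "SubjectID"))
```

As with the xtabs calls above, combinations that never occur show up as 0 in this table.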