2016-10-10 71 views

Vertical to horizontal format in R

I have a data frame with these values:

X1 X2  X3 
s1 45.11 1 
s1 45.13 1 
s1 53.42 2 
s1 51.41 2 
s2 96.76 3 
s2 96.65 3 
s4 77.9 4 
s1 80.46 5 
s3 43.58 2 
s1 43.12 2 
s1 41.51 3 
s4 41.97 3 
s1 108.97 6 
s3 117.46 6 
s4 40  3 
s4 40  3 
s5 25.4 1 
s5 25.5 1 

I want to convert it into a data frame in this format:

   s1    s2 s3    s4 s5
1  45.12 0  0     0  25.45
2  49.32 0  43.58 0  0

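Here each value is the mean of X2 over the rows that match both criteria. For example, 45.12 is the mean of X2 over the rows where X1 is s1 and X3 is 1. Combinations with no matching rows are shown as 0.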
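As a worked check of the desired output, here are three of the cells computed by hand from the table above (the s1 rows with X3 = 1 and X3 = 2, and the s5 rows with X3 = 1):

```latex
\bar{x}_{s1,1} = \frac{45.11 + 45.13}{2} = 45.12, \qquad
\bar{x}_{s1,2} = \frac{53.42 + 51.41 + 43.12}{3} = \frac{147.95}{3} \approx 49.32, \qquad
\bar{x}_{s5,1} = \frac{25.4 + 25.5}{2} = 25.45
```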

How can I do this in R?


Possible duplicate of [Aggregate and reshape from long to wide](http://stackoverflow.com/q/23611735/903061), though there may be a better duplicate target out there. – Gregor


I don't understand what output you want. – snoram


Sorry, I forgot the `value.var` argument above: `reshape2::dcast(X3 ~ X1, data = df, fun.aggregate = mean, value.var = "X2")` should do it. – Gregor

Answers


You can do this in base R (assuming your data are in a data frame `df`):

r <- aggregate(X2~X1+X3, df[df$X3 %in% c(1,2),], mean) 
round(t(xtabs(X2~X1+X3, r)), 2) 

# X1 
#X3  s1 s2 s3 s4 s5 
# 1 45.12 0.00 0.00 0.00 25.45 
# 2 49.32 0.00 43.58 0.00 0.00 
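A similar base-R result can also be sketched with `tapply()`, which computes the group means straight into an X3 × X1 matrix. This is a minimal sketch, assuming the data are typed in from the question's table (the `read.table()` text below is a copy of it); combinations with no rows come back as `NA` and are set to 0 to match the desired output.

```r
# Rebuild the question's data (copied from the table in the question)
df <- read.table(header = TRUE, text = "
X1 X2 X3
s1 45.11 1
s1 45.13 1
s1 53.42 2
s1 51.41 2
s2 96.76 3
s2 96.65 3
s4 77.9 4
s1 80.46 5
s3 43.58 2
s1 43.12 2
s1 41.51 3
s4 41.97 3
s1 108.97 6
s3 117.46 6
s4 40 3
s4 40 3
s5 25.4 1
s5 25.5 1
")

# Mean of X2 for every X3 (rows) by X1 (columns) combination;
# combinations with no matching rows are NA
m <- with(df, tapply(X2, list(X3, X1), mean))

# Show empty combinations as 0, as in the desired output
m[is.na(m)] <- 0

# Keep only X3 == 1 and X3 == 2, rounded to 2 decimals
res <- round(m[c("1", "2"), ], 2)
print(res)
```

Because `tapply()` aggregates and reshapes in one call, there is no intermediate `aggregate()` result; the trade-off is that it returns a matrix rather than a data frame.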

Using data.table:

library(data.table) 
setDT(df) 
df.mean <- df[, mean(X2), by = .(X1, X3)] 
df.mean.wide <- dcast(df.mean, X3 ~ X1, value.var = "V1") 
df.mean.wide[is.na(df.mean.wide)] <- 0 
df.mean.wide[1:2] 

    X3  s1 s2 s3 s4 s5 
1: 1 45.12000 0 0.00 0 25.45 
2: 2 49.31667 0 43.58 0 0.00 

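Or you can use the newer tidyr and dplyr packages. The following example is meant to break the task into two steps (#1 summarize your data; #2 convert to wide format):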

library(dplyr) 
library(tidyr) 

# fake example data set 
tibble(
    X1 = rep(paste0("S", 1:5), times = 6), 
    X2 = c(1:30) * 0.1, 
    X3 = rep(1:10, each = 3) 
) %>% 
    # summarize to calculate mean for each X1 & X3 group 
    group_by(X1, X3) %>% 
    summarize(X2.avg = mean(X2)) %>% 
    # drop the remaining grouping before reshaping 
    ungroup() %>% 
    # spread into wide format with 0s for all missing combinations 
    # (spread() still works but is superseded by pivot_wider() in tidyr 1.0.0 and later) 
    spread(X1, X2.avg, fill = 0) %>% 
    # if you really only want to look at the first two X3s 
    filter(X3 < 3)