2015-02-08 34 views
5

Plotting by group in data.table

I have individual-level data, and I'm trying to dynamically summarize the outcome by group.

Example:

library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))

I'd like to know the simplest way to plot the group-level summary, the data for which is contained in:

DT[ , mean(outcome), by = .(grp, time)] 

I'd like something like this:

​​

But this doesn't work at all.

The workable option I've been surviving on (which could easily be looped) is:

plot(DT[grp == "a", mean(outcome), by = time]) 
lines(DT[grp == "b", mean(outcome), by = time]) 
lines(DT[grp == "c", mean(outcome), by = time]) 
lines(DT[grp == "d", mean(outcome), by = time]) 

(with parameters for colors etc. added; left out here for conciseness)
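For reference, the looped version mentioned above might look roughly like the sketch below. The names DT_mean, mean_outcome and grps are made up for illustration, and the colors are simply the loop index, so treat it as one possible arrangement rather than a recommended one.

```r
library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))

# Group-level means, with an explicit column name instead of the default V1
DT_mean <- DT[ , .(mean_outcome = mean(outcome)), by = .(grp, time)]
grps <- unique(DT_mean$grp)

# Open an empty plot wide enough for every group, then add one line per group
plot(NULL, xlim = range(DT_mean$time), ylim = range(DT_mean$mean_outcome),
     xlab = "time", ylab = "mean outcome")
for (k in seq_along(grps)) {
  DT_mean[grp == grps[k], lines(time, mean_outcome, col = k)]
}
legend("topright", legend = grps, col = seq_along(grps), lty = 1)
```

Setting the axis limits from the full summary table avoids the problem with the plot()/lines() sequence above, where the axes are fixed by the first group alone.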

This doesn't strike me as the best way to do it. Given data.table's skill at handling groups, isn't there a more elegant solution?

Other sources have pointed me to matplot, but I can't see an easy way to use it. Would I need to reshape DT, and is there a simple reshape that would get the job done?

Answers

4

Base R solution using matplot and dcast

# aggregate to one mean per group and time point (DT is the table from the question)
dt_agg <- DT[ , .(mean = mean(outcome)), by=.(grp,time)]
# reshape to wide format: one column per group
dt_cast <- dcast(dt_agg, time~grp, value.var="mean")
dt_cast[ , matplot(time, .SD[ , !"time", with=FALSE],
                   type="l", ylab="mean", xlab="")]
#or, if you've got the data.table version 1.9.7+:
# (see https://github.com/Rdatatable/data.table/wiki/Installation)
dt_cast[ , matplot(time, .SD, type="l", ylab="mean", xlab=""), .SDcols = !"time"]

Result: [image]
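A variant of the same idea, sketched here with a legend added so the lines can be told apart. The names DT_wide and grp_cols are illustrative, and the group columns are picked with setdiff() so the code does not depend on how many groups there are.

```r
library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))

DT_agg <- DT[ , .(mean = mean(outcome)), by = .(grp, time)]
# Wide format: one row per time point, one column per group
DT_wide <- dcast(DT_agg, time ~ grp, value.var = "mean")

# Every column except time holds one group's series
grp_cols <- setdiff(names(DT_wide), "time")
matplot(DT_wide$time, as.matrix(DT_wide[ , grp_cols, with = FALSE]),
        type = "l", lty = 1, col = seq_along(grp_cols),
        xlab = "time", ylab = "mean")
legend("topright", legend = grp_cols, col = seq_along(grp_cols), lty = 1)
```

matplot draws one line per matrix column, which is why the long summary table has to be cast to wide format first.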

+2

This works, but 'dt_cast[ , setdiff(names(dt_cast), "time"), with = F]' or 'dt_cast[ , levels(dt$grp), with = F]' is needed when there are more groups. Thanks! – MichaelChirico 2015-02-09 12:41:17

+0

Actually, a recent update to 'data.table' has made this even easier! – MichaelChirico 2016-10-05 03:21:08

0

Using reshape2 you can transform your dataset this way:

new_dt <- dcast(DT,time~grp,value.var='outcome',fun.aggregate=mean)

new_dt_molten <- melt(new_dt,id.vars='time') 

and then plot it with ggplot2 like this:

ggplot(new_dt_molten,aes(x=time,y=value,colour=variable)) + geom_line() 

Or (actually the simpler solution) you can use your dataset directly and do something like:

ggplot(DT,aes(x=time,y=outcome,colour=grp)) + geom_jitter() + geom_smooth(method='loess')

ggplot(DT,aes(x=time,y=outcome,colour=grp)) + geom_smooth(method='loess')
4

You are very much on the right track. Use ggplot to do this as follows:

(dt_agg <- DT[,.(mean = mean(outcome)),by=list(grp,time)]) # Aggregated data.table
    grp time  mean 
    1: a 1 0.75865672 
    2: a 2 0.07244879 
--- 

Now ggplot this aggregated data.table:

require(ggplot2) 
ggplot(dt_agg, aes(x = time, y = mean, col = grp)) + geom_line() 

Result: [image]

4

There is a way to do this with data.table's by argument, as follows:

DT[ , mean(outcome), by = .(grp, time) 
    ][ , {plot(NULL, xlim = range(time), 
      ylim = range(V1)); .SD} 
     ][ , lines(time, V1, col = .GRP), by = grp] 

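Note that the intermediate part, {...; .SD}, is needed to continue the chain: plot() returns NULL, so the braces end with .SD to hand the table on to the next [ ]. If DT[ , mean(outcome), by = .(grp, time)] had already been saved as another data.table, DT_m, then we could just do: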

DT_m[ , plot(NULL, xlim = range(time), ylim = range(V1))] 
DT_m[ , lines(time, V1, col = .GRP), by = grp] 
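The role of the trailing .SD can be seen on a toy table. This is only an illustration: toy, res_null and res_sd are made-up names, and invisible(NULL) stands in for a plotting call such as plot(), which also returns NULL.

```r
library(data.table)

toy <- data.table(g = c("a", "a", "b"), x = 1:3)

# A j expression that evaluates to NULL (as a plot() call does) returns NULL,
# so there is no table left for a further [ ... ] to work on
res_null <- toy[ , {invisible(NULL)}]
is.null(res_null)

# Ending the braces with .SD returns the table itself, so the chain can go on
res_sd <- toy[ , {invisible(NULL); .SD}][ , .N, by = g]
res_sd
```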

With output:

[image: data.table group by]

Much fancier results are possible; for example, if we wanted to specify particular colors for each group:

grp_col <- c(a = "blue", b = "black", 
      c = "darkgreen", d = "red") 
DT[ , mean(outcome), by = .(grp, time) 
    ][ , {plot(NULL, xlim = range(time), 
      ylim = range(V1)); .SD} 
     ][ , lines(time, V1, col = grp_col[.BY$grp]), by = grp] 

Note:

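There is a bug in RStudio that causes this code to fail if the output is sent to the RStudio graphics device. As such, this approach works only when running R from the command line or when sending the output to an external device (I sent it to png to produce the above).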

See data.table issue #1524, this RStudio support ticket, and these SO questions (1, 2).
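Sending the plot to an external device, as described in the note, could look like the sketch below. It uses pdf() rather than the png() device mentioned above, on the assumption that pdf() is available in any R build; swapping in png() should work the same way where that device is supported. The file name grp_means.pdf is hypothetical.

```r
library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))
DT_m <- DT[ , mean(outcome), by = .(grp, time)]

# Hypothetical output path in the session's temporary directory
out_file <- file.path(tempdir(), "grp_means.pdf")

pdf(out_file, width = 8, height = 6)   # open the file device
DT_m[ , plot(NULL, xlim = range(time), ylim = range(V1),
             xlab = "time", ylab = "mean outcome")]
DT_m[ , lines(time, V1, col = .GRP), by = grp]
dev.off()                              # close the device so the file is written

file.exists(out_file)
```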