2014-07-23 68 views
3

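R: update a column of the whole data with the mean of a subset of the data

I want to compute the mean of one column over a subset of the data and write that mean into a new column of the whole data.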

Here is some code to hopefully make things clearer:

library(data.table)

t <- data.table(Label=c(0,1,0,1,1,1), x=c("aa","aa","aa","aa","bb","bb"), environment=c("train","train","test","test","train","test"))
t
   Label  x environment
1:     0 aa       train
2:     1 aa       train
3:     0 aa        test
4:     1 aa        test
5:     1 bb       train
6:     1 bb        test
setkey(t, x)
# compute the per-group mean on the train rows only
t[environment == "train", avg := mean(Label), by = c("x")]

t
   Label  x environment avg
1:     0 aa       train 0.5
2:     1 aa       train 0.5
3:     0 aa        test  NA
4:     1 aa        test  NA
5:     1 bb       train 1.0
6:     1 bb        test  NA

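The code above works, except that it does not update the rows where environment == "test". That is expected, because the subset I average over does not include those rows.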

So I want to keep the mean computed on the subset, but fill the avg column for all rows, including the "test" rows.

So the result should be:

t
   Label  x environment avg
1:     0 aa       train 0.5
2:     1 aa       train 0.5
3:     0 aa        test 0.5 # average calculated with train rows only
4:     1 aa        test 0.5 # average calculated with train rows only
5:     1 bb       train 1.0
6:     1 bb        test 1.0 # average calculated with train rows only

Answers

5

It seems this is what you are after:

t[environment == "train", avg := mean(Label), by = x][, avg := mean(avg, na.rm = TRUE), by = x]
t

##    Label  x environment avg
## 1:     0 aa       train 0.5
## 2:     1 aa       train 0.5
## 3:     0 aa        test 0.5
## 4:     1 aa        test 0.5
## 5:     1 bb       train 1.0
## 6:     1 bb        test 1.0

Or just '[, avg := avg[1], by = x]' for the second part – eddi

@eddie, how would you make sure 'train' always comes before 'test' without sorting? –

Good point, I didn't think of that – eddi
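
One way to sidestep the ordering concern raised in these comments may be to take the subset inside j, so that each group's mean is computed from its train rows and assigned to every row of the group, whatever the row order. A minimal sketch, assuming the question's column names; the table is called dt here (a name chosen for illustration, since t masks base R's transpose function), and the rows are deliberately rearranged so that test rows come first:

```r
library(data.table)

# Same rows as in the question, reordered so that "test" precedes "train"
dt <- data.table(
  Label       = c(0, 1, 0, 1, 1, 1),
  x           = c("aa", "aa", "aa", "aa", "bb", "bb"),
  environment = c("test", "test", "train", "train", "test", "train")
)

# Within each group of x, average Label over the train rows only,
# then assign that single value to all rows of the group
dt[, avg := mean(Label[environment == "train"]), by = x]
dt
```

Note that a group with no train rows at all would get NaN here, because mean() of an empty vector is NaN.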

1

You can probably solve this with data.table alone, but for me the quickest and most convenient way to get the desired answer is to use the na.locf function from zoo:

require(data.table) 
require(zoo) 
t <- data.table(Label=c(0,1,0,1,1,1), x=c("aa","aa","aa","aa","bb","bb"), environment=c("train","train","test","test","train","test")) 

t[environment=="train",avg := mean(Label),by=c("x")] 
# na.rm = FALSE keeps the vector length unchanged if a group starts with NA
t[, avg := na.locf(avg, na.rm = FALSE), by = c("x")]

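Just to show that it works, I added an extra out-of-order test case with a Label value of 5 (so that the group means would differ a lot if it were included). This is the output I get.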

t <- data.table(Label=c(0,1,0,1,1,1,5), x=c("aa","aa","aa","aa","bb","bb","aa"), environment=c("train","train","test","test","train","test","test")) 

t[environment=="train",avg := mean(Label),by=c("x")] 
# na.rm = FALSE keeps the vector length unchanged if a group starts with NA
t[, avg := na.locf(avg, na.rm = FALSE), by = c("x")]
t 
   Label  x environment avg
1:     0 aa       train 0.5
2:     1 aa       train 0.5
3:     0 aa        test 0.5
4:     1 aa        test 0.5
5:     1 bb       train 1.0
6:     1 bb        test 1.0
7:     5 aa        test 0.5
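
Since na.locf carries the last observation forward, the approach above relies on a train row appearing before the test rows within each group. For the data.table-only route mentioned at the top of this answer, one possible sketch is an update join: compute the train means in a small lookup table, then join them back onto every row by x. The names dt and train_means are chosen for illustration, the on= argument assumes a reasonably recent data.table version, and the Label = 5 test row is moved to the top to show that row order does not matter:

```r
library(data.table)

# Same seven rows as above, with the Label = 5 test row placed first
dt <- data.table(
  Label       = c(5, 0, 1, 0, 1, 1, 1),
  x           = c("aa", "aa", "aa", "aa", "aa", "bb", "bb"),
  environment = c("test", "train", "train", "test", "test", "test", "train")
)

# One row per group: the mean of Label over the train rows only
train_means <- dt[environment == "train", .(avg = mean(Label)), by = x]

# Update join: copy each group's train mean onto all rows with the same x
dt[train_means, avg := i.avg, on = "x"]
dt
```

Groups that have no train rows find no match in train_means, so their avg stays NA.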