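I'd like to compute the mean of a column over a subset of the data and write that mean into a new column for the whole data set. In short: in R, update a column across all the data using a value computed on a subset.
Here is some code that will hopefully make things clearer: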
library(data.table)
t <- data.table(Label=c(0,1,0,1,1,1), x=c("aa","aa","aa","aa","bb","bb"), environment=c("train","train","test","test","train","test"))
t
Label x environment
1: 0 aa train
2: 1 aa train
3: 0 aa test
4: 1 aa test
5: 1 bb train
6: 1 bb test
setkey(t,x)
t[environment=="train",avg := mean(Label),by=c("x")]
t
Label x environment avg
1: 0 aa train 0.5
2: 1 aa train 0.5
3: 0 aa test NA
4: 1 aa test NA
5: 1 bb train 1.0
6: 1 bb test NA
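The code above works, except that it does not update the rows where environment == "test". That is expected, since I compute the mean on a subset that excludes them.
So I'd like to keep the mean computed on the subset, but fill in the avg column for all rows, including the "test" rows.
The result should be: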
t
Label x environment avg
1: 0 aa train 0.5
2: 1 aa train 0.5
3: 0 aa test 0.5 # average calculated with train rows only
4: 1 aa test 0.5 # average calculated with train rows only
5: 1 bb train 1.0
6: 1 bb test 1.0 # average calculated with train rows only
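One possible way to get this result, offered as a minimal sketch rather than a definitive answer: group over all rows, but restrict the mean to the train rows inside `j`. The table is named `dt` here (my choice, not the question's) to avoid masking `base::t()`. Note that a group with no train rows would get `NaN`, because `mean()` of an empty vector is `NaN`.

```r
library(data.table)

# Same toy data as in the question, under the name `dt` instead of `t`.
dt <- data.table(
  Label = c(0, 1, 0, 1, 1, 1),
  x = c("aa", "aa", "aa", "aa", "bb", "bb"),
  environment = c("train", "train", "test", "test", "train", "test")
)

# Group by x over ALL rows, but take the mean of the train rows only.
# The single value per group is recycled to every row of that group,
# so the test rows receive the train-only mean as well.
dt[, avg := mean(Label[environment == "train"]), by = x]
dt
```

With the data above, every "aa" row should end up with avg 0.5 and every "bb" row with avg 1.0, regardless of row order.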
or just '[, avg := avg[1], by = x]' for the second part – eddi
@eddi, how would you make sure 'train' always comes before 'test' without sorting? –
Good point, I hadn't thought of that – eddi
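The ordering concern raised in this thread could be sidestepped by picking the first non-NA value in each group instead of the first value. This is only a sketch of that idea, not something proposed in the thread: the table name `dt` and the deliberately shuffled row order (test rows first) are assumptions made for illustration.

```r
library(data.table)

# Same rows as in the question, rearranged so that "test" comes before
# "train" within each group, to show that row order does not matter.
dt <- data.table(
  Label = c(0, 1, 0, 1, 1, 1),
  x = c("aa", "aa", "aa", "aa", "bb", "bb"),
  environment = c("test", "test", "train", "train", "test", "train")
)

# Step 1: as in the question, compute the mean on the train rows only.
# The test rows are left as NA.
dt[environment == "train", avg := mean(Label), by = x]

# Step 2: within each group, spread the first non-NA value to every row.
# A group with no train rows would stay NA.
dt[, avg := avg[!is.na(avg)][1], by = x]
dt
```

Because step 2 filters out the NA values before taking the first element, it does not depend on the train rows appearing first, so no sort or key is needed.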