Subsetting a data.table using another data.table

I have two data.tables, dt and dt1.

library(data.table)
dt<-data.table(id=c(rep(2, 3), rep(4, 2)), year=c(2005:2007, 2005:2006), event=c(1,0,0,0,1))
dt1<-data.table(id=rep(2, 5), year=c(2005:2009), performance=(1000:1004))

dt 

   id year event
1:  2 2005     1
2:  2 2006     0
3:  2 2007     0
4:  4 2005     0
5:  4 2006     1

dt1 

   id year performance
1:  2 2005        1000
2:  2 2006        1001
3:  2 2007        1002
4:  2 2008        1003
5:  2 2009        1004

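I want to subset dt, keeping only the rows whose combination of the first and second columns (id and year) also appears in dt1. I want the result in a new object, without overwriting dt. This is what I would like to obtain: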

   id year event
1:  2 2005     1
2:  2 2006     0
3:  2 2007     0

I tried to do this with the following code:

dt.sub<-dt[dt[,c(1:2)] %in% dt1[,c(1:2)],]

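but it did not work: I got back a data.table identical to dt. I think there are at least two mistakes in my code. First, I am probably selecting the columns of a data.table the wrong way. Second, and more obviously, %in% works on vectors, not on multi-column objects. In any case, I could not find a more efficient way to do this...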

Thanks in advance for your help!
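On the second point in the question: %in% compares vectors, so one way to keep that approach is to collapse each (id, year) pair into a single key per row first. The following is only a minimal sketch, not a recommended method for large tables; the helper names key_dt and key_dt1 are made up for illustration, and it assumes the "_" separator never occurs inside id or year.

```r
library(data.table)

dt  <- data.table(id = c(rep(2, 3), rep(4, 2)),
                  year = c(2005:2007, 2005:2006),
                  event = c(1, 0, 0, 0, 1))
dt1 <- data.table(id = rep(2, 5), year = 2005:2009, performance = 1000:1004)

# Build one character key per row so that %in% compares two plain vectors.
# Assumption: "_" never occurs inside id or year, so keys cannot collide.
key_dt  <- paste(dt$id,  dt$year,  sep = "_")
key_dt1 <- paste(dt1$id, dt1$year, sep = "_")

# Logical row filter; dt itself is left untouched.
dt.sub <- dt[key_dt %in% key_dt1]
print(dt.sub)
```

Pasting keys converts every value to character, so a join on the two columns is usually the better choice once the tables get large.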

Source

2013-12-19 Riccardo

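# Key both tables on id and year (setkeyv sorts each table by reference)
setkeyv(dt,c('id','year'))
setkeyv(dt1,c('id','year'))
# Keyed join; nomatch=0 drops the rows of dt1 that have no match in dt
dt[dt1,nomatch=0]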

Output:

> dt[dt1,nomatch=0]
   id year event performance
1:  2 2005     1        1000
2:  2 2006     0        1001
3:  2 2007     0        1002
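The joined result above carries the extra performance column, while the question asked for only the columns of dt in a new object. A possible variation, sketched here under the assumption of a data.table version that supports the on= argument (1.9.6 or later), joins without setting keys and then keeps only dt's columns. The names joined and dt_sub are illustrative.

```r
library(data.table)

dt  <- data.table(id = c(rep(2, 3), rep(4, 2)),
                  year = c(2005:2007, 2005:2006),
                  event = c(1, 0, 0, 0, 1))
dt1 <- data.table(id = rep(2, 5), year = 2005:2009, performance = 1000:1004)

# Join on id and year without setting keys, so dt and dt1 are not reordered.
# nomatch = 0L drops the rows of dt1 that have no partner in dt.
joined <- dt[dt1, on = c("id", "year"), nomatch = 0L]

# Keep only the columns that exist in dt (drops performance).
dt_sub <- joined[, names(dt), with = FALSE]
print(dt_sub)
```

If dt1 could contain the same (id, year) pair more than once, the matching rows of dt would be repeated; deduplicating dt1 on those two columns before the join would avoid that.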

Source

2013-12-19 15:47:18 TheComeOnMan

Thanks a lot! This is most likely faster on a larger data.table. – Riccardo

If you don't want the 'performance' column, then 'dt[dt1, list(event), nomatch = 0L]' should be slightly faster... – Arun

'data.table' provides its own 'merge' method, which works along these lines. I would expect the speed to be similar. – James

Using merge:

merge(dt,dt1, by=c("year","id"))
   year id event performance
1: 2005  2     1        1000
2: 2006  2     0        1001
3: 2007  2     0        1002
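To get back only the columns of dt from merge, one option is to merge against just the two key columns of dt1. This is a hedged sketch rather than part of the original answer; the names keys_only and merge_sub are made up for illustration.

```r
library(data.table)

dt  <- data.table(id = c(rep(2, 3), rep(4, 2)),
                  year = c(2005:2007, 2005:2006),
                  event = c(1, 0, 0, 0, 1))
dt1 <- data.table(id = rep(2, 5), year = 2005:2009, performance = 1000:1004)

# Reduce dt1 to its distinct (id, year) pairs so the merge adds no columns
# and cannot duplicate rows of dt.
keys_only <- unique(dt1[, .(id, year)])

# Inner merge (the default): keeps only rows of dt whose pair is in keys_only.
merge_sub <- merge(dt, keys_only, by = c("id", "year"))
print(merge_sub)
```

The result is sorted by the by columns, so the row order can differ from the original order of dt.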

Source

2013-12-19 15:46:35 James

OMG, sometimes it is so easy that you can't see it... Thanks for the solution! – Riccardo
