2014-03-27

Joining two data.tables fails

I am trying to use a data.table as a lookup table:

> (dt <- data.table(myid=rep(11:12,3),zz=1:6,key=c("myid","zz"))) 
    myid zz 
1: 11 1 
2: 11 3 
3: 11 5 
4: 12 2 
5: 12 4 
6: 12 6 
> (id2name <- data.table(id=11:14,name=letters[1:4],key="id")) 
    id name 
1: 11 a 
2: 12 b 
3: 13 c 
4: 14 d 

What I want is:

> (res <- data.table(myid=rep(11:12,3),zz=1:6,name=rep(letters[1:2],3),key=c("myid","zz"))) 
    myid zz name 
1: 11 1 a 
2: 11 3 a 
3: 11 5 a 
4: 12 2 b 
5: 12 4 b 
6: 12 6 b 

However, the join I tried fails:

> dt[id2name] 
Starting binary search ...done in 0 secs 
Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x), : 
    Join results in 8 rows; more than 6 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice. 
Calls: [ -> [.data.table -> vecseq 

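What am I doing wrong?

PS. I would be happy to get the result some other way; what is the most idiomatic way to do what I want? (dt must remain a data.table, but id2name can be anything that maps an int to something else, as long as the int is not assumed to be a vector index.)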

Answers

> dt[id2name, allow.cartesian=TRUE, nomatch=0] 
    myid zz name 
1: 11 1 a 
2: 11 3 a 
3: 11 5 a 
4: 12 2 b 
5: 12 4 b 
6: 12 6 b 

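data.table is trying to save you from yourself in case you have unintentionally joined on keys with duplicate values. Note that the error message (eventually) tells you what to do if you are sure you know what you are doing.

Alternatively: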
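To see where the 8 rows in the error message come from, it may help to run the join both with and without `nomatch=0`. The sketch below rebuilds the two tables from the question; the names `full` and `inner` are introduced here purely for illustration. It passes `allow.cartesian=TRUE` explicitly so that it should run regardless of how strict the installed data.table version is about the size check.

```r
library(data.table)

# Tables from the question
dt <- data.table(myid = rep(11:12, 3), zz = 1:6, key = c("myid", "zz"))
id2name <- data.table(id = 11:14, name = letters[1:4], key = "id")

# Default nomatch = NA: every row of i (id2name) appears in the result.
# ids 11 and 12 match 3 rows of dt each; ids 13 and 14 match nothing and
# each contribute one row with zz = NA, giving 3 + 3 + 1 + 1 rows.
full <- dt[id2name, allow.cartesian = TRUE]
nrow(full)   # 8

# nomatch = 0 drops the unmatched ids, leaving only the 6 matched rows.
inner <- dt[id2name, allow.cartesian = TRUE, nomatch = 0]
nrow(inner)  # 6
```

So the result exceeds 6 rows because of the two unmatched ids plus the repeated `myid` values in `dt`, not because `id2name` has duplicate keys.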

> id2name[dt] 
    id name zz 
1: 11 a 1 
2: 11 a 3 
3: 11 a 5 
4: 12 b 2 
5: 12 b 4 
6: 12 b 6 
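
The result of `id2name[dt]` names the join column `id` and puts `name` before `zz`, which differs from the layout asked for in the question. A minimal sketch of one way to tidy that up is shown below, using `setnames`, `setcolorder` and `setkey`; the variable name `res` is borrowed from the question, and whether this cleanup is needed at all depends on how the result is used afterwards.

```r
library(data.table)

# Tables from the question
dt <- data.table(myid = rep(11:12, 3), zz = 1:6, key = c("myid", "zz"))
id2name <- data.table(id = 11:14, name = letters[1:4], key = "id")

# Look up each row of dt in id2name; one output row per row of dt.
res <- id2name[dt]

# Rename the join column, reorder the columns and restore the key,
# all by reference (no copy of res is made).
setnames(res, "id", "myid")
setcolorder(res, c("myid", "zz", "name"))
setkey(res, myid, zz)
```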

Note that the key in the second table ('i = id2name') is unique, so no Cartesian product is necessary. – sds

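@sds, I think the check that 'data.table' carries out only compares the size of the join result with the sizes of the tables involved; it does not actually know whether the join is Cartesian. So the argument name is not perfect. – BrodieG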

Thanks. I wonder whether 'name' can be inserted into 'dt' using ':='? – sds
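
One way the `:=` idea from the last comment might be done is an update join, sketched below. It is not taken from the thread itself. It assumes a data.table version that supports the `on=` argument (1.9.6 or later), and uses the `i.` prefix to refer to a column of the table in `i`.

```r
library(data.table)

# Tables from the question
dt <- data.table(myid = rep(11:12, 3), zz = 1:6, key = c("myid", "zz"))
id2name <- data.table(id = 11:14, name = letters[1:4], key = "id")

# Update join: match dt$myid against id2name$id and, for the matching
# rows of dt, add a name column by reference. i.name is the name column
# of id2name; ids with no match in dt (13 and 14) change nothing.
dt[id2name, on = c(myid = "id"), name := i.name]
```

Because the column is added by reference, `dt` stays a data.table with its original key and row count.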