2017-02-11 29 views
2

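Merge two data.tables where each row of dt1 is combined with all rows of dt2

I have data similar to the following, except that my real dt1 has more rows and dt2 has only 15 rows (not 150,000):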

dt1 <- data.table(ID=1:4,City=c("Charlotte","DC","Salem","Boston")) 
dt2 <- data.table(Birds=c("Saker","Peregrine","Barbary","Prarie","Golden","Coopers","Canary","Finch"),BirdType=c("Falcon","Falcon","Falcon","Falcon","Eagle","Hawk","Breakfast","Breakfast")) 

which print like this:

> dt1
   ID      City
1:  1 Charlotte
2:  2        DC
3:  3     Salem
4:  4    Boston

> dt2
       Birds  BirdType
1:     Saker    Falcon
2: Peregrine    Falcon
3:   Barbary    Falcon
4:    Prarie    Falcon
5:    Golden     Eagle
6:   Coopers      Hawk
7:    Canary Breakfast
8:     Finch Breakfast

I would like to merge the two data.tables so that each row of dt1 is combined with all rows of dt2, giving a data.table with 32 rows like this:

> dtMerged
    ID      City     Birds  BirdType
 1:  1 Charlotte     Saker    Falcon
 2:  1 Charlotte Peregrine    Falcon
 3:  1 Charlotte   Barbary    Falcon
 4:  1 Charlotte    Prarie    Falcon
 5:  1 Charlotte    Golden     Eagle
 6:  1 Charlotte   Coopers      Hawk
 7:  1 Charlotte    Canary Breakfast
 8:  1 Charlotte     Finch Breakfast
 9:  2        DC     Saker    Falcon
10:  2        DC Peregrine    Falcon
11:  2        DC   Barbary    Falcon
12:  2        DC    Prarie    Falcon
13:  2        DC    Golden     Eagle
14:  2        DC   Coopers      Hawk
15:  2        DC    Canary Breakfast
16:  2        DC     Finch Breakfast
17:  3     Salem     Saker    Falcon
18:  3     Salem Peregrine    Falcon
etc...

Any ideas on how best to accomplish this would be greatly appreciated. I'm using data.table version 1.10.4 on a Windows 7 PC. Thanks.

+2

You can use 'CJ' to do a cross join, e.g. 'CJ(do.call(paste, c(dt1, sep = ",")), do.call(paste, c(dt2, sep = ",")))[, unlist(lapply(.SD, tstrsplit, split = ","), recursive = FALSE)]' – akrun
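The one-liner in the comment above can be unpacked into separate steps. This is only a sketch of that idea, not a recommended final solution: the names `rows1`, `rows2` and `crossed` are made up for illustration, every column comes back as character (so `ID` is no longer an integer), and it assumes that no value contains the separator `,`.

```r
library(data.table)

dt1 <- data.table(ID = 1:4, City = c("Charlotte", "DC", "Salem", "Boston"))
dt2 <- data.table(
  Birds    = c("Saker", "Peregrine", "Barbary", "Prarie", "Golden", "Coopers", "Canary", "Finch"),
  BirdType = c("Falcon", "Falcon", "Falcon", "Falcon", "Eagle", "Hawk", "Breakfast", "Breakfast")
)

# Step 1: collapse each row into a single comma-separated string.
rows1 <- do.call(paste, c(dt1, sep = ","))
rows2 <- do.call(paste, c(dt2, sep = ","))

# Step 2: cross join the two string vectors (one column per input).
crossed <- CJ(rows1, rows2)

# Step 3: split every string column back into its parts.
# tstrsplit() returns character vectors, so the original column types are lost.
crossed <- crossed[, unlist(lapply(.SD, tstrsplit, split = ","), recursive = FALSE)]

# Step 4: restore the column names (dt1's columns first, then dt2's).
setnames(crossed, c(names(dt1), names(dt2)))
```

Because `CJ` sorts its inputs by default, the rows come out ordered by the pasted strings rather than in the original order of dt1 and dt2.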

+0

Thanks @akrun. A cross join is the way to go. – FG7

+2

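'dt1[, as.list(dt2), by = names(dt1)]' seems to work too. Oh, or do it the other way around, since dt2 has very few rows in your real use case. Also, if each bird's name is unique, you could keep the two smaller tables, create a new table containing only the bird name and the ID, and save some memory: 'CJ(ID = dt1$ID, BirdName = dt2$Birds)'. You can then look up the city and the bird details from the ID and the name as needed. – Frank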
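Both suggestions in this comment can be tried on the sample data. The sketch below is an illustration only: the names `crossed` and `pairs` are invented here, and the first option assumes that the rows of dt1 are unique, since duplicated rows would fall into the same group and be expanded only once.

```r
library(data.table)

dt1 <- data.table(ID = 1:4, City = c("Charlotte", "DC", "Salem", "Boston"))
dt2 <- data.table(
  Birds    = c("Saker", "Peregrine", "Barbary", "Prarie", "Golden", "Coopers", "Canary", "Finch"),
  BirdType = c("Falcon", "Falcon", "Falcon", "Falcon", "Eagle", "Hawk", "Breakfast", "Breakfast")
)

# Option 1: group by every column of dt1 and return all of dt2 for each group.
# Each group is one row of dt1 (assuming unique rows), so each row gets all 8 birds.
crossed <- dt1[, as.list(dt2), by = names(dt1)]

# Option 2: store only the (ID, BirdName) pairs and keep dt1 and dt2 as lookup tables.
pairs <- CJ(ID = dt1$ID, BirdName = dt2$Birds)

# Look up a detail only when it is needed, here the city for each ID (update join).
pairs[dt1, on = "ID", City := i.City]
```

The second option keeps the wide columns out of the large table until they are actually required, which is the memory saving the comment refers to.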

Answers

1

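As @akrun commented, a cross join seems to be one way to solve the problem. To implement it, I used the neat function CJ.dt by @jangorecki, referenced in this Stack Overflow post, to get the desired solution: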

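CJ.dt = function(X,Y) {
    stopifnot(is.data.table(X),is.data.table(Y))
    k = NULL                  # declare k so that code checks do not flag it as undefined
    X = X[, c(k=1, .SD)]      # copy of X with a constant dummy column k in front
    setkey(X, k)              # key X on the dummy column
    Y = Y[, c(k=1, .SD)]      # same dummy column for Y
    setkey(Y, NULL)           # drop any key on Y so the join uses its first column, k
    # every row of Y matches every row of X on k = 1; drop k from the result
    X[Y, allow.cartesian=TRUE][, k := NULL][]
}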

new_df <- CJ.dt(dt1, dt2) 
setorder(new_df, ID) 

Here is what the full output looks like after reordering:

> new_df
    ID      City     Birds  BirdType
 1:  1 Charlotte     Saker    Falcon
 2:  1 Charlotte Peregrine    Falcon
 3:  1 Charlotte   Barbary    Falcon
 4:  1 Charlotte    Prarie    Falcon
 5:  1 Charlotte    Golden     Eagle
 6:  1 Charlotte   Coopers      Hawk
 7:  1 Charlotte    Canary Breakfast
 8:  1 Charlotte     Finch Breakfast
 9:  2        DC     Saker    Falcon
10:  2        DC Peregrine    Falcon
11:  2        DC   Barbary    Falcon
12:  2        DC    Prarie    Falcon
13:  2        DC    Golden     Eagle
14:  2        DC   Coopers      Hawk
15:  2        DC    Canary Breakfast
16:  2        DC     Finch Breakfast
17:  3     Salem     Saker    Falcon
18:  3     Salem Peregrine    Falcon
19:  3     Salem   Barbary    Falcon
20:  3     Salem    Prarie    Falcon
21:  3     Salem    Golden     Eagle
22:  3     Salem   Coopers      Hawk
23:  3     Salem    Canary Breakfast
24:  3     Salem     Finch Breakfast
25:  4    Boston     Saker    Falcon
26:  4    Boston Peregrine    Falcon
27:  4    Boston   Barbary    Falcon
28:  4    Boston    Prarie    Falcon
29:  4    Boston    Golden     Eagle
30:  4    Boston   Coopers      Hawk
31:  4    Boston    Canary Breakfast
32:  4    Boston     Finch Breakfast
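A possible alternative that avoids the dummy key column is to cross join the row numbers and then subset both tables. This is a sketch rather than part of the original answer; `idx` and `new_dt` are names chosen here for illustration.

```r
library(data.table)

dt1 <- data.table(ID = 1:4, City = c("Charlotte", "DC", "Salem", "Boston"))
dt2 <- data.table(
  Birds    = c("Saker", "Peregrine", "Barbary", "Prarie", "Golden", "Coopers", "Canary", "Finch"),
  BirdType = c("Falcon", "Falcon", "Falcon", "Falcon", "Eagle", "Hawk", "Breakfast", "Breakfast")
)

# Pair every row number of dt1 with every row number of dt2.
# sorted = FALSE keeps the input order and skips setting a key.
idx <- CJ(i = seq_len(nrow(dt1)), j = seq_len(nrow(dt2)), sorted = FALSE)

# Pick the paired rows from each table and bind them side by side.
new_dt <- cbind(dt1[idx$i], dt2[idx$j])

# Sanity check: a cross join has nrow(dt1) * nrow(dt2) rows.
stopifnot(nrow(new_dt) == nrow(dt1) * nrow(dt2))
```

Since the first index varies slowest, the rows are already grouped by dt1 row, so no `setorder` call is needed afterwards.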
+0

Many thanks @david-c. This worked perfectly and quickly on my large dataset. – FG7