Multiply two data.tables, keeping all combinations

My problem is as follows:

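I have two data.tables. One has two columns (featurea, count), the other has three (featureb, featurec, count); in the example below, featureb and featurec are called origin and color. I want to "multiply" them (?) so that I get a new data.table with all possible combinations. The catch is that the features don't match, so a merge solution probably won't work.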

A minimal reproducible example (MRE):

library(data.table)

# two columns
DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))

#  featurea count 
#1: type1  2 
#2: type2  3 

# three columns
DT2 <- data.table(origin = c("house", "park", "park"), color = c("red", "blue", "red"), count = c(2, 1, 2))

# origin color count 
#1: house red  2 
#2: park blue  1 
#3: park red  2

In this case, the result I expect is a data.table like this:

> DT3 
    origin color featurea total 
1: house red type1  4 
2: house red type2  6 
3: park blue type1  2 
4: park blue type2  3 
5: park red type1  4 
6: park red type2  6
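One way to get every combination without relying on shared column names is to enumerate all pairs of row indices, then look up and multiply the counts. The sketch below assumes the data.table package and uses its CJ() helper on row indices; the intermediate names idx, i1 and i2 are illustrative and not part of the question.

```r
library(data.table)

DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
DT2 <- data.table(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# Every (DT2 row, DT1 row) pair; CJ() sorts, so i1 varies fastest within each i2
idx <- CJ(i2 = seq_len(nrow(DT2)), i1 = seq_len(nrow(DT1)))

# Take the DT2 side, then add DT1's feature and the product of the two counts
DT3 <- DT2[idx$i2, .(origin, color)]
DT3[, featurea := DT1$featurea[idx$i1]]
DT3[, total := DT2$count[idx$i2] * DT1$count[idx$i1]]
DT3
```

Because only row positions are paired, the same pattern should carry over to tables with any column names.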

2016-12-21 erasmortg

Would 'DT2[, .(featurea = DT1[["featurea"]], count = count * DT1[["count"]]), by = .(origin, color)]' be efficient enough? – Roland

@Roland It seems so. That sounds like the best answer, so you should post it as one. – Tensibai

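Here is one approach. First, I expanded the rows of DT2 with expandRows() from the splitstackshape package. Since I specified count = nrow(DT1) (2 here) and count.is.col = FALSE, each row is repeated twice. Then I handled the multiplication by creating a new column called total, and at the same time created a new column for featurea. Finally, I dropped count.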

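library(data.table)
library(splitstackshape)

# Repeat each DT2 row nrow(DT1) times, then multiply and attach DT1's features
expandRows(DT2, count = nrow(DT1), count.is.col = FALSE)[,
    `:=`(total = count * DT1[, count],
         # rep_len() makes the recycling explicit; newer data.table versions refuse to recycle a shorter RHS in :=
         featurea = rep_len(DT1[, featurea], .N))][, count := NULL][]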
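Why the multiplication lines up: after expanding, each DT2 row appears nrow(DT1) times in a row, so DT2's counts behave like rep(..., each =) while DT1's counts are recycled like rep(..., times =). The base R arithmetic below uses the question's numbers to make that alignment explicit; it illustrates the idea and is not the answer's own code.

```r
count1 <- c(2, 3)     # DT1$count
count2 <- c(2, 1, 2)  # DT2$count

# DT2 side after expanding: each value repeated length(count1) times in a row
expanded2 <- rep(count2, each = length(count1))   # 2 2 1 1 2 2
# DT1 side recycled once per DT2 row
recycled1 <- rep(count1, times = length(count2))  # 2 3 2 3 2 3

total <- expanded2 * recycled1
total  # 4 6 2 3 4 6
```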

Edit

If you'd rather not add another package, you can try David's idea from his comment.

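# each = nrow(DT1) keeps every DT2 row's copies together, so they line up with DT1's recycled values
DT2[rep(1:.N, each = nrow(DT1))][,
    `:=`(total = count * DT1$count,
         featurea = rep_len(DT1$featurea, .N),
         count = NULL)][]
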
# origin color total featurea 
#1: house red  4 type1 
#2: house red  6 type2 
#3: park blue  2 type1 
#4: park blue  3 type2 
#5: park red  4 type1 
#6: park red  6 type2
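Why `each =` matters here: with the question's 3-row DT2 and 2-row DT1, plain `rep(1:.N, nrow(DT1))` happens to produce every pairing (in a different row order), because 3 and 2 share no common factor. The sketch below assumes a hypothetical 4-row DT2, keeps only row indices and feature labels for brevity, and counts how many distinct (row, feature) pairs each repetition scheme yields.

```r
feat <- c("type1", "type2")  # stands in for DT1$featurea
n1 <- length(feat)
n2 <- 4                      # hypothetical: a DT2 with 4 rows

rows_times <- rep(seq_len(n2), n1)         # 1 2 3 4 1 2 3 4
rows_each  <- rep(seq_len(n2), each = n1)  # 1 1 2 2 3 3 4 4
feats      <- rep_len(feat, n1 * n2)       # type1 type2 type1 type2 ...

n_pairs_times <- length(unique(paste(rows_times, feats)))
n_pairs_each  <- length(unique(paste(rows_each, feats)))
c(times = n_pairs_times, each = n_pairs_each)  # times = 4, each = 8
```

With plain repetition, rows 1 and 3 only ever meet type1 and rows 2 and 4 only type2, so half of the combinations are silently lost.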

2016-12-21 12:18:56 jazzurro

@DavidArenburg Yes, I agree with you. If the OP provides a more detailed example, this idea will need revising. Using 'nrow(DT1)' is a good idea. – jazzurro

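@jazzurro What would a more thorough example need? My dataset is much larger than this and doesn't have the same column names. I upvoted anyway, though. – erasmortg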

@erasmortg I didn't mean that I need the whole dataset. Sorry for the confusion. – jazzurro
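Since the real data has different column names, one package-free option that does not depend on names is base R's merge() with by = NULL, which the merge() documentation describes as returning the Cartesian product. A minimal sketch on plain data frames; the names df1, df2 and count1 are illustrative.

```r
# Base R only: no data.table or other packages
df1 <- data.frame(featurea = c("type1", "type2"), count = c(2, 3))
df2 <- data.frame(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# Rename df1's count first so the two count columns stay distinguishable
names(df1)[names(df1) == "count"] <- "count1"

# by = NULL (no key columns) makes merge() return every row pairing
cp <- merge(df1, df2, by = NULL)
cp$total <- cp$count1 * cp$count

DT3 <- cp[order(cp$origin, cp$color, cp$featurea),
          c("origin", "color", "featurea", "total")]
DT3
```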

A dplyr solution

library(dplyr) 
library(data.table) 

DT1 <- data.table(featurea =c("type1","type2"), count = c(2,3)) 
DT2 <- data.table(origin =c("house","park","park"), color =c("red","blue","red"),count =c(2,1,2))

Create a dummy column (here called key) and inner join on it:

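# Give both tables the same dummy key, so every DT1 row matches every DT2 row
inner_join(DT1 %>% mutate(key = 1),
           DT2 %>% mutate(key = 1), by = "key") %>%
  mutate(total = count.x * count.y) %>%   # count.x from DT1, count.y from DT2
  select(origin, color, featurea, total) %>%
  arrange(origin, color)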
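On dplyr 1.1.0 or later the dummy key should not be needed: cross_join() returns the Cartesian product directly, and on those versions the key-based inner_join() may emit a many-to-many relationship warning. A minimal sketch, assuming dplyr >= 1.1.0 and plain data frames (df1 and df2 are illustrative names):

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where cross_join() exists

df1 <- data.frame(featurea = c("type1", "type2"), count = c(2, 3))
df2 <- data.frame(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# The shared column 'count' is suffixed: count.x from df2, count.y from df1
DT3 <- cross_join(df2, df1) %>%
  mutate(total = count.x * count.y) %>%
  select(origin, color, featurea, total) %>%
  arrange(origin, color, featurea)
DT3
```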

2016-12-21 12:34:18

Please test this on larger data; I don't know how well optimized it is:

DT2[, .(featurea = DT1[["featurea"]], 
     count = count * DT1[["count"]]), by = .(origin, color)] 
# origin color featurea count 
#1: house red type1  4 
#2: house red type2  6 
#3: park blue type1  2 
#4: park blue type2  3 
#5: park red type1  4 
#6: park red type2  6
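The by-group form treats each (origin, color) group as a single DT2 row. If a pair can repeat in the real data, count inside the group has more than one element and is multiplied elementwise with DT1's counts instead of being crossed with them. A sketch with a hypothetical DT2 whose (park, red) pair appears twice, grouping by a row id instead (row_id is an illustrative name):

```r
library(data.table)

DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
# Hypothetical DT2 in which the (park, red) pair appears twice
DT2 <- data.table(origin = c("park", "park"), color = c("red", "red"),
                  count = c(2, 5))

# Grouping by (origin, color): one 2-row group, so only 2 * 2 and 5 * 3
by_pair <- DT2[, .(featurea = DT1[["featurea"]],
                   count = count * DT1[["count"]]), by = .(origin, color)]

# Grouping by a row id keeps each DT2 row separate: all 4 combinations
by_row <- DT2[, .(origin, color,
                  featurea = DT1[["featurea"]],
                  count = count * DT1[["count"]]),
              by = .(row_id = seq_len(nrow(DT2)))][, row_id := NULL][]
by_pair
by_row
```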

It may be more efficient to switch it around like this if DT1 has fewer groups:

DT1[, c(DT2[, .(origin, color)], 
     .(count = count * DT2[["count"]])), by = featurea] 
# featurea origin color count 
#1: type1 house red  4 
#2: type1 park blue  2 
#3: type1 park red  4 
#4: type2 house red  6 
#5: type2 park blue  3 
#6: type2 park red  6
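For the "test on larger data" request, a rough timing sketch with system.time() on synthetic data. The sizes and seed are arbitrary choices for illustration, and timings will depend on the machine and data.table version, so no result is claimed here. origin and featurea are made unique so that each group corresponds to exactly one row, as both variants assume.

```r
library(data.table)
set.seed(42)

n1 <- 50      # rows in DT1 (illustrative size)
n2 <- 20000   # rows in DT2 (illustrative size)
DT1 <- data.table(featurea = paste0("type", seq_len(n1)),
                  count = sample(1:5, n1, replace = TRUE))
DT2 <- data.table(origin = paste0("o", seq_len(n2)),
                  color = sample(c("red", "blue"), n2, replace = TRUE),
                  count = sample(1:5, n2, replace = TRUE))

# Variant 1: one group per DT2 row (n2 groups)
t_dt2 <- system.time(
  r1 <- DT2[, .(featurea = DT1[["featurea"]],
                count = count * DT1[["count"]]), by = .(origin, color)]
)
# Variant 2: one group per DT1 row (n1 groups)
t_dt1 <- system.time(
  r2 <- DT1[, c(DT2[, .(origin, color)],
                .(count = count * DT2[["count"]])), by = featurea]
)
rbind(by_DT2_groups = t_dt2[1:3], by_DT1_groups = t_dt1[1:3])
```

Both results should hold n1 * n2 rows; only the row order and column order differ.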

2016-12-21 13:35:39 Roland
