2016-10-11

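I have a large data frame whose data points are either "positive" (1) or "negative" (0). I want to count the number of times a vector/row is matched in that data frame.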

Example data:

my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0), 
    marker_b = c(0,1,1,1), marker_c = c(0,1,1,0), marker_d = c(0,1,0,1)) 


  cell marker_a marker_b marker_c marker_d
1    1        1        0        0        0
2    2        0        1        1        1
3    3        0        1        1        0
4    4        0        1        0        1
... 

I have a second data.frame containing every possible combination of positive and negative markers that any my_data$cell can have:

combinations_df <- expand.grid(
    marker_a = c(0, 1), 
    marker_b = c(0, 1), 
    marker_c = c(0, 1), 
    marker_d = c(0, 1) 
) 

   marker_a marker_b marker_c marker_d
1         0        0        0        0
2         1        0        0        0
3         0        1        0        0
4         1        1        0        0
5         0        0        1        0
6         1        0        1        0
7         0        1        1        0
8         1        1        1        0
9         0        0        0        1
10        1        0        0        1
11        0        1        0        1
12        1        1        0        1
13        0        0        1        1
14        1        0        1        1
15        0        1        1        1
16        1        1        1        1

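How can I match each row/combination of combinations_df against every row of my_data and get back a data.frame with the final count for each combination?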

Example of the expected output:

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 
1 14969 15223 15300 14779 14844 16049 15374 15648 15045 15517 15116 15405 14990 15347 14432 15569 

Could you please update your expected output so that it matches the example you have shown?

Answers


I guess a data.table approach would be fairly efficient:

library(data.table) 
setDT(my_data) 

my_data[ combinations_df, on = names(combinations_df), .N, by = .EACHI ] 


    marker_a marker_b marker_c marker_d N
 1:        0        0        0        0 0
 2:        1        0        0        0 1
 3:        0        1        0        0 0
 4:        1        1        0        0 0
 5:        0        0        1        0 0
 6:        1        0        1        0 0
 7:        0        1        1        0 1
 8:        1        1        1        0 0
 9:        0        0        0        1 0
10:        1        0        0        1 0
11:        0        1        0        1 1
12:        1        1        0        1 0
13:        0        0        1        1 0
14:        1        0        1        1 0
15:        0        1        1        1 1
16:        1        1        1        1 0

If you only care about the combinations that actually appear in the data, "chain" a filter onto the command:

my_data[ combinations_df, on = names(combinations_df), .N, by = .EACHI ][ N > 0 ] 


   marker_a marker_b marker_c marker_d N
1:        1        0        0        0 1
2:        0        1        1        0 1
3:        0        1        0        1 1
4:        0        1        1        1 1

Or, in that case, you do not even need combinations_df:

my_data[, .N, by = marker_a:marker_d ] 


   marker_a marker_b marker_c marker_d N
1:        1        0        0        0 1
2:        0        1        1        1 1
3:        0        1        1        0 1
4:        0        1        0        1 1

Perhaps you need something like this:

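# Assumes my_data is still a plain data.frame, so my_data[-1] drops the first
# column (cell); on a data.table, my_data[-1] would drop the first row instead.
# Each row is pasted into a key such as "0111", then matching keys are counted.
setNames(sapply(do.call(paste0, combinations_df), 
     function(x) sum(do.call(paste0, my_data[-1]) == x)), 1:nrow(combinations_df)) 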
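The same string-key idea can also be written without the per-combination loop. The following is only a sketch under a few assumptions: it rebuilds the example data from the question, and the names `marker_cols`, `combo_keys`, `data_keys` and `key_counts` are made up for illustration. It pastes each row into one key and lets `table()` count the keys against a fixed set of factor levels.

```r
# Example data from the question
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
    marker_b = c(0, 1, 1, 1), marker_c = c(0, 1, 1, 0), marker_d = c(0, 1, 0, 1))
combinations_df <- expand.grid(marker_a = c(0, 1), marker_b = c(0, 1),
    marker_c = c(0, 1), marker_d = c(0, 1))

# Build one string key per row, e.g. "0111"; columns are selected by name
# so that both data frames are pasted in the same column order
marker_cols <- names(combinations_df)
combo_keys <- do.call(paste0, combinations_df[marker_cols])
data_keys <- do.call(paste0, my_data[marker_cols])

# Fixing the factor levels keeps combinations with zero matches in the result
counts <- table(factor(data_keys, levels = combo_keys))
key_counts <- setNames(as.integer(counts), seq_along(combo_keys))
key_counts
```

Because the levels come from `combinations_df`, the result has one entry per combination, in the same order as its rows.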

You wrote your combinations in "binary", so there is no need for any join, just a bit of math. Try this:

setNames(tabulate(as.matrix(my_data[, 2:5]) %*% 2^(0:3) + 1, 16), 1:16) 
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 
# 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0
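To see why the arithmetic works, it may help to compute the index step by step. This is a sketch that assumes the example data from the question and a plain data.frame; the names `weights`, `row_index` and `combo_index` are illustrative only. Each marker column is weighted by a power of two, so a row of 0/1 values becomes a number between 0 and 15, and adding 1 gives the matching row number in `combinations_df`.

```r
# Example data from the question (plain data.frame, not a data.table)
my_data <- data.frame(cell = 1:4, marker_a = c(1, 0, 0, 0),
    marker_b = c(0, 1, 1, 1), marker_c = c(0, 1, 1, 0), marker_d = c(0, 1, 0, 1))
combinations_df <- expand.grid(marker_a = c(0, 1), marker_b = c(0, 1),
    marker_c = c(0, 1), marker_d = c(0, 1))

# Weights 1, 2, 4, 8 for marker_a, marker_b, marker_c, marker_d
weights <- 2^(0:3)

# Row number in combinations_df that each row of my_data corresponds to
row_index <- as.vector(as.matrix(my_data[, 2:5]) %*% weights) + 1
row_index
# 2 15 7 11

# Sanity check: applying the same formula to combinations_df itself gives 1..16,
# because expand.grid varies its first column fastest
combo_index <- as.vector(as.matrix(combinations_df) %*% weights) + 1
all(combo_index == 1:16)
# TRUE
```

`tabulate()` then simply counts how often each index from 1 to 16 occurs. The shortcut depends on `combinations_df` keeping the row order produced by `expand.grid`.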