Subsetting a matrix by a list and applying a function

I have a symmetric matrix that I need to subset by the elements of a list, applying a function to each subset. How can I speed up or otherwise improve the process?

My current code looks something like this:

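funs <- function(x, y, data) {
    if (all(colnames(data) %in% x) && all(colnames(data) %in% y)) {
        # every column name appears in both x and y: index by name
        mean(data[x, y])
    } else if (any(colnames(data) %in% x) && any(colnames(data) %in% y)) {
        # only some names appear: keep just the matching rows and columns
        mean(data[colnames(data) %in% x, colnames(data) %in% y])
    } else {
        # nothing matches
        NA
    }
}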

vfuns <- Vectorize(funs, vectorize.args = c("x", "y")) 

outer(l, l, vfuns, data = mat)
           2  9 10 15 16 18
2  0.2277186 NA NA NA NA NA
9         NA NA NA NA NA NA
10        NA NA NA NA NA NA
15        NA NA NA NA NA NA
16        NA NA NA NA NA NA
18        NA NA NA NA NA NA

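In an earlier version I computed the submatrix for every combination, but that ended up computing some comparisons twice (or more) and was quite slow. With the current approach I still compute each comparison twice, since funs("2", "9", data = mat) == funs("9", "2", data = mat), but no more than that. Things I have considered to improve performance: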

"Tell" outer that the result is symmetric: how?
Convert the list to an environment to speed up lookups (Error: attempt to replicate an object of type 'environment')
Parallelize outer?
??
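For the first idea, outer itself has no option for declaring a symmetric result, as far as I know. One possible workaround is to evaluate only the pairs with i <= j and mirror them. The following is a minimal sketch on hypothetical toy data; mat_toy, l_toy, block_mean and sym_outer are illustrative names that do not appear in the question, and the mirroring is only valid when the input matrix is symmetric.

```r
# Toy data (hypothetical, not the question's data): a symmetric 4 x 4 matrix
ids <- c("a", "b", "c", "d")
mat_toy <- matrix(c(1.0, 0.2, 0.4, 0.6,
                    0.2, 1.0, 0.3, 0.5,
                    0.4, 0.3, 1.0, 0.1,
                    0.6, 0.5, 0.1, 1.0),
                  nrow = 4, dimnames = list(ids, ids))
l_toy <- list(g1 = c("a", "b"), g2 = c("c", "d"), g3 = c("x", "y"))

# Mean of the block selected by two sets of names; NA when a set matches nothing
block_mean <- function(x, y, data) {
  rows <- rownames(data) %in% x
  cols <- colnames(data) %in% y
  if (!any(rows) || !any(cols)) return(NA_real_)
  mean(data[rows, cols])
}

# Evaluate each unordered pair once (i <= j) and copy the value to [j, i]
sym_outer <- function(lst, data) {
  n <- length(lst)
  res <- matrix(NA_real_, n, n, dimnames = list(names(lst), names(lst)))
  for (i in seq_len(n)) {
    for (j in i:n) {
      res[i, j] <- res[j, i] <- block_mean(lst[[i]], lst[[j]], data)
    }
  }
  res
}

res_sym <- sym_outer(l_toy, mat_toy)
```

This calls the function n * (n + 1) / 2 times instead of n^2 times. Whether that is a worthwhile saving for the real data would need to be benchmarked.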

The list:

l <- structure(list(`2` = c("109582", "114608", "140837", "140877", 
"1474228", "1474244", "162582", "194315", "194840", "76002", 
"76005"), `9` = c("1430728", "156580", "156582", "211859"), `10` = c("1430728", 
"156580", "156582", "211859"), `15` = c("1430728", "209776", 
"209931", "71291"), `16` = c("379716", "379724", "74160"), `18` = c("112310", 
"112315", "112316", "888590", "916853")), .Names = c("2", "9", 
"10", "15", "16", "18"))

The matrix:

mat <- structure(c(1, 0.305084745762712, 0.0728051391862955, 0.151950718685832, 
0.035778175313059, 0.128755364806867, 0.157080523601745, 0.127659574468085, 
0.0452173913043478, 0.591549295774648, 0.32089552238806, 0.305084745762712, 
1, 0.102040816326531, 0.186440677966102, 0.0421052631578947, 
0.127272727272727, 0.0306691449814126, 0.0232558139534884, 0.00970873786407767, 
0.6, 0.970059880239521, 0.0728051391862955, 0.102040816326531, 
1, 0.62962962962963, 0.0317460317460317, 0.0225563909774436, 
0.00383141762452107, 0.00546448087431694, 0.0140845070422535, 
0.0970873786407767, 0.0970873786407767, 0.151950718685832, 0.186440677966102, 
0.62962962962963, 1, 0.0273972602739726, 0.041958041958042, 0.00759013282732448, 
0.00518134715025907, 0., 0.150442477876106, 0.178861788617886, 
0.035778175313059, 0.0421052631578947, 0.0317460317460317, 0.0273972602739726, 
1, 0.608938547486033, 0.0284403669724771, 0.0131004366812227, 
0.00854700854700855, 0.0402684563758389, 0.041025641025641, 0.128755364806867, 
0.127272727272727, 0.0225563909774436, 0.041958041958042, 0.608938547486033, 
1, 0.0491379310344828, 0.0133779264214047, 0.0053475935828877, 
0.10958904109589, 0.13134328358209, 0.157080523601745, 0.0306691449814126, 
0.00383141762452107, 0.00759013282732448, 0.0284403669724771, 
0.0491379310344828, 1, 0.288429752066116, 0.11384335154827, 0.111504424778761, 
0.0333796940194715, 0.127659574468085, 0.0232558139534884, 0.00546448087431694, 
0.00518134715025907, 0.0131004366812227, 0.0133779264214047, 
0.288429752066116, 1, 0.527426160337553, 0.0780669144981413, 
0.0229885057471264, 0.0452173913043478, 0.00970873786407767, 
0.0140845070422535, 0., 0.00854700854700855, 
0.0053475935828877, 0.11384335154827, 0.527426160337553, 1, 0.0636942675159236, 
0.00947867298578199, 0.591549295774648, 0.6, 0.0970873786407767, 
0.150442477876106, 0.0402684563758389, 0.10958904109589, 0.111504424778761, 
0.0780669144981413, 0.0636942675159236, 1, 0.625454545454545, 
0.32089552238806, 0.970059880239521, 0.0970873786407767, 0.178861788617886, 
0.041025641025641, 0.13134328358209, 0.0333796940194715, 0.0229885057471264, 
0.00947867298578199, 0.625454545454545, 1), .Dim = c(11L, 11L 
), .Dimnames = list(c("109582", "114608", "140837", "140877", 
"1474228", "1474244", "162582", "194315", "194840", "76002", 
"76005"), c("109582", "114608", "140837", "140877", "1474228", 
"1474244", "162582", "194315", "194840", "76002", "76005")))


2017-02-27 Llopis


Maybe I misunderstand your question, but I think you have x %in% colnames(mat) where you would want colnames(mat) %in% x instead.

x <- l[[1]][c(1, 3, 5)] # x is the 1st, 3rd and 5th entry
x %in% colnames(mat)
# [1] TRUE TRUE TRUE # returns a vector of length 3
# indexing mat by x %in% colnames(mat) returns the full matrix,
# because c(TRUE, TRUE, TRUE) is simply recycled up to the dimensions of mat
mat[x %in% colnames(mat), x %in% colnames(mat)]

colnames(mat) %in% x
# [1] TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
# TRUE only for the 1st, 3rd and 5th element, which is what we want
mat[colnames(mat) %in% x, colnames(mat) %in% x] # 3 x 3 matrix

With colnames(mat) %in% x you no longer need the if statements in your funs function: mat[colnames(mat) %in% x, colnames(mat) %in% x] returns only the rows and columns whose names match.

# compare
x <- l[[1]] # all names match
mat[colnames(mat) %in% x, colnames(mat) %in% x]
x <- l[[1]][c(1, 3, 5)] # some names match
mat[colnames(mat) %in% x, colnames(mat) %in% x]
x <- c(l[[1]][c(1, 3, 5)], "2", "22", "222") # some match, others are not in mat
mat[colnames(mat) %in% x, colnames(mat) %in% x]
x <- c("2", "22", "222") # none match
mat[colnames(mat) %in% x, colnames(mat) %in% x] # empty matrix

Now you can simply use sapply, either with an inline function or with your funs function, to get the mean of each submatrix:

sapply(l, function(x) mean(mat[colnames(mat) %in% x, colnames(mat) %in% x])) 
sapply(l, function(x) mean(mat[colnames(mat) %in% x, colnames(mat) %in% x], na.rm=TRUE)) # also consider na.rm parameter if needed

The mean of an empty matrix comes out as NaN, but you can replace every NaN with NA afterwards.
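The NaN-to-NA replacement could look like the sketch below. The data are hypothetical toy values (mat_toy and l_toy are not the question's objects), chosen so that one group matches no column name.

```r
# Toy data (hypothetical): group g2 matches none of the column names
ids <- c("a", "b", "c")
mat_toy <- matrix(c(1.0, 0.2, 0.4,
                    0.2, 1.0, 0.3,
                    0.4, 0.3, 1.0),
                  nrow = 3, dimnames = list(ids, ids))
l_toy <- list(g1 = c("a", "b"), g2 = c("x", "y"))

res <- sapply(l_toy, function(x) {
  keep <- colnames(mat_toy) %in% x
  mean(mat_toy[keep, keep])  # mean of a 0 x 0 matrix is NaN
})

# Replace NaN (empty selection) with NA, leaving real values untouched
res[is.nan(res)] <- NA
```

The same assignment works on a matrix of pairwise results, because is.nan returns a logical object of the same shape.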

Edit: extended to the requested pairwise comparisons

# matrix form 
sapply(l, function(x) sapply(l, function (y) mean(mat[colnames(mat) %in% x, colnames(mat) %in% y]))) 

# list form 
lapply(l, function(x) sapply(l, function (y) mean(mat[colnames(mat) %in% x, colnames(mat) %in% y])))
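As an alternative to calling mean once per pair, all block means can be obtained from two matrix products. This is a different technique from the nested sapply above, sketched here on hypothetical toy data (mat_toy, l_toy, member and the block_* names are illustrative). It assumes the matrix has no NA entries and that duplicated names within a group should count once, which is also how colnames(mat) %in% x behaves.

```r
# Toy data (hypothetical, not the question's data)
ids <- c("a", "b", "c", "d")
mat_toy <- matrix(c(1.0, 0.2, 0.4, 0.6,
                    0.2, 1.0, 0.3, 0.5,
                    0.4, 0.3, 1.0, 0.1,
                    0.6, 0.5, 0.1, 1.0),
                  nrow = 4, dimnames = list(ids, ids))
l_toy <- list(g1 = c("a", "b"), g2 = c("c", "d"), g3 = c("x", "y"))

# Membership matrix: one row per matrix name, one column per group (1 = member)
member <- vapply(l_toy,
                 function(x) as.numeric(colnames(mat_toy) %in% x),
                 numeric(ncol(mat_toy)))

# Sum of every block and number of cells in every block, for all pairs at once
block_sum <- t(member) %*% mat_toy %*% member
n_per_group <- colSums(member)
block_n <- outer(n_per_group, n_per_group)

# Block means; 0 / 0 yields NaN for groups without any match, as mean() does
block_avg <- block_sum / block_n

# Reference result from the nested sapply form, for comparison
nested <- sapply(l_toy, function(x) sapply(l_toy, function(y)
  mean(mat_toy[colnames(mat_toy) %in% x, colnames(mat_toy) %in% y])))
```

On this toy input block_avg and nested agree. Any speed difference on larger inputs is untested here and would need to be measured.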


2017-02-27 11:42:25 Djork

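Although that was certainly a mistake in my question, the question itself is about doing the pairwise comparison of all the elements. – Llopis

See the edited answer, where I extend this to pairwise comparisons. You also asked for improvements: I pointed out that the indexing as you originally wrote it, x %in% colnames(mat), is incorrect and should be colnames(mat) %in% x, and that the if statements are not necessary. You get similar results, with NaN instead of NA, by simply using funs <- function(x, y, data) mean(data[colnames(data) %in% x, colnames(data) %in% y]). – Djork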

outer already does what I want; how is using two nested sapply calls faster or better? – Llopis
