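I have a symmetric matrix that I need to subset according to a list, applying a function to each subset. How can I speed up or improve the process?
My current code looks something like this: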
funs <- function(x, y, data) {
  if (all(colnames(data) %in% x) & all(colnames(data) %in% y)) {
    mean(data[x, y])
  } else if (any(colnames(data) %in% x) & any(colnames(data) %in% y)) {
    mean(data[colnames(data) %in% x, colnames(data) %in% y])
  } else {
    NA
  }
}
vfuns <- Vectorize(funs, vectorize.args = c("x", "y"))
outer(l, l, vfuns, data = mat)
           2  9 10 15 16 18
2  0.2277186 NA NA NA NA NA
9         NA NA NA NA NA NA
10        NA NA NA NA NA NA
15        NA NA NA NA NA NA
16        NA NA NA NA NA NA
18        NA NA NA NA NA NA
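In an earlier version I computed the matrix for every combination, but that ended up computing some comparisons twice (or more) and was quite slow. With the current approach I still compute each comparison twice, since funs("2", "9", data = mat) == funs("9", "2", data = mat), but no more than that. Things I have thought of to improve performance: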
- "Tell" outer that the result is symmetric: how?
- Convert the list to an environment to speed up lookups (this fails with Error: attempt to replicate an object of type 'environment')
- Parallelize outer?
- ??
The list:
l <- structure(list(`2` = c("109582", "114608", "140837", "140877",
"1474228", "1474244", "162582", "194315", "194840", "76002",
"76005"), `9` = c("1430728", "156580", "156582", "211859"), `10` = c("1430728",
"156580", "156582", "211859"), `15` = c("1430728", "209776",
"209931", "71291"), `16` = c("379716", "379724", "74160"), `18` = c("112310",
"112315", "112316", "888590", "916853")), .Names = c("2", "9",
"10", "15", "16", "18"))
The matrix:
mat <- structure(c(1, 0.305084745762712, 0.0728051391862955, 0.151950718685832,
0.035778175313059, 0.128755364806867, 0.157080523601745, 0.127659574468085,
0.0452173913043478, 0.591549295774648, 0.32089552238806, 0.305084745762712,
1, 0.102040816326531, 0.186440677966102, 0.0421052631578947,
0.127272727272727, 0.0306691449814126, 0.0232558139534884, 0.00970873786407767,
0.6, 0.970059880239521, 0.0728051391862955, 0.102040816326531,
1, 0.62962962962963, 0.0317460317460317, 0.0225563909774436,
0.00383141762452107, 0.00546448087431694, 0.0140845070422535,
0.0970873786407767, 0.0970873786407767, 0.151950718685832, 0.186440677966102,
0.62962962962963, 1, 0.0273972602739726, 0.041958041958042, 0.00759013282732448,
0.00518134715025907, 0., 0.150442477876106, 0.178861788617886,
0.035778175313059, 0.0421052631578947, 0.0317460317460317, 0.0273972602739726,
1, 0.608938547486033, 0.0284403669724771, 0.0131004366812227,
0.00854700854700855, 0.0402684563758389, 0.041025641025641, 0.128755364806867,
0.127272727272727, 0.0225563909774436, 0.041958041958042, 0.608938547486033,
1, 0.0491379310344828, 0.0133779264214047, 0.0053475935828877,
0.10958904109589, 0.13134328358209, 0.157080523601745, 0.0306691449814126,
0.00383141762452107, 0.00759013282732448, 0.0284403669724771,
0.0491379310344828, 1, 0.288429752066116, 0.11384335154827, 0.111504424778761,
0.0333796940194715, 0.127659574468085, 0.0232558139534884, 0.00546448087431694,
0.00518134715025907, 0.0131004366812227, 0.0133779264214047,
0.288429752066116, 1, 0.527426160337553, 0.0780669144981413,
0.0229885057471264, 0.0452173913043478, 0.00970873786407767,
0.0140845070422535, 0., 0.00854700854700855,
0.0053475935828877, 0.11384335154827, 0.527426160337553, 1, 0.0636942675159236,
0.00947867298578199, 0.591549295774648, 0.6, 0.0970873786407767,
0.150442477876106, 0.0402684563758389, 0.10958904109589, 0.111504424778761,
0.0780669144981413, 0.0636942675159236, 1, 0.625454545454545,
0.32089552238806, 0.970059880239521, 0.0970873786407767, 0.178861788617886,
0.041025641025641, 0.13134328358209, 0.0333796940194715, 0.0229885057471264,
0.00947867298578199, 0.625454545454545, 1), .Dim = c(11L, 11L
), .Dimnames = list(c("109582", "114608", "140837", "140877",
"1474228", "1474244", "162582", "194315", "194840", "76002",
"76005"), c("109582", "114608", "140837", "140877", "1474228",
"1474244", "162582", "194315", "194840", "76002", "76005")))
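Although this is surely a flaw in my question, the question itself is about doing the comparison across groups – Llopis
For the pairwise comparison of all elements, see the edited answer, where I extend it to pairwise comparisons. You also asked for improvements: I would point out that the indexing as you originally wrote it, `x %in% colnames(mat)`, is incorrect and should be `colnames(mat) %in% x`, and that the if statements are not needed. You get similar results, with NaN instead of NA, by simply using `funs <- function(x, y, data) mean(data[colnames(data) %in% x, colnames(data) %in% y])`. – Djork
outer already does what I want; how would using two nested sapply calls be faster or better? – Llopis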
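To illustrate the simplification described in the comment above, here is a small sketch on toy data (toy_mat is a hypothetical stand-in for mat). It shows that logical indexing alone handles the "no overlap" case, because the mean of an empty selection is NaN.

```r
# Simplified function from the comment: logical indexing only, no if/else.
funs <- function(x, y, data) {
  mean(data[colnames(data) %in% x, colnames(data) %in% y])
}

# Toy symmetric matrix (hypothetical data, not the real mat).
ids <- c("a", "b", "c")
toy_mat <- matrix(c(1.0, 0.2, 0.3,
                    0.2, 1.0, 0.5,
                    0.3, 0.5, 1.0),
                  nrow = 3, dimnames = list(ids, ids))

# Ids present in the matrix: mean of toy_mat[c("a", "b"), "c"].
overlap <- funs(c("a", "b"), "c", toy_mat)

# No id of x is present: the selection is empty, so mean() gives NaN.
no_overlap <- funs(c("x", "y"), "a", toy_mat)

print(overlap)
print(no_overlap)
```

If NA is preferred over NaN, the result can be post-processed, for example with `res[is.nan(res)] <- NA` on the final matrix.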
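As a sketch of how the two approaches relate (toy data again; toy_mat and toy_l are hypothetical stand-ins for mat and l): Vectorize builds its wrapper on mapply, so outer with a Vectorize'd function still calls funs once per pair, just as two nested sapply calls do. The snippet below only checks that both produce the same matrix. It makes no claim about which is faster.

```r
# Simplified function taken from the comment thread.
funs <- function(x, y, data) {
  mean(data[colnames(data) %in% x, colnames(data) %in% y])
}

# Toy symmetric matrix and grouping list (hypothetical data).
ids <- c("a", "b", "c", "d")
toy_mat <- matrix(c(1.0, 0.2, 0.3, 0.4,
                    0.2, 1.0, 0.5, 0.6,
                    0.3, 0.5, 1.0, 0.7,
                    0.4, 0.6, 0.7, 1.0),
                  nrow = 4, dimnames = list(ids, ids))
toy_l <- list(g1 = c("a", "b"), g2 = c("c", "d"))

# Approach from the question: outer() with a Vectorize()d function.
vfuns <- Vectorize(funs, vectorize.args = c("x", "y"))
via_outer <- outer(toy_l, toy_l, vfuns, data = toy_mat)

# Two nested sapply() calls: the outer loop builds columns (y),
# the inner loop builds rows (x).
via_sapply <- sapply(toy_l, function(y) {
  sapply(toy_l, function(x) funs(x, y, toy_mat))
})

print(via_outer)
print(via_sapply)
```

Timing both on the real data, for instance with system.time, would be the way to settle which one to keep.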