Speeding up correlation matrix computation in R

I have a data frame with 49 variables and 4M rows. I want to compute the 49 x 49 correlation matrix. All columns are of class numeric.

Here is an example:

df <- data.frame(replicate(49,sample(0:50,4000000,rep=TRUE)))

I am using the standard cor function.

cor_matrix <- cor(df, use = "pairwise.complete.obs")

This takes a very long time. I have 16 GB of RAM and a single-core i5 at 2.60 GHz.

Is there a way to make this computation faster on my desktop?

Source

2016-03-21 vagabond

You might check [here](http://www.r-bloggers.com/bigcor-large-correlation-matrices-in-r/) – akrun

Your main problem is use = "pairwise.complete.obs". On my system (tested with 12 columns), it takes 5 times as long as use = "everything". – Roland
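A minimal sketch of how the comment above could be acted on, assuming the data may turn out to contain no missing values at all: check for NAs once and only fall back to pairwise deletion when it is actually needed. The helper name fast_cor is hypothetical, and the example uses a much smaller matrix than the one in the question so that it runs quickly.

```r
# Small stand-in for the question's data (fewer rows and columns so it runs quickly).
set.seed(1)
mat <- replicate(5, as.numeric(sample(0:50, 10000, replace = TRUE)))

# Hypothetical helper: only pay for pairwise deletion when NAs are present.
fast_cor <- function(x) {
  x <- as.matrix(x)
  if (anyNA(x)) {
    cor(x, use = "pairwise.complete.obs")
  } else {
    cor(x, use = "everything")
  }
}

cor_fast <- fast_cor(mat)
cor_pairwise <- cor(mat, use = "pairwise.complete.obs")

# With complete data both settings should agree up to floating-point tolerance.
all.equal(cor_fast, cor_pairwise)
```

Whether this helps on the real data depends on whether any NAs are present; if they are, the helper behaves exactly like the original call.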

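The WGCNA package (used for inferring gene networks from correlations) has a faster version of the cor function. On my 3.1 GHz Core i7 with 16 GB of RAM it solves the same 49 x 49 matrix about 20 times faster: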

mat <- replicate(49, as.numeric(sample(0:50,4000000,rep=TRUE))) 

system.time(
    cor_matrix <- cor(mat, use = "pairwise.complete.obs") 
) 
user system elapsed 
40.391 0.017 40.396 

system.time(
    cor_matrix_w <- WGCNA::cor(mat, use = "pairwise.complete.obs") 
) 
user system elapsed 
1.822 0.468 2.290 

all.equal(cor_matrix, cor_matrix_w) 
[1] TRUE

Check the function's help file for details on how the two versions differ when your data contain a large number of missing observations.
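One way to follow that advice in practice, sketched here with base R only: before trusting an alternative implementation on data with missing values, compare it with stats::cor on a random subset of rows, in the same spirit as the all.equal check above. The helper name check_cor_impl, the subset size and the simulated 5% missing rate are all assumptions for illustration.

```r
# Hypothetical helper: compare an alternative correlation function with
# stats::cor on a random subset of rows before trusting it on the full data.
check_cor_impl <- function(x, alt_cor, n_check = 10000) {
  x <- as.matrix(x)
  rows <- sample(nrow(x), min(n_check, nrow(x)))
  sub <- x[rows, , drop = FALSE]
  ref <- stats::cor(sub, use = "pairwise.complete.obs")
  alt <- alt_cor(sub)
  isTRUE(all.equal(ref, alt, check.attributes = FALSE))
}

# Illustrative data with roughly 5% missing values (an assumed rate).
set.seed(42)
mat_na <- replicate(5, as.numeric(sample(0:50, 20000, replace = TRUE)))
mat_na[sample(length(mat_na), round(0.05 * length(mat_na)))] <- NA

# Share of missing values per column.
colMeans(is.na(mat_na))

# Stand-in for the alternative implementation. If WGCNA is installed, this
# could be swapped for function(m) WGCNA::cor(m, use = "pairwise.complete.obs").
alt <- function(m) stats::cor(m, use = "pairwise.complete.obs")
check_cor_impl(mat_na, alt)
```

Passing a function that handles missing values differently, for example one that uses use = "complete.obs", should make the check return FALSE, which is the kind of discrepancy worth catching before running on all 4M rows.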

Source

2016-03-21 17:39:33
