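Matrix multiplication with RcppArmadillo: why isn't it faster?

I want to do some matrix multiplication with RcppArmadillo. However, my code shows that it does not get any faster with RcppArmadillo.

I am using Windows 10 Pro with R 3.2.4 and RcppArmadillo 0.6.600.4.0.

For example: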

library(RcppArmadillo)
library(inline)

# Compile a C++ function that computes X * Y * X with Armadillo
MCplus <- cxxfunction(signature(X_="numeric", Y_="numeric"),body ='
arma::mat X = Rcpp::as<arma::mat>(X_);  // copies the R matrix into an Armadillo matrix
arma::mat Y = Rcpp::as<arma::mat>(Y_);
arma::mat ans = X * Y * X;
return(wrap(ans));                      // copies the result back into an R object
', plugin="RcppArmadillo")

# Two 4000 x 4000 test matrices
A <- matrix(1:16000000,4000,4000)
C <- matrix(2:16000001,4000,4000)

R_M <- proc.time()
ans_R <- A%*%C%*%A # test with R
proc.time() - R_M

C_M <- proc.time()
ans_C <- MCplus(A,C) # test with RcppArmadillo
proc.time() - C_M

The R output:

user system elapsed 
106.75 0.24 106.98

And the RcppArmadillo output:

user system elapsed 
108.28 0.23 108.56

Is there anything that can be improved?

Thanks in advance!

Source

2016-03-16 gaofangshu

I get faster timings using 'microbenchmark': my mean for R is 7.36 seconds, and for RcppArmadillo it is 5.67 seconds. – Raad
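For anyone who wants to repeat this kind of comparison, here is a minimal sketch. It assumes the microbenchmark package is installed, and it uses a 500 x 500 size and `times = 10L`, both picked only to keep the run short (the question uses 4000 x 4000). A single `proc.time()` measurement is noisy, so repeated runs give a fairer picture.

```r
library(RcppArmadillo)
library(inline)
library(microbenchmark)

# Same C++ body as in the question.
MCplus <- cxxfunction(signature(X_ = "numeric", Y_ = "numeric"), body = '
  arma::mat X = Rcpp::as<arma::mat>(X_);
  arma::mat Y = Rcpp::as<arma::mat>(Y_);
  arma::mat ans = X * Y * X;
  return(wrap(ans));
', plugin = "RcppArmadillo")

set.seed(42)
n <- 500                          # assumed size, much smaller than the question
A <- matrix(rnorm(n * n), n, n)   # double matrices, so no integer-to-double coercion
C <- matrix(rnorm(n * n), n, n)

# Run each expression 10 times and summarise the timings.
bench <- microbenchmark(
  R             = A %*% C %*% A,
  RcppArmadillo = MCplus(A, C),
  times = 10L
)
print(bench)

# Both approaches should give the same result up to floating-point tolerance.
stopifnot(isTRUE(all.equal(A %*% C %*% A, MCplus(A, C))))
```

The absolute numbers depend on the machine and on the BLAS that R is linked against, so they will not match the ones quoted in this thread.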

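Another issue here is that this involves two copies of the data: one when the Armadillo matrices are constructed, and another when the R object is created from the result of the multiplication. You can avoid this cost by using Armadillo's advanced constructors: for more information, see http://arma.sourceforge.net/docs.html#Mat. –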
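A rough sketch of what the advanced constructor looks like in the style of the question's code. The name `MCplus_nocopy` is made up for illustration. It assumes the inputs are already double matrices: if an integer matrix such as `matrix(1:16, 4, 4)` is passed, Rcpp has to coerce it to double first, which is itself a copy. Note that this only removes the copy on the way in; `wrap()` still creates a new R object for the result.

```r
library(RcppArmadillo)
library(inline)

# Hypothetical variant of the function from the question (name is illustrative).
MCplus_nocopy <- cxxfunction(signature(X_ = "numeric", Y_ = "numeric"), body = '
  Rcpp::NumericMatrix Xr(X_), Yr(Y_);
  // Advanced constructor: mat(ptr, n_rows, n_cols, copy_aux_mem)
  // copy_aux_mem = false makes Armadillo reuse the memory owned by R.
  arma::mat X(Xr.begin(), Xr.nrow(), Xr.ncol(), false);
  arma::mat Y(Yr.begin(), Yr.nrow(), Yr.ncol(), false);
  arma::mat ans = X * Y * X;
  return Rcpp::wrap(ans);
', plugin = "RcppArmadillo")

set.seed(1)
A <- matrix(rnorm(16), 4, 4)
C <- matrix(rnorm(16), 4, 4)
res <- MCplus_nocopy(A, C)

# Same result as plain R, up to floating-point tolerance.
stopifnot(isTRUE(all.equal(res, A %*% C %*% A)))
```

Because the Armadillo matrices share memory with the R objects, the C++ code must not modify `X` or `Y` in place, otherwise the R variables would change as well.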

Good point, and we now get these for free via Rcpp Attributes. –
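A minimal sketch of the Rcpp Attributes route, using `Rcpp::cppFunction()` with `depends = "RcppArmadillo"`. The function name `mc_plus_attr` is made up for illustration. My understanding, which I have not checked against the package sources, is that RcppArmadillo converts `const arma::mat&` arguments without copying when the input is a double matrix.

```r
library(Rcpp)

# Hypothetical function name; the body is the same product as in the question.
cppFunction(depends = "RcppArmadillo", code = '
  arma::mat mc_plus_attr(const arma::mat& X, const arma::mat& Y) {
    // Arguments are converted from R automatically, no explicit as<>() or wrap().
    return X * Y * X;
  }
')

set.seed(1)
A <- matrix(rnorm(9), 3, 3)
C <- matrix(rnorm(9), 3, 3)
res_attr <- mc_plus_attr(A, C)

# Same result as plain R, up to floating-point tolerance.
stopifnot(isTRUE(all.equal(res_attr, A %*% C %*% A)))
```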

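R itself dispatches this case to LAPACK/BLAS, and code linked against R also calls through to LAPACK/BLAS. So yes, both approaches end up running the same code, and the difference is just due to small overheads.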
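A quick back-of-the-envelope check on the timings from the question, assuming the usual estimate of about 2 * n^3 floating-point operations for one dense n x n product:

```r
n <- 4000
flops <- 2 * (2 * n^3)   # two n x n products, roughly 2 * n^3 flops each

# Elapsed seconds reported in the question.
elapsed <- c(R = 106.98, RcppArmadillo = 108.56)

# Sustained throughput in GFLOP/s for each approach.
gflops <- flops / elapsed / 1e9
print(round(gflops, 2))
```

Both runs come out at about 2.4 GFLOP/s, which is what you would expect if the same BLAS routine is doing the work in both cases.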

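There are plenty of tutorials that show you how to change your LAPACK library. This of course depends on the operating system. Perhaps start with the R Installation and Administration manual and its appendices.

Source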
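Before swapping libraries, it can help to check what R is currently using. This is a small sketch with base R only. `La_library()` and the `"BLAS"` entry of `extSoftVersion()` assume R 3.4.0 or later, which is newer than the R 3.2.4 mentioned in the question; `La_version()` should be available there too.

```r
# Version of the LAPACK library that R is using.
lapack_version <- La_version()
print(lapack_version)

# Assumes R >= 3.4.0: path of the LAPACK shared library in use
# (may be an empty string if it cannot be determined).
print(La_library())

# Assumes R >= 3.4.0: path of the BLAS library, where R can determine it.
print(extSoftVersion()[["BLAS"]])
```

On R 3.4.0 or later, `sessionInfo()` also prints the BLAS and LAPACK libraries, which makes it easy to confirm that a switch to OpenBLAS or MKL actually took effect.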

2016-03-16 16:55:58

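Thank you very much! Does this mean that it may get faster if I change the LAPACK library, and that with the new library there will not be much difference between the two approaches? – gaofangshu

Yes. For example, Revolution Analytics never stopped talking about how their R is "faster", when (essentially) all they did was bundle the (known good) multithreaded Intel MKL BLAS. You can get that too, or use OpenBLAS, or ... Google around, this is a widely discussed topic. –

I will give it a try. Thank you very much for your help :) – gaofangshu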
