Speeding up matrix multiplication when the result is known to be symmetric

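I know that the result of a matrix multiplication is symmetric. Is there an R package or some standard method that lets me speed up the computation by calculating only the lower or upper triangle and then copying the result to the other half?

I know that tcrossprod benefits from this fact when only one argument is supplied, but I want to supply two matrices.

Here is an example where the result is symmetric: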

n <- 100
m <- 200
s <- matrix(runif(n^2), n, n)
s[lower.tri(s)] <- t(s)[lower.tri(s)]
x <- matrix(runif(m * n), m, n)
x %*% s %*% t(x)
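The product is symmetric because t(x %*% s %*% t(x)) equals x %*% t(s) %*% t(x), which is the same product again when s is symmetric. A minimal sketch that checks this numerically; the seed and the name res are my own additions and are not part of the question:

```r
set.seed(1)  # assumption: seed added only so the check is reproducible
n <- 100
m <- 200
s <- matrix(runif(n^2), n, n)
s[lower.tri(s)] <- t(s)[lower.tri(s)]  # copy the upper triangle into the lower one
x <- matrix(runif(m * n), m, n)

res <- x %*% s %*% t(x)

isSymmetric(s)            # s is exactly symmetric by construction
all.equal(res, t(res))    # res equals its transpose up to floating point error
```

Because the two halves of res are computed in a different order, they may differ in the last few bits, so all.equal is used here rather than identical.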

tcrossprod does not seem to be the solution:

library(microbenchmark) 
microbenchmark(x %*% s %*% t(x), tcrossprod(x %*% s, x))

I tried Rcpp, and even without the copy step it is slower than R's own multiplication (though I freely admit that I am a beginner with C++ and Rcpp):

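w <- s %*% t(x)
# Naive triple loop that fills only the lower triangle (j <= i) of m %*% v.
# The upper triangle stays at zero, i.e. the copy step is left out.
mm <- Rcpp::cppFunction(
'NumericMatrix mmult(NumericMatrix m, NumericMatrix v)
{
    NumericMatrix out(m.nrow(), v.ncol());

    for (int i = 0; i < m.nrow(); i++)
    {
        for (int j = 0; j < i + 1; j++)
        {
            for (int k = 0; k < m.ncol(); k++) {
                out(i, j) += m(i, k) * v(k, j);
            }
        }
    }
    return out;
}'
)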

microbenchmark(mm(x, w), x %*% w)
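The copy step that the attempt above leaves out can be done in R with the same upper.tri/lower.tri idiom the question uses to build s. A minimal sketch, assuming a square matrix whose lower triangle is already filled in; mirror_lower is a hypothetical helper name, not something from the post or from any package:

```r
# Hypothetical helper: mirror the lower triangle of a square matrix
# into its upper triangle.
mirror_lower <- function(a) {
  stopifnot(is.matrix(a), nrow(a) == ncol(a))
  a[upper.tri(a)] <- t(a)[upper.tri(a)]
  a
}

# Small symmetric reference matrix
full <- matrix(c(2, 1, 4,
                 1, 3, 5,
                 4, 5, 6), 3, 3, byrow = TRUE)

half <- full
half[upper.tri(half)] <- 0   # keep only the lower triangle, as mm() does

identical(mirror_lower(half), full)  # TRUE
```

Applied to the question, this would be mirror_lower(mm(x, w)). The copy adds its own cost on top of the loop, so it would not make the Rcpp attempt any faster than the timings above suggest.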

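I think this would be solved if the sym variable in the .Internal function do_matprod were exposed and could be set to true by the user. However, I really do not want to mess with that sort of thing...

Source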

2017-10-07 Richard Redding

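Maybe there is something in the 'Matrix' package? It has a symmetric class. Or possibly the 'matrixStats' package. – lmo

It does not look like the Matrix package takes advantage of the symmetry: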

> n <- 100 
> x <- s <- matrix(runif(n^2),n,n) 
> s[lower.tri(s)] <- t(s)[lower.tri(s)] 
> 
> library(Matrix) 
> s_sym <- Matrix(forceSymmetric(s)) 
> class(s_sym) # has the symmetric class 
[1] "dsyMatrix" 
attr(,"package") 
[1] "Matrix" 
> 
> library(microbenchmark) 
> microbenchmark(x %*% x, s %*% s, s_sym %*% s_sym) 
Unit: microseconds
            expr min  lq mean median  uq  max neval
         x %*% x 461 496  571    528 625 1008   100
         s %*% s 461 499  560    532 572  986   100
 s_sym %*% s_sym 553 568  667    624 701 1117   100

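There is no indication in the help file that it should:

The basic matrix product, %*% is implemented for all our Matrix and also for sparseVector classes, fully analogously to R's base matrix and vector objects. The functions crossprod and tcrossprod are matrix products or "cross products", ideally implemented efficiently without computing t(.)'s unnecessarily. They also return symmetricMatrix classed matrices when easily detectable, e.g., in crossprod(m), the one argument case. tcrossprod() takes the cross-product of the transpose of a matrix. tcrossprod(x) is formally equivalent to, but faster than, the call x %*% t(x), and so is tcrossprod(x, y) instead of x %*% t(y).

A solution for you is to use Rcpp to wrap the BLAS function provided by R_ext/BLAS.h. You can do that as follows. Create a func.cpp like this: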
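The equivalences stated in the help text can be checked directly. A minimal sketch using base R matrices rather than Matrix classes, which is my own simplification; the dimensions and the seed are arbitrary:

```r
set.seed(42)  # assumption: arbitrary seed for reproducibility
x <- matrix(runif(6), 2, 3)
y <- matrix(runif(12), 4, 3)

# One-argument form: tcrossprod(x) is x %*% t(x)
all.equal(tcrossprod(x), x %*% t(x))

# Two-argument form: tcrossprod(x, y) is x %*% t(y)
all.equal(tcrossprod(x, y), x %*% t(y))

# crossprod(x) is t(x) %*% x
all.equal(crossprod(x), t(x) %*% x)
```

In the two-argument form the function has no way of knowing that the result happens to be symmetric, which matches the benchmark in the question where tcrossprod(x %*% s, x) was tried.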

// added to get $(BLAS_LIBS) in compile flags 
//[[Rcpp::depends(RcppArmadillo)]] 
#include <Rcpp.h> 
#include <R_ext/BLAS.h> 

/* 
    Wrapper for BLAS dsymm. See dsymm http://www.netlib.org/lapack/explore-html/d1/d54/group__double__blas__level3_ga253c8edb8b21d1b5b1783725c2a6b692.html#ga253c8edb8b21d1b5b1783725c2a6b692 
    Only works with side = 'R' 
    Note input is by reference with &
*/ 
// [[Rcpp::export]] 
Rcpp::NumericMatrix blas_dsymm(
    char uplo, int m, int n, double alpha, 
    const Rcpp::NumericMatrix &A, const Rcpp::NumericMatrix &B){ 
    // set lda, ldb and ldc 
    int lda = n, ldb = m, ldc = m; 

    // make new matrix with dim(m, n) 
    Rcpp::NumericMatrix C(m, n); // default values are zero 
    double beta = 0; 

    F77_NAME(dsymm)(
    "R" /* side */, &uplo, &m, &n, &alpha, 
    A.begin(), &lda, B.begin(), &ldb, &beta, C.begin(), &ldc); 

    return(C); 
}

Then run the following R script:

> n <- 100 
> m <- 200 
> s<-matrix(runif(n^2),n,n) 
> s[lower.tri(s)] <- t(s)[lower.tri(s)] 
> x <- matrix(runif(m*n), m, n) 
> 
> library("Rcpp") 
> sourceCpp("func.cpp") 
> 
> out <- x %*% s 
> out_blas <- blas_dsymm(
+ uplo = "U", m = nrow(x), n = ncol(x), 
+ alpha = 1, A = s, B = x) 
> 
> all.equal(out, out_blas) 
[1] TRUE 
> 
> library(microbenchmark) 
> microbenchmark(
+ dense = x %*% s, 
+ BLAS = blas_dsymm(
+  uplo = "U", m = nrow(x), n = ncol(x), 
+  alpha = 1, A = s, B = x)) 
Unit: microseconds
  expr     min       lq     mean   median       uq      max neval
 dense 880.989 950.3225 1114.744 1066.866 1159.311 2783.213   100
  BLAS 858.866 938.6680 1169.839 1016.495 1225.286 3261.633   100

It does not seem to make any difference here. Note that you need to have the RcppArmadillo and Rcpp packages installed.

Source

2017-10-07 21:27:15

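Thanks for the suggestion. I tried a pure Rcpp solution in my latest edit, but without much luck. How would I use 'R_ext/BLAS.h' in this case? –

Check the edit I made to my answer. –

Nice answer showing how to access BLAS. That would be the only hope here, but as you have shown, it is hard to squeeze extra performance out of this problem. –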

-1

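Do not recode matrix multiplication with for loops. Linear algebra libraries are highly optimized for this, and you are likely to be 10 times slower (or worse).

For matrix computations, you will not gain (or lose) much by using RcppArmadillo or RcppEigen.

If you want a gain, you can change the math library you are using, for example by using MKL with Microsoft R Open.

Source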
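Before swapping the math library it helps to see which BLAS and LAPACK the current session is linked against. A minimal sketch using base R functions that, as far as I know, are available from R 3.4.0 onwards; the output depends entirely on how R was built, so none is shown:

```r
# Which BLAS/LAPACK is this R session using?
extSoftVersion()["BLAS"]   # file containing the BLAS in use (may be empty on some builds)
La_library()               # file containing the LAPACK in use
La_version()               # LAPACK version string
```

If the BLAS entry points at R's bundled reference library, an optimized BLAS such as MKL is the kind of change this answer is suggesting.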

2017-10-08 10:17:54

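MKL provides a faster BLAS, which is a standard interface that you can use with any R implementation built as a shared library. Microsoft's R merely bundles MKL, but you can (subject to the license terms) add it to other R versions. Finally, (Rcpp)Eigen does not use BLAS, so the answer is technically wrong, because Eigen does its own thing. –

@DirkEddelbuettel You are right about MKL. I mentioned MRO because it is the easiest way to use MKL in R. You are also right about the Eigen part. However, I never said that it uses BLAS; I only said that it is not faster than base R for matrix multiplication (tested on 3 computers since R 3.3.0). –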
