2

作爲代碼所示,次通過螺紋的增加,減少和通過螺紋的減小而增加,最佳爲1。多線程增加矩陣乘法示例中的時間?

import java.util.concurrent.ExecutorService; 
import java.util.concurrent.Executors; 
import java.util.concurrent.TimeUnit; 

public class MatrixMultiplication { 
    public static class Matrix { 
     int[][] container; 
     int rows; 
     int cols; 

     public Matrix(int rows, int cols) { 
      this.rows = rows; 
      this.cols = cols; 
      container = new int[rows][cols]; 
     } 
    } 

    public static class MultiplyThread implements Runnable { 
     private int row; 
     private int col; 
     Matrix a; 
     Matrix b; 
     Matrix c; 

     public MultiplyThread(Matrix a, Matrix b, Matrix c, int row, int col) { 
      this.row = row; 
      this.col = col; 
      this.a = a; 
      this.b = b; 
      this.c = c; 
     } 

     @Override 
     public void run() { 
      int sum = 0; 
      for (int i = 0; i < a.cols; i++) // or b.rows 
       sum += a.container[row][i] * b.container[i][col]; 
      c.container[row][col] = sum; 
     } 
    } 

    public static Matrix multiplyMatrices(Matrix a, Matrix b, int threads) throws InterruptedException { 
     if (a.cols != b.rows) 
      return null; 
     int rows = a.rows; 
     int cols = b.cols; 
     Matrix ret = new Matrix(rows, cols); 
     ExecutorService executor = Executors.newFixedThreadPool(threads); 
     int i = 0, j = 0; 
     for (i = 0; i < rows; i++) 
      for (j = 0; j < cols; j++) 
       executor.submit(new MultiplyThread(a, b, ret, i, j)); 
     executor.shutdown(); 
     executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS); 
     return ret; 
    } 

    public static void main(String[] args) throws InterruptedException { 
     int[][] mat = new int[][] { { 1, 1, 1, 1 }, { 1, 1, 1, 1 }, { 1, 1, 1, 1 }, { 1, 1, 1, 1 } }; 
     Matrix a = new Matrix(4, 4); 
     a.container = mat; 
     Matrix b = new Matrix(4, 4); 
     b.container = mat; 

     int numProcessors = Runtime.getRuntime().availableProcessors(); 
     System.out.println(numProcessors); 
     for (int i = 10; i >= 1; i--) { 
      long time = System.currentTimeMillis(); 
      for (int j = 0; j < 10000; j++) 
       multiplyMatrices(a, b, i); 
      time = System.currentTimeMillis() - time; 
      System.out.println("number of threads: " + i + ", and the time: " + time); 
     } 
    } 

} 

作爲代碼所示,次數增加通過螺紋的增加,減少和由線程數減少,最佳值爲1.

+4

a)http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java; b)這些是小矩陣 - 啓動額外線程的開銷將大大超過任何潛在的加速。 –

+0

查看我使用Akka進行的大型矩陣並行化的示例:https://github.com/GreaterMKEMeetup/akka-examples我使用BigInteger可以將大型矩陣乘以大數。使用線程池的速度非常快。 – aglassman

回答

1

創建線程在註釋中指出的開銷有一個問題。

此外,當多個CPU嘗試修改每個CPU的高速緩存行上表示的內存時,該高速緩存行的第一個修改會導致其他CPU中的高速緩存行被標記爲無效。這在SMP系統中造成了真正的性能問題。這個問題被稱爲虛假分享。

當您修改多個CPU的地址空間中靠近的內存區域時,您很可能會發生虛假共享。對於出色的寫作情況,see Dr. Dobbs

您的代碼似乎可能會遭受虛假分享,因爲您符合一般標準。您正在修改數據,數據在內存中緊密相連,並且您正在多線程上執行此操作。如果你也有多個CPU核心,那麼幾乎可以肯定至少有一定程度的錯誤分享。