Tensorflow matmul calculations on GPU are slower than on CPU

I'm trying GPU computation for the first time, and of course I was hoping for a big speedup. But on a basic TensorFlow example it is actually worse.

Averaged over 10 runs, each run takes 2 seconds on cpu:0 and 2.7 seconds on gpu:0, while gpu:1 is 50% worse than cpu:0, at 3 seconds.

Here is the code:

import tensorflow as tf  # TensorFlow 1.x API (tf.Session)
import numpy as np
import time
import random

for _ in range(10):
    with tf.Session() as sess:
        start = time.time()
        with tf.device('/gpu:0'):  # swap for '/cpu:0' or whatever
            # Each list below is built by 10^6 calls to Python's random.random(),
            # on the CPU, before TensorFlow ever sees the data.
            a = tf.constant([random.random() for _ in range(1000 * 1000)], shape=[1000, 1000], name='a')
            b = tf.constant([random.random() for _ in range(1000 * 1000)], shape=[1000, 1000], name='b')
            c = tf.matmul(a, b)
            d = tf.matmul(a, c)
            e = tf.matmul(a, d)
            f = tf.matmul(a, e)
            for _ in range(1000):
                sess.run(f)
        end = time.time()  # the timed span includes graph construction as well
        print(end - start)

What am I seeing here? Is the runtime perhaps dominated by copying data between RAM and the GPU?


2016-11-22 stefan
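One quick way to test part of this hypothesis without TensorFlow at all: time the Python-side input generation that the code above repeats on every iteration, and compare it with a single batched NumPy request for the same number of values. This is a minimal sketch; the helper name timed is made up for illustration, and everything here runs on the CPU, so it does not measure the RAM-to-GPU copy itself.

```python
import random
import time

import numpy as np

N = 1000  # matrix side length used in the question


def timed(fn):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# What the question does twice per loop iteration: 10^6 separate Python calls.
per_element, t_loop = timed(lambda: [random.random() for _ in range(N * N)])

# One batched request for the same number of values, generated inside NumPy.
batched, t_batch = timed(lambda: np.random.random_sample(N * N))

print("list comprehension: %.3f s, single NumPy call: %.3f s" % (t_loop, t_batch))
```

If the list comprehension alone accounts for a large share of the roughly 2 seconds per iteration, the measurement is dominated by Python overhead rather than by the matmuls on either device.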

Try increasing the matrix size and compare GPU usage in 'nvidia-smi' with CPU usage in 'top'. – sygi

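@sygi Thanks, I didn't know about 'nvidia-smi'. It shows that GPU-Util never goes above 2%, although python seems to take up most of the memory. Power draw is fairly steady at 40W/180W. – stefan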

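So the code as you wrote it doesn't look GPU-bound. Could you try changing 'a' and 'b' to 'tf.random_uniform([1000, 1000])'? As for memory, TF by default grabs all of the GPU memory (yuck!), but there is an option to force dynamic allocation instead. – sygi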
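The dynamic-allocation option mentioned here is presumably allow_growth in the session's GPU options. A minimal sketch, assuming the TF 1.x session API reached through tf.compat.v1 so it also runs on TF 2.x; the small matmul graph is only an illustrative workload.

```python
import tensorflow as tf

# Session options, TF 1.x style (tf.compat.v1 also exposes them on TF 2.x).
config = tf.compat.v1.ConfigProto()
# Grow GPU memory usage as needed instead of reserving nearly all of it at startup.
config.gpu_options.allow_growth = True

graph = tf.Graph()
with graph.as_default():
    # Small illustrative workload so the session has something to run.
    x = tf.random.uniform([100, 100])
    y = tf.reduce_sum(tf.matmul(x, x))

with tf.compat.v1.Session(graph=graph, config=config) as sess:
    result = sess.run(y)
    print(result)
```

A fixed cap via config.gpu_options.per_process_gpu_memory_fraction is the other common choice; native TF 2.x code would use tf.config.experimental.set_memory_growth on each GPU device instead.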

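The way you generate the data runs on the CPU (random.random() is a regular Python function, not a TF one). Also, calling it 10^6 times will be slower than requesting 10^6 random numbers in a single call. Change the code to: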

a = tf.random_uniform([1000, 1000], name='a') 
b = tf.random_uniform([1000, 1000], name='b')

so that the data is generated in parallel on the GPU and no time is wasted transferring it from RAM to the GPU.


2016-11-22 10:08:07 sygi
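The answer's change can be combined with a timing loop that measures only the matmuls. This is a minimal sketch, assuming the TF 1.x graph/session API reached through tf.compat.v1 (tf.random.uniform is the newer name for tf.random_uniform); the function benchmark and its parameters are hypothetical. It builds the graph once per device instead of inside the timed loop, does one untimed warm-up run, and fetches a scalar sum rather than the full 1000x1000 result, so the timing is not dominated by graph construction or device-to-host copies.

```python
import time

import tensorflow as tf


def benchmark(device, n=1000, runs=100):
    """Average seconds per sess.run of the question's four chained matmuls on `device`."""
    graph = tf.Graph()
    with graph.as_default():
        with tf.device(device):
            # Inputs are generated by TensorFlow on `device` (re-drawn on every run),
            # so no Python-side list has to be built or copied from host RAM.
            a = tf.random.uniform([n, n], name='a')
            b = tf.random.uniform([n, n], name='b')
            c = tf.matmul(a, b)
            d = tf.matmul(a, c)
            e = tf.matmul(a, d)
            f = tf.matmul(a, e)
            # Fetch a scalar so each run does not copy an n x n result back to the host.
            total = tf.reduce_sum(f)
    # The graph is built once; the timed loop below only executes it.
    with tf.compat.v1.Session(graph=graph) as sess:
        sess.run(total)  # untimed warm-up run (the first run pays one-off setup costs)
        start = time.perf_counter()
        for _ in range(runs):
            sess.run(total)
        return (time.perf_counter() - start) / runs
```

On a machine with a GPU, compare benchmark('/cpu:0') with benchmark('/gpu:0'), and try larger n as suggested in the comments; placing ops on '/gpu:0' fails if no GPU is visible.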
