Testing a GPU with TensorFlow matrix multiplication

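Many machine learning algorithms rely on matrix multiplication (or can at least be implemented with it), so to test my GPU I plan to create two matrices A and B, multiply them, and record how long the computation takes.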

Here is the code, which generates two matrices of shape 300000 x 20000 and multiplies them:

import tensorflow as tf 
import numpy as np 

init = tf.global_variables_initializer() 
sess = tf.Session() 
sess.run(init) 


#a = np.array([[1, 2, 3], [4, 5, 6]]) 
#b = np.array([1, 2, 3]) 

a = np.random.rand(300000,20000) 
b = np.random.rand(300000,20000) 

print("Init complete")

# Note: tf.mul is element-wise (renamed tf.multiply in TF 1.0); it is not a matrix product
result = tf.mul(a , b)
v = sess.run(result) 

print(v)

Is this an adequate test for comparing GPU performance? What other factors should I consider?
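
As a rough sanity check on the sizes used above, the following sketch estimates the memory footprint of one matrix and checks whether the shapes allow a matrix product at all. It is only an illustration: `matrix_bytes` and `can_matmul` are hypothetical helper names, and the 8-byte item size assumes the float64 values that `np.random.rand` returns.

```python
def matrix_bytes(rows, cols, itemsize=8):
    """Bytes needed for a dense rows x cols matrix (itemsize=8 assumes float64)."""
    return rows * cols * itemsize


def can_matmul(shape_a, shape_b):
    """A matrix product A @ B needs A's column count to equal B's row count."""
    return shape_a[1] == shape_b[0]


shape = (300000, 20000)

# Size of a single 300000 x 20000 float64 matrix, in gigabytes (10**9 bytes)
print("one matrix: %.0f GB" % (matrix_bytes(*shape) / 10**9))

# Two matrices of the same non-square shape cannot be matrix-multiplied directly
print("A @ B defined:", can_matmul(shape, shape))

# Transposing the second operand makes the inner dimensions agree
print("A @ B^T defined:", can_matmul(shape, shape[::-1]))
```

Under these assumptions each matrix needs about 48 GB, so a benchmark at this size would mostly measure memory limits and host-to-device transfers, if it runs at all.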


2017-01-23 blue-sky

Here is an example of a matmul benchmark that avoids common pitfalls and matches the official 11 TFLOP mark on a Titan X Pascal.

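import os
# Expose only physical GPU 1 to TensorFlow; this must be set before importing tensorflow.
# On a single-GPU machine, comment this line out (or use "0").
os.environ["CUDA_VISIBLE_DEVICES"]="1"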
import tensorflow as tf 
import time 

n = 8192 
dtype = tf.float32 
with tf.device("/gpu:0"): 
    matrix1 = tf.Variable(tf.ones((n, n), dtype=dtype)) 
    matrix2 = tf.Variable(tf.ones((n, n), dtype=dtype)) 
    product = tf.matmul(matrix1, matrix2) 


# avoid optimizing away redundant nodes 
config = tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0))) 
sess = tf.Session(config=config) 

sess.run(tf.global_variables_initializer()) 
iters = 10 

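# pre-warming: the first run includes one-time setup cost, so it is not timed.
# Running product.op (rather than product) avoids copying the result back to Python.
sess.run(product.op)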

start = time.time() 
for i in range(iters): 
    sess.run(product.op) 
end = time.time() 
ops = n**3 + (n-1)*n**2 # per matmul: n^3 multiplications plus n^2*(n-1) additions
elapsed = (end - start)
rate = iters*ops/elapsed/10**9 # billions of operations per second
print('\n %d x %d matmul took: %.2f sec, %.2f G ops/sec' % (n, n, 
                  elapsed/iters, 
                  rate,))
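
To make the arithmetic behind the reported rate concrete, here is a small sketch that reuses the same operation count without needing TensorFlow. The helper names `matmul_ops` and `gops_per_sec` are made up for illustration, and the 1.0 second timing is a hypothetical input, not a measurement.

```python
def matmul_ops(n):
    """Operations in an (n x n) @ (n x n) product: n^3 multiplies plus n^2*(n-1) adds."""
    return n**3 + (n - 1) * n**2


def gops_per_sec(n, iters, elapsed_sec):
    """Billions of operations per second for `iters` products taking `elapsed_sec` in total."""
    return iters * matmul_ops(n) / elapsed_sec / 10**9


n = 8192
print("ops per matmul:", matmul_ops(n))

# Hypothetical timing: 10 iterations finishing in 1.0 s total (0.1 s per matmul)
print("%.2f G ops/sec" % gops_per_sec(n, iters=10, elapsed_sec=1.0))
```

With n = 8192 a single product is about 1.1 * 10^12 operations, so roughly 0.1 s per product corresponds to about 11 * 10^12 operations per second, which is the scale of the figure quoted above.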


2017-01-23 16:06:56

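Very cool. I think you should post the code in your answer itself, in addition to linking to it.

Fails unless 'os.environ["CUDA_VISIBLE_DEVICES"]="1"' is commented out. Works with Windows 10, tensorflow-gpu (1.4), cuda_8.0.61_win10 and cudnn-8.0-windows10-x64-v6.0. – BSalita

The error is 'Cannot assign a device for operation 'Variable_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.' – BSalita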
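
The failure described in the comments above is consistent with how `CUDA_VISIBLE_DEVICES` filters devices: on a machine whose only GPU has index 0, the value "1" names a device that does not exist, so no GPU is visible and only the CPU remains. The sketch below is a simplified model of that filtering, not the CUDA implementation; `visible_gpus` is a hypothetical helper and it handles integer indices only (the real variable also accepts GPU UUIDs).

```python
def visible_gpus(physical_count, cuda_visible_devices):
    """Simplified model: physical GPU indices a process would see, in listed order.

    Assumes integer indices only. Enumeration stops at the first entry that
    does not name an existing device.
    """
    if cuda_visible_devices is None:
        # Variable unset: every physical GPU is visible
        return list(range(physical_count))
    visible = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_count:
            break  # invalid entry: this and all later entries are ignored
        visible.append(int(token))
    return visible


# Single-GPU machine with the setting from the answer: nothing is visible
print(visible_gpus(1, "1"))

# Two-GPU machine: physical GPU 1 is the only visible device
print(visible_gpus(2, "1"))

# Single-GPU machine with the line commented out (variable unset)
print(visible_gpus(1, None))
```

In the two-GPU case the single visible device is renumbered from zero inside the process, which is why the benchmark can still refer to it as "/gpu:0".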
