I am using TensorFlow on Windows 8 with Python 3.5. I modified this short example to see whether the GPU support (Titan X) works. Unfortunately, the runtime with (tf.device("/gpu:0")) and without (tf.device("/cpu:0")) the GPU is the same. The Windows CPU monitor shows that in both cases the CPU load is about 100% during the computation. TensorFlow does not seem to be using the GPU.

Here is the code example:

import numpy as np 
import tensorflow as tf 
import datetime 

# number of matrix multiplications to perform (the matrix power) 
n = 100 

# Create random large matrix 
matrix_size = 1000  # must be an int: np.random.rand does not accept float sizes 
A = np.random.rand(matrix_size, matrix_size).astype('float32') 
B = np.random.rand(matrix_size, matrix_size).astype('float32') 

# Collect the result ops here 
c1 = [] 

# Define matrix power 
def matpow(M, n): 
    if n < 1:  # base case 
        return M 
    else: 
        return tf.matmul(M, matpow(M, n - 1)) 

with tf.device("/gpu:0"): 
    a = tf.constant(A) 
    b = tf.constant(B) 
    #compute A^n and B^n and store results in c1 
    c1.append(matpow(a, n)) 
    c1.append(matpow(b, n)) 

    sum = tf.add_n(c1) 

t1 = datetime.datetime.now() 
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess: 
    # Runs the op. 
    sess.run(sum) 
t2 = datetime.datetime.now() 

print("computation time: " + str(t2-t1)) 

Here is the output for the GPU case:

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX TITAN X 
major: 5 minor: 2 memoryClockRate (GHz) 1.076 
pciBusID 0000:01:00.0 
Total memory: 12.00GiB 
Free memory: 2.40GiB 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0) 
D c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\direct_session.cc:255] Device mapping: 
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0 

Device mapping: 
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0 

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_100: (MatMul)/job:localhost/replica:0/task:0/gpu:0 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_101: (MatMul)/job:localhost/replica:0/task:0/gpu:0 
... 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] MatMul_110: (MatMul)/job:localhost/replica:0/task:0/gpu:0 
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:827] Const_1: (Const)/job:localhost/replica:0/task:0/gpu:0 

Const_1: (Const): /job:localhost/replica:0/task:0/gpu:0 
MatMul_100: (MatMul): /job:localhost/replica:0/task:0/gpu:0 
MatMul_101: (MatMul): /job:localhost/replica:0/task:0/gpu:0 
... 
MatMul_198: (MatMul): /job:localhost/replica:0/task:0/gpu:0 
MatMul_199: (MatMul): /job:localhost/replica:0/task:0/gpu:0 
Const: (Const): /job:localhost/replica:0/task:0/gpu:0 
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0 
MatMul_1: (MatMul): /job:localhost/replica:0/task:0/gpu:0 
MatMul_2: (MatMul): /job:localhost/replica:0/task:0/gpu:0 
MatMul_3: (MatMul): /job:localhost/replica:0/task:0/gpu:0 
... 
MatMul_98: (MatMul): /job:localhost/replica:0/task:0/gpu:0 
MatMul_99: (MatMul): /job:localhost/replica:0/task:0/gpu:0 
AddN: (AddN): /job:localhost/replica:0/task:0/gpu:0 
computation time: 0:00:05.066000 

In the CPU case the output is the same, with cpu:0 instead of gpu:0, and the computation time does not change. Even when I use more operations, e.g. with a runtime of about 1 minute, GPU and CPU are equal. Many thanks in advance!

Answers


According to the log information, in particular the device placement, your code does use the GPU; it is just that the running time is the same. My guess is that:

c1.append(matpow(a, n)) 
c1.append(matpow(b, n)) 

is the bottleneck in your code: it moves the huge matrices between GPU memory and main memory. You can try the following (a combined runnable sketch follows the list):

  • changing the matrix size to 1e4 x 1e4

  • with tf.device("/gpu:0"): 
        A = tf.random_normal([matrix_size, matrix_size]) 
        B = tf.random_normal([matrix_size, matrix_size]) 
        C = tf.matmul(A, B) 
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess: 
        t1 = datetime.datetime.now() 
        sess.run(C) 
        t2 = datetime.datetime.now() 
    
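Putting the two suggestions together, a minimal runnable sketch could look like this (assuming the same TensorFlow 0.12-era Session API as in the question; the 10000 x 10000 size follows the first suggestion):

import datetime 

import tensorflow as tf 

matrix_size = 10000  # 1e4 x 1e4, per the first suggestion 

# Generate the matrices directly on the GPU so no host-to-device copy is needed 
with tf.device("/gpu:0"): 
    A = tf.random_normal([matrix_size, matrix_size]) 
    B = tf.random_normal([matrix_size, matrix_size]) 
    C = tf.matmul(A, B) 

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess: 
    t1 = datetime.datetime.now() 
    sess.run(C) 
    t2 = datetime.datetime.now() 

print("computation time: " + str(t2 - t1)) 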

Thanks! Now it works, and the GPU is about 20 times faster than the CPU. – user3641158


It looks like the matrices must be created with a tensorflow method, here tf.random_normal(), rather than defining numpy matrices with np.random.rand(). – user3641158
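To illustrate the point in this comment, a minimal sketch of the difference (variable names are illustrative):

import numpy as np 
import tensorflow as tf 

# A NumPy array lives in host memory; embedding it in the graph means the 
# data must be shipped to the GPU before it can be used there. 
A_host = np.random.rand(1000, 1000).astype('float32') 
a_copied = tf.constant(A_host)  # host data, copied to the device 

# A TensorFlow op pinned to the GPU generates the data on the device itself. 
with tf.device("/gpu:0"): 
    a_on_device = tf.random_normal([1000, 1000])  # no host round-trip 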


For example, say creating the tensorflow session takes 4.9 seconds and the actual computation on the cpu takes only 0.1 seconds, giving you a cpu time of 5.0 seconds. Now say creating the session on the gpu also takes 4.9 seconds but the computation takes 0.01 seconds, for a time of 4.91 seconds. You would hardly see a difference. Creating the session is a one-time overhead at program startup, so you should not include it in your timing. Also, tensorflow does some compilation/optimization the first time you call sess.run, which makes the first run even slower.

Try timing it like this:

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess: 
    # Run the op once first; this absorbs session startup and first-run compilation. 
    sess.run(sum) 
    t1 = datetime.datetime.now() 
    for i in range(1000): 
        sess.run(sum) 
    t2 = datetime.datetime.now() 

print("computation time for 1000 runs: " + str(t2 - t1)) 

If that does not solve it, it may also be that your computation does not allow enough parallelism for the GPU to really beat the CPU. Increasing the matrix size might bring out the difference.
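As a quick way to check that, one could time the same multiplication on both devices across a few sizes. A rough sketch, assuming the same graph-and-Session API as above (the sizes, run count, and function name are illustrative):

import datetime 

import tensorflow as tf 

def time_matmul(device, size, runs=10): 
    """Time one matmul pinned to the given device; returns average seconds per run.""" 
    tf.reset_default_graph() 
    with tf.device(device): 
        a = tf.random_normal([size, size]) 
        b = tf.random_normal([size, size]) 
        c = tf.matmul(a, b) 
    # allow_soft_placement falls back to the CPU if the requested device is missing 
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess: 
        sess.run(c)  # warm-up run: excludes startup and compilation overhead 
        t1 = datetime.datetime.now() 
        for _ in range(runs): 
            sess.run(c) 
        t2 = datetime.datetime.now() 
    return (t2 - t1).total_seconds() / runs 

for size in (100, 1000, 4000): 
    for device in ("/cpu:0", "/gpu:0"): 
        print(device, size, time_matmul(device, size)) 

If the GPU is being used effectively, the gap between the two devices should widen as the matrix size grows.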