2017-07-02 94 views
0

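Using GPU vs CPU in the TensorFlow Deep MNIST example

I am using a program that I copy-pasted from here with a few small changes. This is my code, modified to try to improve training speed: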

from tensorflow.examples.tutorials.mnist import input_data 
mnist = input_data.read_data_sets('MNIST_data', one_hot=True) 

import tensorflow as tf 

x = tf.placeholder(tf.float32, shape=[None, 784]) 
y_ = tf.placeholder(tf.float32, shape=[None, 10]) 
W = tf.Variable(tf.zeros([784,10])) 
b = tf.Variable(tf.zeros([10])) 

def weight_variable(shape): 
    initial = tf.truncated_normal(shape, stddev=0.1) 
    return tf.Variable(initial) 

def bias_variable(shape): 
    initial = tf.constant(0.1, shape=shape) 
    return tf.Variable(initial) 

def conv2d(x, W): 
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') 

def max_pool_2x2(x): 
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], 
         strides=[1, 2, 2, 1], padding='SAME') 

with tf.device('/gpu:0'): 
    W_conv1 = weight_variable([5, 5, 1, 32]) 
    b_conv1 = bias_variable([32]) 
    x_image = tf.reshape(x, [-1, 28, 28, 1]) 
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) 
    h_pool1 = max_pool_2x2(h_conv1) 

    W_conv2 = weight_variable([5, 5, 32, 64]) 
    b_conv2 = bias_variable([64]) 

    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2) 
    h_pool2 = max_pool_2x2(h_conv2) 

    W_fc1 = weight_variable([7 * 7 * 64, 1024]) 
    b_fc1 = bias_variable([1024]) 

    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64]) 
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1) 

    keep_prob = tf.placeholder(tf.float32) 
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob) 

    W_fc2 = weight_variable([1024, 10]) 
    b_fc2 = bias_variable([10]) 

    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2 

    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv)) 
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy) 
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1)) 
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 

    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(20000):
            batch = mnist.train.next_batch(50)
            if i % 100 == 0:
                train_accuracy = accuracy.eval(feed_dict={
                    x: batch[0], y_: batch[1], keep_prob: 1.0})
                print('step %d, training accuracy %g' % (i, train_accuracy))
            train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

        print('test accuracy %g' % accuracy.eval(feed_dict={
            x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

This produces the following output:

Extracting MNIST_data/train-images-idx3-ubyte.gz 
Extracting MNIST_data/train-labels-idx1-ubyte.gz 
Extracting MNIST_data/t10k-images-idx3-ubyte.gz 
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz 
step 0, training accuracy 0.22 
step 100, training accuracy 0.76 
step 200, training accuracy 0.88 
... 

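The problem is that there is no measurable difference in run time between the original code from the tutorial (i.e. without the with tf.device('/gpu:0'): on line 26) and this code (about 10 seconds per step). I installed cuda-8.0 and cuDNN successfully (after hours of failed attempts). "$ nvidia-smi" returns the following output: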

Sun Jul  2 13:57:10 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   49C    P0    N/A /  N/A |    406MiB /  2000MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+


+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

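So, the questions are:

1) Is the job too small for the choice of CPU or GPU to make any difference?
2) Or is there some silly mistake in my implementation?

Thank you for reading the whole question.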

+1

It just means that the GPU is used by default when it is available. You should explicitly use the CPU to measure the difference. – user1735003
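To make the CPU/GPU comparison suggested here concrete, one option is to time a fixed number of training steps under each device setting. The following is only a minimal sketch: time_steps, step_fn and dummy_step are hypothetical names introduced for illustration, and the dummy workload stands in for a real training step so that the snippet runs without TensorFlow installed.

```python
import time


def time_steps(step_fn, num_steps, warmup=1):
    """Return the mean wall-clock seconds per call of step_fn."""
    # Warm-up calls are not timed, because the first run of a graph
    # can include one-off setup cost that would skew the average.
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    return (time.perf_counter() - start) / num_steps


if __name__ == "__main__":
    # Stand-in workload (assumption): replace with a real training step.
    def dummy_step():
        sum(i * i for i in range(10000))

    print("mean seconds per step: %.6f" % time_steps(dummy_step, num_steps=20))
```

With the code from the question, step_fn could be something like lambda: train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5}) called inside the session, once with tf.device('/gpu:0') and once with tf.device('/cpu:0'), so that the two averages can be compared directly.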

+0

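Thanks @user1735003. I tried your suggestion (replacing gpu with cpu). The result was that each step took 5 seconds longer. It should be faster, right? Also, when I copy-paste the original code from the website and compare it with the code above, there is no observable difference. Can you tell me why? – Roofi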

Answer

0

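The fact that you can run this code without any error messages means that TensorFlow is definitely able to run on the GPU. The point here is that when you run TensorFlow as-is, it tries to run on the GPU by default. There are a few ways to force it to run on the CPU.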

  1. Run it this way: CUDA_VISIBLE_DEVICES= python code.py. Note that if you do this while the code still contains with tf.device('/gpu:0'), it will break, so remove it.
  2. Change with tf.device('/gpu:0') to with tf.device('/cpu:0').
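The first option above can also be driven from Python rather than the shell. The sketch below is an illustration only: cpu_only_env and run_cpu_only are hypothetical helper names, and the demonstration child process simply prints the variable it received instead of running a real training script.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value means a CUDA program started with this
    # environment sees no GPUs.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_cpu_only(script_args):
    """Run `python <script_args...>` in a child process with GPUs hidden."""
    return subprocess.run([sys.executable] + list(script_args),
                          env=cpu_only_env(), check=True)


if __name__ == "__main__":
    # Demonstration only: the child prints the value it was given.
    run_cpu_only(["-c", "import os; print(repr(os.environ['CUDA_VISIBLE_DEVICES']))"])
```

Calling run_cpu_only(["code.py"]) would then be roughly equivalent to the shell command in option 1, assuming code.py is the training script. The variable is set in the child's environment before it starts, so it is already in place by the time that process imports TensorFlow.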

EDIT: Following the further information in the comments: for what allow_soft_placement and log_device_placement mean in ConfigProto, see here.

+0

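Sorry for not being clear enough @jkschin, but does the statement "when you run TensorFlow as-is, it tries to run on the GPU by default" hold true even when I don't put 'config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)' inside the parentheses of the session? – Roofi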

+0

These parameters do not affect whether it runs on the GPU. See [here](https://stackoverflow.com/questions/44873273/what-do-the-options-in-configproto-like-allow-soft-placement-and-log-device-plac/44873274#44873274) for more information. – jkschin

+0

Please add your latest comment to the answer (for future googlers) @jkschin – Roofi