TensorFlow: one network, two GPUs?

I have a convolutional neural network with two different output streams:

                       input
                         |
                       (...)  <-- several convolutional layers
                         |
                     ____|____
    (several layers) |       | (several layers)
     fully-connected |       | fully-connected
  output stream 1 -> |       | <- output stream 2

I want to compute stream 1 on /gpu:0 and stream 2 on /gpu:1. Unfortunately, I have not been able to set this up correctly.

This attempt:

...placeholders... 
...conv layers... 

with tf.device("/gpu:0"): 
    ...stream 1 layers... 
    nn_out_1 = tf.matmul(...) 

with tf.device("/gpu:1"): 
    ...stream 2 layers... 
    nn_out_2 = tf.matmul(...)

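runs extremely slowly (slower than training on a single GPU) and sometimes produces NaN values in the output. I thought this might be because the with statements are not synchronized properly, so I added control_dependencies and explicitly placed the conv layers on /gpu:0: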

...placeholders... # x -> input, y -> labels 

with tf.device("/gpu:0"):
    with tf.control_dependencies([x, y]):
        ...conv layers...
        h_conv_flat = tf.reshape(h_conv_last, ...)

with tf.device("/gpu:0"):
    with tf.control_dependencies([h_conv_flat]):
        ...stream 1 layers...
        nn_out_1 = tf.matmul(...)

with tf.device("/gpu:1"):
    with tf.control_dependencies([h_conv_flat]):
        ...stream 2 layers...
        nn_out_2 = tf.matmul(...)

...but with this approach the network does not even run. No matter what I try, it complains that the input has not been initialized:

tensorflow.python.framework.errors.InvalidArgumentError: 
    You must feed a value for placeholder tensor 'x' 
    with dtype float 
    [[Node: x = Placeholder[dtype=DT_FLOAT, shape=[], 
    _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

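Without the with statements, the network trains only on /gpu:0 and runs fine: it learns reasonable things and produces no errors.

What am I doing wrong? Is TensorFlow unable to split different streams of layers in one network across different GPUs? Do I always have to split the complete network into separate towers?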


2016-03-06 daniel451

It can depend on many different factors. Are the GPUs identical? How big is your data? – fabrizioM

Yes, the two GPUs are identical; they sit on one card. It is a [dual-GPU] NVIDIA Tesla K80 card [http://www.nvidia.com/object/tesla-k80.html]. It has 24 GB of VRAM, and the data fits entirely into the VRAM of a single GPU (12 GB). – daniel451

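Are you sure that the bottleneck of this computation is GPU speed? It is very common for GPU *bandwidth* to be the bottleneck rather than the actual computation; if you send a large tensor to another GPU, then in that case it will only make things worse. – Peteris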
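To put a rough number on the bandwidth concern, here is a back-of-the-envelope sketch. Every figure in it is an assumption made up for illustration (batch size, the shape of the flattened conv output, and an effective inter-GPU bandwidth of 10 GB/s); none of them come from the question, so substitute your own measurements.

```python
def transfer_seconds(num_elements, bytes_per_element, bandwidth_bytes_per_s):
    """Time to copy a tensor between devices, ignoring latency and overlap."""
    return num_elements * bytes_per_element / bandwidth_bytes_per_s


# Hypothetical shapes: batch of 128, flattened conv output of 7 * 7 * 512 floats.
batch_size = 128
features = 7 * 7 * 512
num_elements = batch_size * features

# Assumed effective bandwidth between the two GPUs (not a measured value).
assumed_bandwidth = 10e9  # bytes per second

# float32 activations are copied to the second GPU in the forward pass ...
forward_copy = transfer_seconds(num_elements, 4, assumed_bandwidth)
# ... and a gradient of the same size is copied back in the backward pass.
per_step_copy = 2 * forward_copy

print("copy per training step: %.2f ms" % (per_step_copy * 1e3))
```

If the layers of stream 2 need less time than this per step, moving them to the second GPU cannot pay off, regardless of how the devices are assigned.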

There is an example of how to use several GPUs with one network: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py. You can probably copy that code. You can also do something like this:

import tensorflow as tf

# Creates a graph.
c = []
for d in ['/gpu:2', '/gpu:3']:
    # Ops created inside this block are pinned to device d.
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
# Sum the per-GPU results on the CPU.
with tf.device('/cpu:0'):
    sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))

See: https://www.tensorflow.org/versions/r0.7/how_tos/using_gpu/index.html#using-multiple-gpus
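Applied to the two-stream network from the question, the same device-scoping pattern might look like the sketch below. Treat it as an illustration under assumptions, not a verified fix: the layer sizes, the dense helper, and all variable names are made up here, and it uses the tf.compat.v1 graph API so that it runs on current TensorFlow releases rather than on r0.7. allow_soft_placement=True lets TensorFlow fall back to an available device when /gpu:1 does not exist. The sketch uses no tf.control_dependencies on the placeholders: both streams already depend on h_conv_flat through their inputs, and TensorFlow inserts the cross-device copy itself.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # use the graph/session style from the question


def dense(inputs, n_in, n_out, name):
    """Hypothetical helper: one fully-connected layer (weights + bias)."""
    w = tf.Variable(tf.random.normal([n_in, n_out], stddev=0.1), name=name + "_w")
    b = tf.Variable(tf.zeros([n_out]), name=name + "_b")
    return tf.matmul(inputs, w) + b


graph = tf.Graph()
with graph.as_default():
    # Made-up input shape: 8x8 single-channel images.
    x = tf1.placeholder(tf.float32, shape=[None, 8, 8, 1], name="x")

    # Shared conv trunk and stream 1 live on the first GPU.
    with tf.device("/gpu:0"):
        kernel = tf.Variable(tf.random.normal([3, 3, 1, 4], stddev=0.1), name="conv_w")
        h_conv = tf.nn.relu(
            tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding="SAME"))
        h_conv_flat = tf.reshape(h_conv, [-1, 8 * 8 * 4])
        nn_out_1 = dense(h_conv_flat, 8 * 8 * 4, 10, "stream1")

    # Stream 2 lives on the second GPU; h_conv_flat is copied over automatically.
    with tf.device("/gpu:1"):
        nn_out_2 = dense(h_conv_flat, 8 * 8 * 4, 3, "stream2")

    init = tf1.global_variables_initializer()

config = tf1.ConfigProto(allow_soft_placement=True, log_device_placement=False)
with tf1.Session(graph=graph, config=config) as sess:
    sess.run(init)  # needs no feed: nothing here depends on the placeholder
    batch = np.zeros((2, 8, 8, 1), dtype=np.float32)
    out_1, out_2 = sess.run([nn_out_1, nn_out_2], feed_dict={x: batch})
```

Setting log_device_placement=True in the ConfigProto prints where each op actually ended up, which is a quick way to check whether the streams really run on different GPUs.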

Regards


2016-03-06 07:44:12
