Tensorflow: Multi-GPU training on a graph

All code assumes Tensorflow 1.3 and Python 3.x.

We are working on part of a GAN algorithm that has an interesting loss function.

Stage 1 - Compute only the completion/generator loss portion of the network 
      Iterates over the completion portion of the GAN for X iterations. 

Stage 2 - Compute only the discriminator loss portion of the network 
      Iterates over the discriminator portion for Y iterations (but 
      don't train on Stage 1) 

Stage 3 - Compute the full loss on the network 
      Iterate over both completion and discriminator for Z iterations 
      (training on the entire network).
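
The three stages amount to a fixed schedule that decides which training op runs at each iteration. A minimal sketch in plain Python, assuming X, Y and Z are known iteration counts; the stage labels are hypothetical names chosen for illustration and would map to whatever train ops the real graph defines.

```python
def stage_schedule(x_iters, y_iters, z_iters):
    """Yield the label of the stage to run at each training iteration."""
    for _ in range(x_iters):
        yield "gen_only"      # Stage 1: completion/generator loss only
    for _ in range(y_iters):
        yield "discm_only"    # Stage 2: discriminator loss only
    for _ in range(z_iters):
        yield "full"          # Stage 3: full loss, entire network


# Hypothetical iteration counts, for illustration only.
schedule = list(stage_schedule(2, 1, 3))
print(schedule)
```

In a real training loop each label would select the op passed to the session, so only one of the three optimizers updates variables at a time.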

We have this working on a single GPU. Because training takes a long time, we would like to make it support multiple GPUs.

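We have looked at Tensorflow/models/tutorials/Images/cifar10/cifar10_multi_gpu_train.py, which covers tower losses, averaging the towers together, computing the gradients on the GPUs, and then applying them on the CPU. That is a good start. However, because our loss is more complex, everything gets more complicated for us. The code is quite involved, but it is roughly similar to https://github.com/timsainb/Tensorflow-MultiGPU-VAE-GAN (that repository will not run as-is, because it was written around Tensorflow 0.1 and has some quirks I could not get working, but it should give you an idea of what we are doing).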

When we compute the gradients, it looks like this (pseudocode, trying to highlight the important parts):

for i in range(num_gpus):
    with tf.device('/gpu:%d' % gpus[i]):
        with tf.name_scope('Tower_%d' % gpus[i]) as scope:
            with tf.variable_scope("generator"):
                generator = build_generator()

        with tf.variable_scope("discriminator"):
            with tf.variable_scope("real_discriminator"):
                real_discriminator = build_discriminator(x)

            with tf.variable_scope("fake_discriminator", reuse=True):
                fake_discriminator = build_discriminator(generator)

        gen_only_loss, discm_only_loss, full_loss = build_loss(
            generator, real_discriminator, fake_discriminator)

        # Share variables with the towers built after this one.
        tf.get_variable_scope().reuse_variables()

        # Stage 1: var_list is left at its default of None.
        gen_only_grads = gen_only_opt.compute_gradients(gen_only_loss)
        tower_gen_only_grads.append(gen_only_grads)

        # Stage 2: restrict training to the discriminator variables.
        discm_only_train_vars = tf.get_collection(
            tf.GraphKeys.TRAINABLE_VARIABLES, "discriminator")
        discm_only_train_vars = discm_only_train_vars + tf.get_collection(
            tf.GraphKeys.TRAINABLE_RESOURCE_VARIABLES, "discriminator")

        discm_only_grads = discm_only_opt.compute_gradients(
            discm_only_loss, var_list=discm_only_train_vars)
        tower_discm_only_grads.append(discm_only_grads)

        # Stage 3: full loss over the entire network.
        full_grads = full_opt.compute_gradients(full_loss)
        tower_full_grads.append(full_grads)

# average_gradients is the same code as in cifar10_multi_gpu_train.py.
# We haven't changed it. It just iterates over the gradients and averages
# them... this is part of the problem...
gen_only_grads = average_gradients(tower_gen_only_grads)
gen_only_train = gen_only_opt.apply_gradients(gen_only_grads,
    global_step=global_step)

discm_only_grads = average_gradients(tower_discm_only_grads) 
discm_only_train = discm_only_opt.apply_gradients(discm_only_grads, 
    global_step=global_step) 

full_grads = average_gradients(tower_full_grads) 
full_train = full_opt.apply_gradients(full_grads, global_step=global_step)
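
The average_gradients helper is referenced above but not shown. As a rough illustration of what it does, here is a plain-Python stand-in that uses floats in place of gradient tensors and strings in place of variables (both are assumptions made for the sketch; the cifar10 version operates on TensorFlow tensors). It also shows why a None gradient is fatal: the mean cannot be computed over it.

```python
def average_gradients(tower_grads):
    """Average per-tower gradients, variable by variable.

    tower_grads holds one list per tower of (gradient, variable) pairs.
    Every tower must list the same variables in the same order.
    """
    averaged = []
    # zip(*tower_grads) groups the pairs that belong to the same variable.
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars]
        mean_grad = sum(grads) / len(grads)  # raises TypeError if any g is None
        var = grads_and_vars[0][1]           # variables are shared; take the first
        averaged.append((mean_grad, var))
    return averaged


# Hypothetical gradients from two towers, for illustration only.
tower_0 = [(1.0, "generator/w"), (4.0, "discriminator/w")]
tower_1 = [(3.0, "generator/w"), (2.0, "discriminator/w")]
print(average_gradients([tower_0, tower_1]))
```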

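If all we call is compute_gradients(full_loss), the algorithm works correctly on multiple GPUs. That is equivalent to the code in the cifar10_multi_gpu_train.py example. The tricky part comes from needing to restrict the network for Stage 1 or Stage 2.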

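compute_gradients(full_loss) has a var_list parameter that defaults to None, which means it trains all variables. How does it know not to train the Tower_0 variables while in Tower_1? I ask because when we get to compute_gradients(discm_only_loss, var_list=discm_only_train_vars), I need to know how to collect the right variables to restrict training to that part of the network. I found a thread that talks about this, but found it inaccurate/incomplete: "freeze" some variables/scopes in tensorflow: stop_gradient vs passing variables to minimize.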

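The reason is that if you look at the code in compute_gradients, when None is passed in, var_list is populated with the combination of the trainable variables and the trainable resource variables. So that is how I restrict it. All of this works fine as long as we are not trying to split across multiple GPUs.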
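
To make the scope filtering concrete: as far as I know, tf.get_collection(key, scope) in Tensorflow 1.x keeps the items whose name is matched by re.match(scope, name), so the scope behaves as a prefix anchored at the start of the name. The sketch below mimics only that filter in plain Python. The helper get_by_scope and the variable names are hypothetical; the names in a real graph should be printed and checked, since variables created with tf.get_variable are, to my knowledge, prefixed by tf.variable_scope but not by tf.name_scope.

```python
import re


def get_by_scope(variable_names, scope):
    """Keep the names matched by re.match(scope, name), anchored at the start."""
    pattern = re.compile(scope)
    return [name for name in variable_names if pattern.match(name)]


# Hypothetical variable names, for illustration only.
names = [
    "generator/conv1/weights",
    "discriminator/real_discriminator/conv1/weights",
    "Tower_1/discriminator/conv1/weights",
]

shared = get_by_scope(names, "discriminator")
per_tower = get_by_scope(names, "Tower_1/discriminator")
print(shared)
print(per_tower)
```

The point of the sketch is that "discriminator" does not match a name beginning with "Tower_1/", so which filter is right depends entirely on how the variables in the graph are actually named.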

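Question 1: Now that I have split the network into towers, am I also responsible for collecting the current tower's variables, and for making sure I do not miss training any of them? Do I need to add lines like these: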

discm_only_train_vars= tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "Tower_{}/discriminator".format(i)) 
discm_only_train_vars= discm_only_train_vars + tf.get_collection(tf.GraphKeys.TRAINABLE_RESOURCE_VARIABLES, "Tower_{}/discriminator".format(i))

in order to train the appropriate variables for each tower?

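Question 2: This may have the same answer as Question 1. Getting compute_gradients(gen_only_loss) to work is a bit harder. In the non-towered version, gen_only_loss never touched the discriminator, so it activated the tensors it needed in the graph and everything was fine. In the towered version, however, when I call compute_gradients, it returns gradients for tensors that have not been activated, so some entries are [(None, tf.Variable), (None, tf.Variable)]. This crashes average_gradients, because it cannot convert a None value to a tensor. That makes me think I need to restrict these as well.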

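What makes all of this confusing is that the cifar example and my full_loss case do not care about training on a specific tower, but I am guessing that once I specify a var_list, whatever magic compute_gradients uses to know which variables to train on which tower disappears? Do I need to worry about grabbing any other variables?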

2017-09-27 SpaceCowboy850

For Question 1: if you split things up manually, then yes, you are responsible for collecting the variables.

For Question 2: you probably want to either restrict the call to compute_gradients or filter the results.
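
One way to read "filter the results" is to drop the (None, variable) pairs from each tower's list before averaging. A minimal sketch in plain Python, with floats standing in for gradient tensors and strings for variables; drop_missing_gradients is a hypothetical helper name, not a Tensorflow function.

```python
def drop_missing_gradients(grads_and_vars):
    """Remove (gradient, variable) pairs whose gradient is None."""
    return [(g, v) for g, v in grads_and_vars if g is not None]


# Hypothetical result of compute_gradients(gen_only_loss) on one tower:
# the discriminator variable is not connected to the loss, so its gradient is None.
raw = [(0.5, "generator/w"), (None, "discriminator/w")]
filtered = drop_missing_gradients(raw)
print(filtered)
```

This only keeps the towers aligned for averaging if every tower drops the same entries, which should hold when each tower builds the same graph structure. The other option from the answer, passing an explicit var_list, avoids producing the None entries in the first place.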

2017-10-02 19:58:14
