2016-08-04

We have read the TensorFlow paper on scheduling. It is apparently able to simulate execution of the graph ahead of time and find the "right" device for each operation. Can TensorFlow automatically schedule operations onto all of the available GPUs?

But we tested this by running with tf.Session(config=tf.ConfigProto(log_device_placement=True)) without specifying any device, and we found that all of the operations were placed on the first GPU.
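A minimal sketch of how we enable that logging (the toy graph here is only illustrative, not our actual model):

import tensorflow as tf

# A toy graph, just so the placer has something to assign.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
b = tf.constant([[1.0, 1.0], [0.0, 1.0]], name='b')
c = tf.matmul(a, b, name='c')

# log_device_placement=True makes the session report, for every op,
# the device it was ultimately assigned to (lines like the ones below).
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))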

The log looks like this:

Adam/epsilon: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Adam/epsilon: /job:localhost/replica:0/task:0/gpu:0 
Adam/beta2: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Adam/beta2: /job:localhost/replica:0/task:0/gpu:0 
Adam/beta1: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Adam/beta1: /job:localhost/replica:0/task:0/gpu:0 
Adam/learning_rate: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Adam/learning_rate: /job:localhost/replica:0/task:0/gpu:0 
Variable_3/Adam_1: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam_1: /job:localhost/replica:0/task:0/gpu:0 
Variable_3/Adam_1/read: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam_1/read: /job:localhost/replica:0/task:0/gpu:0 
Variable_3/Adam_1/Assign: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam_1/Assign: /job:localhost/replica:0/task:0/gpu:0 
Variable_3/Adam: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam: /job:localhost/replica:0/task:0/gpu:0 
Variable_3/Adam/read: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam/read: /job:localhost/replica:0/task:0/gpu:0 
Variable_3/Adam/Assign: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam/Assign: /job:localhost/replica:0/task:0/gpu:0 
Variable_2/Adam_1: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam_1: /job:localhost/replica:0/task:0/gpu:0 
Variable_2/Adam_1/read: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam_1/read: /job:localhost/replica:0/task:0/gpu:0 
Variable_2/Adam_1/Assign: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam_1/Assign: /job:localhost/replica:0/task:0/gpu:0 
Variable_2/Adam: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam: /job:localhost/replica:0/task:0/gpu:0 
Variable_2/Adam/read: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam/read: /job:localhost/replica:0/task:0/gpu:0 
Variable_2/Adam/Assign: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam/Assign: /job:localhost/replica:0/task:0/gpu:0 
Variable_1/Adam_1: /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_1/Adam_1: /job:localhost/replica:0/task:0/gpu:0 

The variables are placed on the GPU as well. I suppose the scheduler is not smart enough yet, and that the best practice for users is to explicitly specify whether each operation should run on the CPU or a GPU, especially when we have multiple GPUs. Is that right?

Answer


As of v0.9, TensorFlow places all operations on the first GPU you have, so what you are observing is 100% expected. Now, if your question is "Can TensorFlow automatically distribute my graph across my 4 GPUs without any intervention from me?", the answer as of August 2016 is no.

If you are trying to harness the power of all the GPUs available on your local machine, have a look at this variation of the cifar10 tutorial. The next level up would be replicated training with distributed tensorflow, but that is probably overkill for what you are trying to do. With all the recent progress in virtualization, the question of which device a particular operation is assigned to may soon become irrelevant.
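The basic idea in that tutorial is to build one "tower" of the model per GPU under an explicit tf.device() scope and combine the results on the CPU. A rough sketch of that device-placement pattern (the model, loss, and GPU count below are placeholders, not the tutorial's actual code):

import tensorflow as tf

# Shared parameters live on the CPU so that every GPU tower can read them.
with tf.device('/cpu:0'):
    w = tf.Variable(tf.zeros([10, 1]), name='w')

x = tf.placeholder(tf.float32, [None, 10])
num_gpus = 2  # assumed GPU count, just for the sketch
tower_losses = []
for i in range(num_gpus):
    # Each tower runs its forward pass on its own GPU.
    with tf.device('/gpu:%d' % i):
        tower_losses.append(tf.reduce_sum(tf.matmul(x, w)))

# Combine the per-tower results back on the CPU.
with tf.device('/cpu:0'):
    total_loss = tf.add_n(tower_losses) / num_gpus

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.initialize_all_variables())

In the real tutorial each tower also gets its own slice of the input batch and the gradients from all towers are averaged before applying the update; the sketch only shows how the ops are spread across devices.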


Great, thanks for the detailed explanation. We will use 'CUDA_VISIBLE_DEVICES' and 'tf.device()' to place them flexibly. – tobe
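For completeness, a rough sketch of how those two mechanisms fit together: CUDA_VISIBLE_DEVICES controls which physical GPUs the process can see at all, while tf.device() pins individual ops within that visible set (the GPU indices and script name below are only examples):

# Expose only physical GPUs 0 and 2 to this process; TensorFlow then
# sees them as /gpu:0 and /gpu:1 respectively.
#   CUDA_VISIBLE_DEVICES=0,2 python train.py

import tensorflow as tf

with tf.device('/gpu:1'):   # second visible GPU (physical GPU 2)
    a = tf.constant([1.0, 2.0], name='a')
    b = a * 2.0

with tf.device('/cpu:0'):   # keep this op on the CPU
    c = b + 1.0

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print(sess.run(c))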
