Evaluating TensorFlow operations in a loop is very slow

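I'm trying to learn TensorFlow by coding up some simple problems. Here, I'm trying to estimate the value of pi using a direct-sampling Monte Carlo method.

The run time is much longer than I expected for something done in a for loop. I've seen other posts about similar issues and tried to follow their solutions, but I think I must still be doing something wrong.

My code is attached below: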

import tensorflow as tf 
import numpy as np 
import time 

n_trials = 50000 

tf.reset_default_graph() 


x = tf.random_uniform(shape=(), name='x') 
y = tf.random_uniform(shape=(), name='y') 
r = tf.sqrt(x**2 + y**2) 

hit = tf.Variable(0, name='hit') 

# perform the monte carlo step 
is_inside = tf.cast(tf.less(r, 1), tf.int32) 
hit_op = hit.assign_add(is_inside) 

with tf.Session() as sess: 
    init_op = tf.global_variables_initializer() 
    sess.run(init_op) 

    # Make sure no new nodes are added to the graph 
    sess.graph.finalize() 

    start = time.time() 

    # Run monte carlo trials -- This is very slow 
    for _ in range(n_trials):
        sess.run(hit_op)

    hits = hit.eval() 
    print("Pi is {}".format(4*hits/n_trials)) 
    print("Tensorflow operation took {:.2f} s".format((time.time()-start))) 

>>> Pi is 3.15208 
>>> Tensorflow operation took 8.98 s

In comparison, a for-loop style solution in numpy is an order of magnitude faster:

start = time.time() 
hits = [ 1 if np.sqrt(np.sum(np.square(np.random.uniform(size=2)))) < 1 else 0 for _ in range(n_trials) ] 
a = 0 
for hit in hits: 
    a+=hit 
print("numpy operation took {:.2f} s".format((time.time()-start))) 
print("Pi is {}".format(4*a/n_trials)) 

>>> Pi is 3.14032 
>>> numpy operation took 0.75 s

Attached below is a plot of the overall difference in run time for different numbers of trials.

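Please note: my question is not about how to perform this task the fastest way; I recognize that there are much more efficient ways of calculating pi. I'm only using it as a benchmarking tool to check the performance of TensorFlow against something I'm familiar with (numpy).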
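For reference, the estimator that both snippets above implement can be sketched with only the Python standard library. Points are drawn uniformly in the unit square, and the fraction that lands inside the quarter circle of radius 1 approximates pi/4. The function name `estimate_pi` and the `seed` parameter are illustrative assumptions, not part of the original code.

```python
import math
import random

def estimate_pi(n_trials, seed=None):
    """Direct-sampling Monte Carlo estimate of pi (illustrative sketch)."""
    rng = random.Random(seed)  # seed is an assumption, used only for reproducibility
    hits = 0
    for _ in range(n_trials):
        # one trial: a uniform point in the unit square
        x, y = rng.random(), rng.random()
        if math.sqrt(x**2 + y**2) < 1:
            hits += 1
    # hits / n_trials approximates pi / 4
    return 4 * hits / n_trials

if __name__ == "__main__":
    print("Pi is {}".format(estimate_pi(50000, seed=0)))
```

With 50000 trials the statistical error of this estimate is roughly 0.007, so small differences between runs are expected.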

Source

2017-03-17 Ben

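The slowness comes from the communication overhead between Python and TensorFlow in sess.run, which is executed many times inside your loop. I would suggest using tf.while_loop to perform the computation within TensorFlow. That would be a better comparison with numpy.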

import tensorflow as tf 
import numpy as np 
import time 

n_trials = 50000 

tf.reset_default_graph() 

hit = tf.Variable(0, name='hit') 

def body(ctr):
    x = tf.random_uniform(shape=[2], name='x')
    r = tf.sqrt(tf.reduce_sum(tf.square(x)))
    is_inside = tf.cond(tf.less(r,1), lambda: tf.constant(1), lambda: tf.constant(0))
    hit_op = hit.assign_add(is_inside)
    # tie the counter increment to the hit update so the update runs on every iteration
    with tf.control_dependencies([hit_op]):
        return ctr + 1

def condition(ctr): 
    return ctr < n_trials 

with tf.Session() as sess: 
    tf.global_variables_initializer().run() 
    result = tf.while_loop(condition, body, [tf.constant(0)]) 

    start = time.time() 
    sess.run(result) 

    hits = hit.eval() 
    print("Pi is {}".format(4.*hits/n_trials)) 
    print("Tensorflow operation took {:.2f} s".format((time.time()-start)))
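As a rough mental model (plain Python, not TensorFlow's actual implementation), `tf.while_loop(condition, body, loop_vars)` behaves like the loop below, except that every iteration runs inside the TensorFlow runtime, so Python is entered only once through the single `sess.run(result)` call. The helper name `while_loop_model` and the use of the standard `random` module in place of `tf.random_uniform` are assumptions made for illustration.

```python
import math
import random

def while_loop_model(condition, body, loop_vars):
    """Plain-Python model of while-loop semantics: apply body while condition holds."""
    while condition(*loop_vars):
        loop_vars = body(*loop_vars)
    return loop_vars

n_trials = 50000
rng = random.Random(0)  # fixed seed, only so the sketch is reproducible
hit = [0]               # mutable cell standing in for the tf.Variable 'hit'

def body(ctr):
    x, y = rng.random(), rng.random()
    if math.sqrt(x**2 + y**2) < 1:
        hit[0] += 1      # plays the role of hit.assign_add(is_inside)
    return (ctr + 1,)    # loop variables are returned as a tuple

def condition(ctr):
    return ctr < n_trials

(final_ctr,) = while_loop_model(condition, body, (0,))
print("Pi is {}".format(4. * hit[0] / n_trials))
```

The counter is the only loop variable; the hit count is updated as a side effect, which is why the TensorFlow version needs `tf.control_dependencies` to force that update to run.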

Source

2017-04-10 20:28:25 brownyoda

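Simply put, session.run has a lot of overhead, and it is not designed to be used this way. Normally, with a neural network for example, you call session.run once for a dozen multiplications of big matrices, and then the 0.2 ms it takes doesn't matter at all. In your case, you probably want something like the code below. It runs 5 times faster than the numpy version on my machine.

By the way, you do the same thing in numpy. If you use a loop to do the reduction instead of np.sum, it will be much slower.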

import tensorflow as tf
import numpy as np
import time

n_trials = 50000

tf.reset_default_graph()

# Draw all points at once; x and y need the same shape so that
# every trial gets its own (x, y) pair
x = tf.random_uniform(shape=(n_trials,), name='x')
y = tf.random_uniform(shape=(n_trials,), name='y')
r = tf.sqrt(x**2 + y**2)

# perform the monte carlo step for all trials in one vectorized op
is_inside = tf.cast(tf.less(r, 1), tf.int32)
hit2 = tf.reduce_sum(is_inside)

with tf.Session() as sess:
    # Make sure no new nodes are added to the graph
    sess.graph.finalize()

    start = time.time()

    # A single sess.run call replaces the loop over n_trials
    hits = sess.run(hit2)

    print("Pi is {}".format(4*hits/n_trials))
    print("Tensorflow operation took {:.2f} s".format((time.time()-start)))
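The 0.2 ms figure mentioned in this answer can be sanity-checked against the timings reported in the question, where 50000 separate `sess.run` calls took 8.98 s in total. The back-of-the-envelope calculation below assumes that essentially all of that time is per-call overhead, since the arithmetic for a single trial is negligible. The variable names are illustrative.

```python
# Timings copied from the question
n_trials = 50000
tf_loop_seconds = 8.98      # 50000 sess.run calls
numpy_loop_seconds = 0.75   # numpy list-comprehension version

# Assumption: the TensorFlow time is dominated by per-call overhead
per_call_ms = tf_loop_seconds / n_trials * 1000
slowdown = tf_loop_seconds / numpy_loop_seconds

print("approx. overhead per sess.run call: {:.2f} ms".format(per_call_ms))
print("TensorFlow loop vs numpy loop: {:.1f}x slower".format(slowdown))
```

This works out to roughly 0.18 ms per call, in line with the 0.2 ms quoted above, and to a slowdown of about 12x, matching the "order of magnitude" observed in the question.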

Source

2017-03-18 18:19:37

Using np.sum() isn't a fair comparison, though. I've updated my post to do the summation with a for loop; however, the speed doesn't seem to be affected. – Ben

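Ah, yes, the numpy is wrong; you should do it this way: 'mypi = np.sum(np.sqrt(np.sum(np.square(np.random.uniform(size=(2, n_trials))), axis=0)) < 1) * 4 / n_trials', which takes 100 times less time than your code. As for the question, I didn't say it should be. It's about TensorFlow: it is designed for certain operations and is not suited to others.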

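Thanks for your reply, but saying it is "wrong" isn't fair; I was just using it as a benchmarking tool. Could you please elaborate on the overhead in 'sess.run()'? At that point the graph should be finalized and compiled (for lack of a better term), and the operations should execute very quickly. – Ben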
