TensorFlow: Specify the number of ops running in parallel

As far as I know, TF invokes multiple operators in parallel as long as they are independent. (link)

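That parallelism can be controlled with inter_op_parallelism_threads and intra_op_parallelism_threads when the operators run on the CPU (link). However, these parameters do not affect GPU operators at all. How can I control parallelism on the GPU? (For example, run the operators serially even though some of them are independent.)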
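For reference, a minimal sketch of how the two CPU-side settings named above are passed to a session, using the same TF 1.x API as the rest of this thread. The value 1 for each is an assumption chosen for illustration; the question does not state specific values.

```python
import tensorflow as tf

# Illustrative values only (assumed, not from the question).
config = tf.ConfigProto(
    # Size of the thread pool used to run independent ops concurrently.
    inter_op_parallelism_threads=1,
    # Size of the thread pool used inside a single op (e.g. one matmul).
    intra_op_parallelism_threads=1,
)

# The settings take effect for sessions created with this config.
sess = tf.Session(config=config)
```

As the question notes, these settings govern CPU thread pools and are not expected to change how GPU kernels are scheduled.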

Edit:

a=tf.random_normal([N,N]) 
b=tf.random_normal([N,N]) 
c=tf.random_normal([N,N]) 
d=tf.random_normal([N,N]) 

x=tf.matmul(a,b) 
y=tf.matmul(c,d) 
z=tf.matmul(x,y)


2017-01-11 enc

A GPU runs only one compute op at a time –

http://stackoverflow.com/questions/39481453/tensorflow-device-contexts-streams-and-context-switching –

@YaroslavBulatov Then sess.run(z) should take 3 times as long as sess.run(x), right? But in my experiments it only takes 2 times as long. – enc
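One possible reading of the 2x observation, offered as a rough cost model rather than a measurement: in the graph from the question, sess.run(x) also has to generate a and b, and sess.run(z) has to generate a, b, c and d, so the expected ratio under strictly serial execution depends on how expensive random_normal is relative to matmul. The function name serial_ratio and the unit costs below are hypothetical, and fixed overheads such as copying the result back to Python are ignored.

```python
def serial_ratio(matmul_cost, random_cost):
    """time(sess.run(z)) / time(sess.run(x)) if ops run strictly one at a time.

    Costs are in arbitrary units (hypothetical, not measured).
    """
    # sess.run(x) evaluates a and b (two random_normal ops) and one matmul.
    cost_x = 2 * random_cost + 1 * matmul_cost
    # sess.run(z) evaluates a, b, c, d (four random_normal ops) and three matmuls.
    cost_z = 4 * random_cost + 3 * matmul_cost
    return cost_z / cost_x


print(serial_ratio(matmul_cost=1.0, random_cost=0.0))  # matmul dominates
print(serial_ratio(matmul_cost=1.0, random_cost=1.0))  # comparable costs
print(serial_ratio(matmul_cost=0.0, random_cost=1.0))  # random generation dominates
```

Under this model the ratio falls between 2 and 3, so a measured 2x does not by itself show that x and y ran concurrently.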

Here is a way to profile execution that avoids common pitfalls:

import time
import tensorflow as tf

# Turn off graph-rewriting optimizations 
config = tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0))) 

# throw error if explicit device placement can't be satisfied 
config.allow_soft_placement = False 

N = 8192 
with tf.device("/gpu:0"): 
    input1 = tf.Variable(tf.random_normal([N,N])) 
    input2 = tf.Variable(tf.random_normal([N,N])) 
    result = tf.matmul(input1, input2) 
    result_no_output = result.op # to avoid transferring data back to Python 
sess = tf.Session(config=config) 

# load values into GPU 
sess.run(tf.global_variables_initializer()) 

# pre-warming 
sess.run(result_no_output) 

num_ops = N**3 + N**2*(N-1) # N^3 muls, N^2 (N-1) adds 
elapsed = [] 
for i in range(10): 
    start = time.time() 

    sess.run(result_no_output) 
    elapsed.append(time.time()-start) 

print("%d x %d matmul, %.2f elapsed, %.2f G ops/sec"%(N, N, min(elapsed), num_ops/min(elapsed)/10**9))

On a Titan X Pascal this shows 9.5 T ops/sec, close to the theoretical maximum of 11 T ops/sec:

8192 x 8192 matmul, 0.12 elapsed, 9527.10 G ops/sec
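As a sanity check on the printed numbers, the arithmetic can be redone without a GPU. This sketch uses only figures quoted in the answer (N, the reported rate, and the 11 T ops/sec peak); the variable names are chosen here for illustration.

```python
N = 8192

# Same count as the script: N^3 multiplies plus N^2 * (N - 1) adds.
num_ops = N**3 + N**2 * (N - 1)

reported_rate = 9527.10e9  # ops/sec, taken from the printed output
peak_rate = 11e12          # ops/sec, theoretical maximum quoted in the answer

# Elapsed time implied by the reported rate (printed as 0.12 after rounding).
implied_elapsed = num_ops / reported_rate
fraction_of_peak = reported_rate / peak_rate

print("num_ops = %d" % num_ops)
print("implied elapsed = %.4f s" % implied_elapsed)
print("fraction of peak = %.1f%%" % (100 * fraction_of_peak))
```

The implied elapsed time is a little under the rounded 0.12 shown in the output, which explains why dividing by 0.12 directly gives a slightly lower rate than the one printed.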


2017-01-11 15:59:04
