I wanted to check whether I could solve this problem without pymc3. The idea of the experiment is that I define a probabilistic system that contains a switchpoint. I can use sampling as a method of inference, but I started to wonder why I couldn't just use gradient descent instead.

I decided to run the gradient search in tensorflow, but tensorflow seems to have a hard time performing a gradient search when tf.where is involved.

You can find the code below.

import tensorflow as tf 
import numpy as np 

x1 = np.random.randn(50)+1 
x2 = np.random.randn(50)*2 + 5 
x_all = np.hstack([x1, x2]) 
len_x = len(x_all) 
time_all = np.arange(1, len_x + 1) 

mu1 = tf.Variable(0, name="mu1", dtype=tf.float32) 
mu2 = tf.Variable(5, name = "mu2", dtype=tf.float32) 
sigma1 = tf.Variable(2, name = "sigma1", dtype=tf.float32) 
sigma2 = tf.Variable(2, name = "sigma2", dtype=tf.float32) 
tau = tf.Variable(10, name = "tau", dtype=tf.float32) 

mu = tf.where(time_all < tau, 
       tf.ones(shape=(len_x,), dtype=tf.float32) * mu1, 
       tf.ones(shape=(len_x,), dtype=tf.float32) * mu2) 
sigma = tf.where(time_all < tau, 
       tf.ones(shape=(len_x,), dtype=tf.float32) * sigma1, 
       tf.ones(shape=(len_x,), dtype=tf.float32) * sigma2) 

likelihood_arr = tf.log(tf.sqrt(1/(2*np.pi*tf.pow(sigma, 2)))) - tf.pow(x_all - mu, 2)/(2*tf.pow(sigma, 2)) 
total_likelihood = tf.reduce_sum(likelihood_arr, name="total_likelihood") 

optimizer = tf.train.RMSPropOptimizer(0.01) 
opt_task = optimizer.minimize(-total_likelihood) 
init = tf.global_variables_initializer() 

with tf.Session() as sess: 
    sess.run(init) 
    print("these variables should be trainable: {}".format([_.name for _ in tf.trainable_variables()])) 
    for step in range(10000): 
        _lik, _ = sess.run([total_likelihood, opt_task]) 
        if step % 1000 == 0: 
            variables = {_.name: _.eval() for _ in [mu1, mu2, sigma1, sigma2, tau]} 
            print("step: {}, values: {}".format(str(step).zfill(4), variables)) 
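For reference, the likelihood_arr line is just the Gaussian log-density written out by hand; a quick numpy check (arbitrary values) that it matches the usual closed form:

```python
import numpy as np

x, mu, sigma = 1.3, 0.0, 2.0
# form used in the code above: log(sqrt(1/(2*pi*sigma^2))) - (x-mu)^2/(2*sigma^2)
as_written = np.log(np.sqrt(1/(2*np.pi*sigma**2))) - (x - mu)**2/(2*sigma**2)
# standard form of the gaussian log-pdf
standard = -0.5*np.log(2*np.pi*sigma**2) - (x - mu)**2/(2*sigma**2)
# the two agree, since log(sqrt(1/a)) == -0.5*log(a)
```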

You will notice that the tau parameter does not change, even though tensorflow seems to be aware of the variable and its gradient. Any clue what is going wrong? Is this something that can be computed in tensorflow, or do I need a different pattern?

Answers


tau is used only in the condition argument of where (tf.where(time_all < tau, ...)), which is a boolean tensor. Since computing gradients only makes sense for continuous values, the gradient of the output with respect to tau will be zero.

Even ignoring tf.where, you used tau in the expression time_all < tau, which is constant almost everywhere as a function of tau and therefore has zero gradient.

Because the gradient is zero, there is no way to learn tau with gradient-descent-based methods.

Depending on your problem, instead of a hard switch between two values you could perhaps use a weighted sum p*val1 + (1-p)*val2, where p depends on tau in a continuous manner.
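A minimal numpy sketch (toy finite-difference check, all values hypothetical) of why the hard switch cannot be learned while a sigmoid-weighted sum can:

```python
import numpy as np

t = np.arange(1, 11, dtype=float)

def hard_loss(tau):
    # hard switch: piecewise constant in tau, like tf.where(t < tau, ...)
    return np.sum(np.where(t < tau, 1.0, 5.0))

def soft_loss(tau):
    # continuous switch: p*val1 + (1-p)*val2 with a sigmoid p
    p = 1.0 / (1.0 + np.exp(t - tau))
    return np.sum(p * 1.0 + (1 - p) * 5.0)

eps = 1e-4
grad_hard = (hard_loss(5.5 + eps) - hard_loss(5.5 - eps)) / (2 * eps)
grad_soft = (soft_loss(5.5 + eps) - soft_loss(5.5 - eps)) / (2 * eps)
# grad_hard is exactly 0; grad_soft is nonzero, so tau can actually move
```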


The solution above is the correct answer, but it does not contain the code that solves my problem. The snippet below does.

import tensorflow as tf 
import numpy as np 
import os 
import uuid 

TENSORBOARD_PATH = "/tmp/tensorboard-switchpoint" 
# tensorboard --logdir=/tmp/tensorboard-switchpoint 

x1 = np.random.randn(35)-1 
x2 = np.random.randn(35)*2 + 5 
x_all = np.hstack([x1, x2]) 
len_x = len(x_all) 
time_all = np.arange(1, len_x + 1) 

mu1 = tf.Variable(0, name="mu1", dtype=tf.float32) 
mu2 = tf.Variable(0, name = "mu2", dtype=tf.float32) 
sigma1 = tf.Variable(2, name = "sigma1", dtype=tf.float32) 
sigma2 = tf.Variable(2, name = "sigma2", dtype=tf.float32) 
tau = tf.Variable(15, name = "tau", dtype=tf.float32) 
switch = 1./(1 + tf.exp(time_all - tau))  # sigmoid in tau; tf.pow(x, 1) was a no-op 

mu = switch*mu1 + (1-switch)*mu2 
sigma = switch*sigma1 + (1-switch)*sigma2 

likelihood_arr = tf.log(tf.sqrt(1/(2*np.pi*tf.pow(sigma, 2)))) - tf.pow(x_all - mu, 2)/(2*tf.pow(sigma, 2)) 
total_likelihood = tf.reduce_sum(likelihood_arr, name="total_likelihood") 

optimizer = tf.train.AdamOptimizer() 
opt_task = optimizer.minimize(-total_likelihood) 
init = tf.global_variables_initializer() 

tf.summary.scalar("mu1", mu1) 
tf.summary.scalar("mu2", mu2) 
tf.summary.scalar("sigma1", sigma1) 
tf.summary.scalar("sigma2", sigma2) 
tf.summary.scalar("tau", tau) 
tf.summary.scalar("likelihood", total_likelihood) 
merged_summary_op = tf.summary.merge_all() 

with tf.Session() as sess: 
    sess.run(init) 
    print("these variables should be trainable: {}".format([_.name for _ in tf.trainable_variables()])) 
    uniq_id = os.path.join(TENSORBOARD_PATH, "switchpoint-" + uuid.uuid1().__str__()[:4]) 
    summary_writer = tf.summary.FileWriter(uniq_id, graph=tf.get_default_graph()) 
    for step in range(40000): 
        lik, opt, summary = sess.run([total_likelihood, opt_task, merged_summary_op]) 
        if step % 100 == 0: 
            variables = {_.name: _.eval() for _ in [total_likelihood]} 
            summary_writer.add_summary(summary, step) 
            print("i{}: {}".format(str(step).zfill(5), variables))
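The same soft-switch idea can also be checked without tensorflow at all. A pure-numpy sketch (toy data, hand-derived gradients, all names hypothetical) that fixes the distribution parameters at their true values and learns only tau by gradient ascent on the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 70
t = np.arange(1, n + 1, dtype=float)
# toy data: true switchpoint at t = 35, two gaussian regimes
x = np.where(t <= 35, rng.normal(-1.0, 1.0, n), rng.normal(5.0, 2.0, n))

mu1, mu2, s1, s2 = -1.0, 5.0, 1.0, 2.0  # held fixed; only tau is learned
tau, lr = 15.0, 0.05
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(t - tau))      # sigmoid switch, as in the answer
    mu = p * mu1 + (1 - p) * mu2
    sigma = p * s1 + (1 - p) * s2
    # hand-derived gradient of the gaussian log-likelihood w.r.t. tau
    dl_dmu = (x - mu) / sigma**2
    dl_dsigma = (x - mu)**2 / sigma**3 - 1.0 / sigma
    dp_dtau = p * (1 - p)
    grad = np.sum((dl_dmu * (mu1 - mu2) + dl_dsigma * (s1 - s2)) * dp_dtau)
    tau += lr * grad                       # ascent: maximize the likelihood
# tau ends up near the true switchpoint of 35
```

Because the sigmoid makes the switch continuous in tau, the gradient is informative everywhere, which is exactly why this formulation trains while the tf.where version does not.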