TensorFlow on Jupyter: can't restore variables

2016-01-11

When I use TensorFlow in a Jupyter notebook, I can't seem to restore saved variables. I train an ANN and then run saver.save(sess, "params1.ckpt"). Then I train it again and save the new result with saver.save(sess, "params2.ckpt"). But when I then run saver.restore(sess, "params1.ckpt"), my model does not load the values saved in params1.ckpt; it keeps the ones from params2.ckpt.

If I run the model, save it to params.ckpt, then shut down and restart, and then try to load it again, I get the following error:

--------------------------------------------------------------------------- 
StatusNotOK        Traceback (most recent call last) 
StatusNotOK: Not found: Tensor name "Variable/Adam" not found in checkpoint files params.ckpt 
    [[Node: save/restore_slice_1 = RestoreSlice[dt=DT_FLOAT, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice_1/tensor_name, save/restore_slice_1/shape_and_slice)]] 

During handling of the above exception, another exception occurred: 

SystemError        Traceback (most recent call last) 
<ipython-input-6-39ae6b7641bd> in <module>() 
----> 1 saver.restore(sess, "params.ckpt") 

/usr/local/lib/python3.5/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path) 
    889  save_path: Path where parameters were previously saved. 
    890  """ 
--> 891  sess.run([self._restore_op_name], {self._filename_tensor_name: save_path}) 
    892 
    893 

/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict) 
    366 
    367  # Run request and get response. 
--> 368  results = self._do_run(target_list, unique_fetch_targets, feed_dict_string) 
    369 
    370  # User may have fetched the same tensor multiple times, but we 

/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, target_list, fetch_list, feed_dict) 
    426 
    427  return tf_session.TF_Run(self._session, feed_dict, fetch_list, 
--> 428        target_list) 
    429 
    430  except tf_session.StatusNotOK as e: 

SystemError: <built-in function delete_Status> returned a result with an error set 

My training code:

def weight_variable(shape, name): 
    initial = tf.truncated_normal(shape, stddev=1.0, name=name) 
    return tf.Variable(initial) 

def bias_variable(shape, name): 
    initial = tf.constant(1.0, shape=shape) 
    return tf.Variable(initial, name=name) 

input_file = pd.read_csv('P2R0PC0.csv') 
features = #vector with 5 feature names 
targets = #vector with 4 feature names 
x_data = input_file.as_matrix(features) 
t_data = input_file.as_matrix(targets) 

x = tf.placeholder(tf.float32, [None, x_data.shape[1]]) 

hiddenDim = 5 

b1 = bias_variable([hiddenDim], name = "b1") 
W1 = weight_variable([x_data.shape[1], hiddenDim], name = "W1") 

b2 = bias_variable([t_data.shape[1]], name = "b2") 
W2 = weight_variable([hiddenDim, t_data.shape[1]], name = "W2") 

hidden = tf.nn.sigmoid(tf.matmul(x, W1) + b1) 
y = tf.nn.sigmoid(tf.matmul(hidden, W2) + b2) 
t = tf.placeholder(tf.float32, [None, t_data.shape[1]]) 

lambda1 = 1 
beta1 = 1 
lambda2 = 1 
beta2 = 1 
error = -tf.reduce_sum(t * tf.log(tf.clip_by_value(y,1e-10,1.0)) + (1 - t) * tf.log(tf.clip_by_value(1 - y,1e-10,1.0))) 
complexity = lambda1 * tf.nn.l2_loss(W1) + beta1 * tf.nn.l2_loss(b1) + lambda2 * tf.nn.l2_loss(W2) + beta2 * tf.nn.l2_loss(b2) 
loss = error + complexity 

train_step = tf.train.AdamOptimizer(0.001).minimize(loss) 
sess = tf.Session() 

init = tf.initialize_all_variables() 
sess.run(init) 

ran = 25001 
delta = 250 

plot_data = np.zeros(int(ran/delta + 1)) 
k = 0
for i in range(ran): 
    train_step.run({x: x_data, t: t_data}, sess) 
    if i % delta == 0: 
        plot_data[k] = loss.eval({x: x_data, t: t_data}, sess) 
        #plot_training[k] = loss.eval({x: x_test, t: t_test}, sess) 
        print(str(plot_data[k])) 
        k = k + 1 

plt.plot(np.arange(start=2, stop=int(ran/delta + 1)), plot_data[2:]) 

saver = tf.train.Saver() 
saver.save(sess, "params.ckpt") 

error.eval({x: x_data, t: t_data}, session=sess) 

What am I doing wrong? Why can't I restore my variables?

Are you building multiple copies of the same graph in the same process? If so, it's possible that the names of the tensors in the different checkpoints are different, which would cause a mismatch when you try to restore them. – mrry

What do you mean by building multiple copies? I mean, every time I go to bed and then come back to my computer I have to re-run the whole code, so I have to regenerate the graph, but the names should still work, right? Except I just realized I put name=name in a different place than I intended in the weight_variable function, so I'll see whether that's the problem... –

I mean: are you executing the training code multiple times in the same process (i.e. from some outer function that isn't shown)? – mrry

Answer

It looks like you are using Jupyter to build your model. One possible problem when you construct a tf.train.Saver with its default arguments is that it will use the (auto-generated) names of the variables as the keys in the checkpoint. Since it is easy to re-execute a code cell multiple times in Jupyter, you can end up with multiple copies of the variable nodes in your session. See my answer to this question for an explanation of what can go wrong.
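
The clash comes from TensorFlow's automatic name uniquing: the graph keeps a per-name counter and appends _1, _2, ... for repeats, so re-executing a cell silently renames every new variable. Here is a minimal, TensorFlow-free sketch of that uniquing behavior (illustrative only, not TensorFlow's actual code):

```python
def unique_name(counters, base):
    """Mimic TensorFlow-style name uniquing: the first use of a base
    name keeps it; later uses get base_1, base_2, ..."""
    n = counters.get(base, 0)
    counters[base] = n + 1
    return base if n == 0 else "%s_%d" % (base, n)

counters = {}
# First execution of the model-building cell:
first_run = [unique_name(counters, "Variable") for _ in range(2)]
# Re-running the same cell in the same process:
second_run = [unique_name(counters, "Variable") for _ in range(2)]
print(first_run)   # ['Variable', 'Variable_1']
print(second_run)  # ['Variable_2', 'Variable_3']
```

A checkpoint saved during the first run uses "Variable" as a key, but after re-running the cell the Saver looks for "Variable_2", and the restore fails with the "not found in checkpoint" error above.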

There are several possible solutions. Here are the simplest:

  • Call tf.reset_default_graph() before you build your model (and the Saver). This ensures that the variables get the names you intended, but it invalidates any previously created graphs.

  • Use explicit arguments when constructing the tf.train.Saver() to specify persistent names for the variables. For your example this shouldn't be too hard (although it becomes impractical for larger models):

    saver = tf.train.Saver(var_list={"b1": b1, "W1": W1, "b2": b2, "W2": W2}) 
    
  • Create a new tf.Graph() and make it the default each time you create the model. This can be tricky in Jupyter, because it forces you to put all of the model-building code in a single cell, but it works well for scripts:

    with tf.Graph().as_default(): 
        # Model building and training/evaluation code goes here. 
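
To see why option 2 fixed things, it helps to think of the checkpoint as a plain name-to-value map: restoring is a lookup of the graph's variable keys against the checkpoint's keys, and explicit keys are immune to auto-renaming. A schematic, TensorFlow-free sketch (the names and values here are illustrative):

```python
# What the Saver writes, schematically: a map from key to saved value.
checkpoint = {"W1": [[0.1, 0.2]], "b1": [0.0]}

def restore(checkpoint, keys):
    """Look each graph variable's key up in the checkpoint; a missing
    key is the 'Tensor name ... not found in checkpoint files' error."""
    missing = [k for k in keys if k not in checkpoint]
    if missing:
        raise KeyError("Tensor name %r not found in checkpoint" % missing[0])
    return {k: checkpoint[k] for k in keys}

# Default Saver: keys are auto-generated names, which drift after a
# cell is re-executed ("Variable" has become "Variable_2", etc.):
try:
    restore(checkpoint, ["Variable_2", "Variable_3"])
except KeyError as e:
    print(e)
# Explicit var_list (option 2): keys are stable, so the lookup succeeds.
values = restore(checkpoint, ["W1", "b1"])
print(sorted(values))  # ['W1', 'b1']
```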
    
There doesn't seem to be any function called "reset_default_graph()", and I'd rather not use explicit arguments, for the reason you explained. –

Does the third option work for you? If not, could you try the second one to see whether it fixes your problem? It looks like you have to [install from source](https://www.tensorflow.org/versions/master/get_started/os_setup.html#installing-from-sources) to get 'tf.reset_default_graph()'. – mrry

The second one works! Thanks! –