Understand ResourceExhaustedError: OOM when allocating tensor with shape

I am trying to implement the skip-thought model using TensorFlow; a current version is placed here.
Currently I use one GPU on my machine (2 GPUs in total), and the GPU info is:
2017-09-06 11:29:32.657299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.683
pciBusID 0000:02:00.0
Total memory: 10.91GiB
Free memory: 10.75GiB
However, I got an OOM when I tried to feed data to the model. I tried to debug as follows:

I use the following snippet right after I run sess.run(tf.global_variables_initializer()):
logger.info('Total: {} params'.format(
    np.sum([
        np.prod(v.get_shape().as_list())
        for v in tf.trainable_variables()
    ])))
and got:

2017-09-06 11:29:51,333 INFO main main.py:127 - Total: 62968629 params

which is about 240 MB if every parameter is tf.float32.
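To double-check that figure, it is plain arithmetic on the count above:

print(62968629 * 4 / 2**20)  # 4 bytes per float32 parameter -> ≈ 240.2 MB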
The output of tf.global_variables is:
[<tf.Variable 'embedding/embedding_matrix:0' shape=(155229, 200) dtype=float32_ref>,
<tf.Variable 'encoder/rnn/gru_cell/gates/kernel:0' shape=(400, 400) dtype=float32_ref>,
<tf.Variable 'encoder/rnn/gru_cell/gates/bias:0' shape=(400,) dtype=float32_ref>,
<tf.Variable 'encoder/rnn/gru_cell/candidate/kernel:0' shape=(400, 200) dtype=float32_ref>,
<tf.Variable 'encoder/rnn/gru_cell/candidate/bias:0' shape=(200,) dtype=float32_ref>,
<tf.Variable 'decoder/weights:0' shape=(200, 155229) dtype=float32_ref>,
<tf.Variable 'decoder/biases:0' shape=(155229,) dtype=float32_ref>,
<tf.Variable 'decoder/previous_decoder/rnn/gru_cell/gates/kernel:0' shape=(400, 400) dtype=float32_ref>,
<tf.Variable 'decoder/previous_decoder/rnn/gru_cell/gates/bias:0' shape=(400,) dtype=float32_ref>,
<tf.Variable 'decoder/previous_decoder/rnn/gru_cell/candidate/kernel:0' shape=(400, 200) dtype=float32_ref>,
<tf.Variable 'decoder/previous_decoder/rnn/gru_cell/candidate/bias:0' shape=(200,) dtype=float32_ref>,
<tf.Variable 'decoder/next_decoder/rnn/gru_cell/gates/kernel:0' shape=(400, 400) dtype=float32_ref>,
<tf.Variable 'decoder/next_decoder/rnn/gru_cell/gates/bias:0' shape=(400,) dtype=float32_ref>,
<tf.Variable 'decoder/next_decoder/rnn/gru_cell/candidate/kernel:0' shape=(400, 200) dtype=float32_ref>,
<tf.Variable 'decoder/next_decoder/rnn/gru_cell/candidate/bias:0' shape=(200,) dtype=float32_ref>,
<tf.Variable 'global_step:0' shape=() dtype=int32_ref>]
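Summing those shapes by hand reproduces the reported count (global_step is an int32 scalar and contributes nothing to the float32 total):

embedding = 155229 * 200                       # embedding/embedding_matrix
gru_cell = 400 * 400 + 400 + 400 * 200 + 200   # gates + candidate, kernel + bias
decoder_out = 200 * 155229 + 155229            # decoder/weights + decoder/biases
print(embedding + 3 * gru_cell + decoder_out)  # 62968629, matching the log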
In my training phase, I have a data array whose shape is (164652, 3, 30), i.e. sample_size x 3 x time_step. The 3 here refers to the previous sentence, the current sentence, and the next sentence. This training data is about 57 MB and is stored in a loader.
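That size is consistent with the array holding int32 token ids (the dtype is my assumption; it is not shown above):

import numpy as np
print(164652 * 3 * 30 * np.dtype(np.int32).itemsize / 2**20)  # ≈ 56.5 MB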
Then I wrote a generator function to fetch the sentences, which looks like:
def iter_batches(self, batch_size=128, time_major=True, shuffle=True):
    num_samples = len(self._sentences)
    if shuffle:
        samples = self._sentences[np.random.permutation(num_samples)]
    else:
        samples = self._sentences

    batch_start = 0
    while batch_start < num_samples:
        batch = samples[batch_start:batch_start + batch_size]
        lens = (batch != self._vocab[self._vocab.pad_token]).sum(axis=2)
        y, x, z = batch[:, 0, :], batch[:, 1, :], batch[:, 2, :]
        if time_major:
            yield (y.T, lens[:, 0]), (x.T, lens[:, 1]), (z.T, lens[:, 2])
        else:
            yield (y, lens[:, 0]), (x, lens[:, 1]), (z, lens[:, 2])
        batch_start += batch_size
The training loop looks like:
for epoch in range(num_epochs):
    batches = loader.iter_batches(batch_size=args.batch_size)
    try:
        (y, y_lens), (x, x_lens), (z, z_lens) = next(batches)
        _, summaries, loss_val = sess.run(
            [train_op, train_summary_op, st.loss],
            feed_dict={
                st.inputs: x,
                st.sequence_length: x_lens,
                st.previous_targets: y,
                st.previous_target_lengths: y_lens,
                st.next_targets: z,
                st.next_target_lengths: z_lens
            })
    except StopIteration:
        ...
Then I got the OOM. If I comment out the whole try body (i.e. feed no data), the script runs just fine.

I have no idea why I got the OOM with such a small amount of data. Using nvidia-smi, I always get:
Wed Sep 6 12:03:37 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.59 Driver Version: 384.59 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:02:00.0 Off | N/A |
| 0% 44C P2 60W/275W | 10623MiB/11172MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 0% 43C P2 62W/275W | 10621MiB/11171MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 32748 C python3 10613MiB |
| 1 32748 C python3 10611MiB |
+-----------------------------------------------------------------------------+
I can't see the actual GPU usage of my script, since TensorFlow always grabs all the memory at the beginning. The real problem here is that I don't know how to debug this.
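One thing I plan to try (the gpu_options approach mentioned in the comments; a minimal sketch of the standard TF 1.x session config, not my actual code) is stopping TensorFlow from grabbing everything up front, so that nvidia-smi reflects real usage:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
# or cap the fraction TensorFlow may claim:
# config.gpu_options.per_process_gpu_memory_fraction = 0.5
sess = tf.Session(config=config)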
I have read some posts about OOM on StackOverflow. Most of them occurred when a large test set was fed to the model, and the problem could be avoided by feeding the data in small batches. But I don't understand why such a small combination of data and parameters fails on my 11 GB 1080 Ti, when the error shows it merely tried to allocate a matrix of size [3840 x 155229] (the output matrix of the decoder, where 3840 = 30 (time_steps) x 128 (batch_size) and 155229 is vocab_size).
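For scale, that single allocation alone is sizable:

print(3840 * 155229 * 4 / 2**30)  # ≈ 2.22 GiB for one [3840, 155229] float32 tensor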
2017-09-06 12:14:45.787566: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ********************************************************************************************xxxxxxxx
2017-09-06 12:14:45.787597: W tensorflow/core/framework/op_kernel.cc:1158] Resource exhausted: OOM when allocating tensor with shape[3840,155229]
2017-09-06 12:14:45.788735: W tensorflow/core/framework/op_kernel.cc:1158] Resource exhausted: OOM when allocating tensor with shape[3840,155229]
[[Node: decoder/previous_decoder/Add = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](decoder/previous_decoder/MatMul, decoder/biases/read)]]
2017-09-06 12:14:45.790453: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2857 get requests, put_count=2078 evicted_count=1000 eviction_rate=0.481232 and unsatisfied allocation rate=0.657683
2017-09-06 12:14:45.790482: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1139, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
status, run_metadata)
File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
next(self.gen)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[3840,155229]
[[Node: decoder/previous_decoder/Add = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](decoder/previous_decoder/MatMul, decoder/biases/read)]]
[[Node: GradientDescent/update/_146 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2166_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
During handling of the above exception, another exception occurred:
Any help would be appreciated. Thanks in advance.
Thanks! I will try gpu_options soon. About the network size: doesn't the snippet np.sum([np.prod(v.get_shape().as_list()) for v in tf.trainable_variables()]) give the parameter count of the network, i.e. 62968629? Adding the gradients, the total is 2 * 62968629 * 4 / 1024 / 1024 / 1024 -> 0.47G. Besides, my encoder has only 1 layer, and each of my 2 decoders has only 1 layer. The 3840 x 155229 matrix is the decoder output, not parameters, so I don't think it doubles during back-propagation? – Edityouprofile
The calculation is right for inference. I thought you had made a fully connected layer, my bad. But for training, you need to count tf.global_variables() instead of trainable_variables(), as the optimizer and any other add-ons you implemented will add more invisible parameters. –
Thanks again. I printed the results of tf.global_variables() and tf.trainable_variables() and updated the question. In my case, the latter lacks only the global_step tensor compared with the former. – Edityouprofile
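The check I used was roughly this sketch (run against the built graph; variable names are illustrative):

global_names = {v.name for v in tf.global_variables()}
trainable_names = {v.name for v in tf.trainable_variables()}
print(global_names - trainable_names)  # in my case: {'global_step:0'}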