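Scan function from Theano replicates non_sequences shared variables

I am trying to implement a custom convolutional layer for a CNN in Theano, and to do so I am using the scan function. The idea is to apply the new convolution mask at every pixel.

The scan function compiles correctly, but for some reason I get an out-of-memory error. The debug output (see below) indicates that the non_sequences variables are replicated for every iteration of the loop (one per pixel), which of course kills my GPU memory: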
def convolve_location(index, input, bias):
    hsize = self.W.shape/2
    t = T.switch(index[0]-hsize[0] < 0, 0, index[0]-hsize[0])
    l = T.switch(index[1]-hsize[1] < 0, 0, index[1]-hsize[1])
    b = T.switch(index[0]+hsize[0] >= input.shape[2], input.shape[2]-1, index[0]+hsize[0])
    r = T.switch(index[1]+hsize[1] >= input.shape[3], input.shape[3]-1, index[1]+hsize[1])
    r_image = (input[:, :, t:b, l:r] - input[:, :, index[0], index[1]][:, :, None, None]) ** 2
    r_delta = (bias[:, :, t:b, l:r] - bias[:, :, index[0], index[1]][:, :, None, None]) ** 2
    return T.sum(r_image*r_delta)

# Define cost function over all pixels
self.inds = theano.shared(
    np.array([(i, j)
              for i in range(self.image_shape[2])
              for j in range(self.image_shape[3])], dtype='int32'),
    borrow=True)
self.cost = T.sum(theano.scan(
    fn=convolve_location,
    outputs_info=None,
    sequences=[self.inds],
    non_sequences=[self.input, self.b],
    n_steps=np.prod(self.image_shape[-2:])
)[0])
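To make the per-pixel computation concrete, here is a minimal NumPy sketch of what convolve_location evaluates for a single pixel, plus a plain loop over all pixels. It is only an illustration under assumptions: hsize is passed in as a (half_height, half_width) tuple instead of being derived from self.W, and the names convolve_location_np and total_cost_np are hypothetical, not part of the layer.

```python
import numpy as np

def convolve_location_np(index, x, bias, hsize):
    """NumPy mirror of convolve_location for one pixel (illustrative only)."""
    i, j = index
    height, width = x.shape[2], x.shape[3]
    # Clamp the window to the image borders, as the T.switch calls do.
    t = max(i - hsize[0], 0)
    l = max(j - hsize[1], 0)
    b = min(i + hsize[0], height - 1)
    r = min(j + hsize[1], width - 1)
    # Squared difference between the window and the centre pixel.
    r_image = (x[:, :, t:b, l:r] - x[:, :, i, j][:, :, None, None]) ** 2
    r_delta = (bias[:, :, t:b, l:r] - bias[:, :, i, j][:, :, None, None]) ** 2
    return float(np.sum(r_image * r_delta))

def total_cost_np(x, bias, hsize):
    """Sum the per-pixel cost; x and bias are reused, never copied per step."""
    return sum(convolve_location_np((i, j), x, bias, hsize)
               for i in range(x.shape[2])
               for j in range(x.shape[3]))
```

As in the Theano version, the slices t:b and l:r exclude row b and column r.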
Here is the output from the debugger:
MemoryError: alloc failed
Apply node that caused the error: Alloc(TensorConstant{0.0}, TensorConstant{1025}, TensorConstant{2000}, TensorConstant{3}, TensorConstant{32}, TensorConstant{32})
Inputs types: [TensorType(float32, scalar), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar)]
Inputs shapes: [(),(),(),(),(),()]
Inputs strides: [(),(),(),(),(),()]
Inputs values: [array(0.0, dtype=float32), array(1025), array(2000), array(3), array(32), array(32)]

Debugprint of the apply node:
Alloc [@A] <TensorType(float32, 5D)> ''
 |TensorConstant{0.0} [@B] <TensorType(float32, scalar)>
 |TensorConstant{1025} [@C] <TensorType(int64, scalar)>
 |TensorConstant{2000} [@D] <TensorType(int64, scalar)>
 |TensorConstant{3} [@E] <TensorType(int64, scalar)>
 |TensorConstant{32} [@F] <TensorType(int64, scalar)>
 |TensorConstant{32} [@F] <TensorType(int64, scalar)>

Storage map footprint:
- CudaNdarrayConstant{[[[[ 0.]]]]}, Shape: (1, 1, 1, 1), ElemSize: 4 Byte(s), TotalSize: 4 Byte(s)
- Constant{18}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{(1, 1) of 0}, Shape: (1, 1), ElemSize: 1 Byte(s), TotalSize: 1 Byte(s)
- Constant{1024}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{-1}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{32}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Subtensor{:int64:}.0, Shape: (1024,), ElemSize: 4 Byte(s), TotalSize: 4096 Byte(s)
- Constant{34}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{2}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{[2000 3.. 32 32]}, Shape: (4,), ElemSize: 8 Byte(s), TotalSize: 32 Byte(s)
- Reshape{4}.0, Shape: (2000, 3, 32, 32), ElemSize: 4 Byte(s), TotalSize: 24576000 Byte(s)
- TensorConstant{(1, 1, 1, 1) of 0}, Shape: (1, 1, 1, 1), ElemSize: 1 Byte(s), TotalSize: 1 Byte(s)
- CudaNdarrayConstant{[[[[ 0.1]]]]}, Shape: (1, 1, 1, 1), ElemSize: 4 Byte(s), TotalSize: 4 Byte(s)
- <TensorType(float32, matrix)>, Shape: (50000, 3072), ElemSize: 4 Byte(s), TotalSize: 614400000 Byte(s)
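Why do the non_sequences variables show up as a 1025x2000x3x32x32 tensor when the original tensor has size 2000x3x32x32? Here 1025 is the number of scan iterations + 1. Why is the input replicated for every iteration instead of simply being reused, and how can I fix it?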
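For a sense of scale, the size of the failing Alloc can be worked out from the shapes in the error message. This is a small sketch assuming 4-byte float32 elements, as the footprint reports; the variable names are only illustrative.

```python
from math import prod

n_steps = 32 * 32                 # one scan step per pixel of a 32x32 image
input_shape = (2000, 3, 32, 32)   # shape of self.input
elem_size = 4                     # bytes per float32 element

# One copy of the input: 24,576,000 bytes, matching the Reshape{4}.0 entry.
single_copy_bytes = prod(input_shape) * elem_size

# The Alloc in the error has a leading dimension of n_steps + 1 = 1025.
alloc_shape = (n_steps + 1,) + input_shape
alloc_bytes = prod(alloc_shape) * elem_size

print(alloc_shape)
print(single_copy_bytes, alloc_bytes)
```

That is 25,190,400,000 bytes, about 25 GB, for a single buffer.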
Edit:
Both self.input and self.b are shared variables. self.input is passed to the class at initialization, while self.b is defined inside the class as follows:
self.b = theano.shared(np.zeros(image_shape, dtype=theano.config.floatX), borrow=True)
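You haven't shown how 'self.input' and 'self.b' are defined. Are they shared variables? Also, giving your Theano variables names would help with debugging. – cfh
Thanks cfh, I've edited the post. Both variables are indeed shared. Naming them would be a bit messy, though, because every layer in the network creates its own version of these variables. – gaspercat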