
Depth estimation with Keras

I want to design a convolutional network to estimate the depth of an image using Keras.

My RGB input images have shape 3x120x160, and the grayscale output depth maps have shape 1x120x160.

I tried a VGG-like architecture in which the depth grows with every layer, but I got stuck designing the final layers. Dense layers are too expensive, and upsampling proved inefficient.

I wanted to use Deconvolution2D, but I couldn't get it to work. The only architecture I ended up with is this:

from keras.models import Sequential
from keras.layers import (Convolution2D, MaxPooling2D, Dropout,
                          ZeroPadding2D, Deconvolution2D, Cropping2D)

model = Sequential()
model.add(Convolution2D(64, 5, 5, activation='relu', input_shape=(3, 120, 160)))
model.add(Convolution2D(64, 5, 5, activation='relu'))
model.add(MaxPooling2D())
model.add(Dropout(0.5))

model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(MaxPooling2D())
model.add(Dropout(0.5))

model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(Dropout(0.5))

model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(Dropout(0.5))

model.add(ZeroPadding2D())
model.add(Deconvolution2D(512, 3, 3, (None, 512, 41, 61), subsample=(2, 2), activation='relu'))
model.add(Deconvolution2D(512, 3, 3, (None, 512, 123, 183), subsample=(3, 3), activation='relu'))
model.add(Cropping2D(cropping=((1, 2), (11, 12))))
model.add(Convolution2D(1, 1, 1, activation='sigmoid', border_mode='same'))

The model summary looks like this:

Layer (type)      Output Shape   Param #  Connected to      
==================================================================================================== 
convolution2d_1 (Convolution2D) (None, 64, 116, 156) 4864  convolution2d_input_1[0][0]  
____________________________________________________________________________________________________ 
convolution2d_2 (Convolution2D) (None, 64, 112, 152) 102464  convolution2d_1[0][0]    
____________________________________________________________________________________________________ 
maxpooling2d_1 (MaxPooling2D) (None, 64, 56, 76) 0   convolution2d_2[0][0]    
____________________________________________________________________________________________________ 
dropout_1 (Dropout)    (None, 64, 56, 76) 0   maxpooling2d_1[0][0]    
____________________________________________________________________________________________________ 
convolution2d_3 (Convolution2D) (None, 128, 54, 74) 73856  dropout_1[0][0]     
____________________________________________________________________________________________________ 
convolution2d_4 (Convolution2D) (None, 128, 52, 72) 147584  convolution2d_3[0][0]    
____________________________________________________________________________________________________ 
maxpooling2d_2 (MaxPooling2D) (None, 128, 26, 36) 0   convolution2d_4[0][0]    
____________________________________________________________________________________________________ 
dropout_2 (Dropout)    (None, 128, 26, 36) 0   maxpooling2d_2[0][0]    
____________________________________________________________________________________________________ 
convolution2d_5 (Convolution2D) (None, 256, 24, 34) 295168  dropout_2[0][0]     
____________________________________________________________________________________________________ 
convolution2d_6 (Convolution2D) (None, 256, 22, 32) 590080  convolution2d_5[0][0]    
____________________________________________________________________________________________________ 
dropout_3 (Dropout)    (None, 256, 22, 32) 0   convolution2d_6[0][0]    
____________________________________________________________________________________________________ 
convolution2d_7 (Convolution2D) (None, 512, 20, 30) 1180160  dropout_3[0][0]     
____________________________________________________________________________________________________ 
convolution2d_8 (Convolution2D) (None, 512, 18, 28) 2359808  convolution2d_7[0][0]    
____________________________________________________________________________________________________ 
dropout_4 (Dropout)    (None, 512, 18, 28) 0   convolution2d_8[0][0]    
____________________________________________________________________________________________________ 
zeropadding2d_1 (ZeroPadding2D) (None, 512, 20, 30) 0   dropout_4[0][0]     
____________________________________________________________________________________________________ 
deconvolution2d_1 (Deconvolution2(None, 512, 41, 61) 2359808  zeropadding2d_1[0][0]    
____________________________________________________________________________________________________ 
deconvolution2d_2 (Deconvolution2(None, 512, 123, 183) 2359808  deconvolution2d_1[0][0]   
____________________________________________________________________________________________________ 
cropping2d_1 (Cropping2D)  (None, 512, 120, 160) 0   deconvolution2d_2[0][0]   
____________________________________________________________________________________________________ 
convolution2d_9 (Convolution2D) (None, 1, 120, 160) 513   cropping2d_1[0][0]    
==================================================================================================== 
Total params: 9474113 

I couldn't reduce the Deconvolution2D layers below 512 filters, because doing so produced shape-related errors; it seemed I had to give each Deconvolution2D layer as many filters as the previous layer had. I also had to add a final Convolution2D layer to get the network to run.

The architecture above learns very slowly and is (I think) inefficient. I'm sure I'm doing something wrong and the design shouldn't look like this. Could you help me design a better network?

I also tried to build the network described in this repository, but it seems Keras doesn't work the way that Lasagne example does. I would really appreciate it if someone could show me how to design such a network in Keras. Its structure looks like this:

[image: network architecture diagram]

Thanks

Answer


I suggest a U-Net (see Figure 1 of the paper). In the first half of a U-Net, the spatial resolution decreases as the number of channels increases (just like the VGG you mentioned). In the second half, the opposite happens: the number of channels decreases and the resolution increases. "Skip" connections between layers allow the network to produce high-resolution output efficiently.

You should be able to find a suitable Keras implementation (possibly this one).