Feature extraction using a Caffe model

I'd like to discuss feature extraction with the Caffe model called GoogLeNet. I am referring to the paper "End to end people detection in crowded scenes". Anyone familiar with Caffe should be able to help with my questions.

The paper comes with its own Python library. I have run the library, but I still can't work out some of the points made in the paper.

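The input image is passed through GoogLeNet up to the inception_5b/output layer, which produces a 15x20x1024 multidimensional array. Each 1024-dimensional vector thus represents a bounding box centered in a 64x64 region. Because these regions overlap by 50% (GoogLeNet downsamples by a factor of 32, so neighboring 64x64 regions are 32 pixels apart), a 640x480 image gives a 15x20 grid (640/32 = 20, 480/32 = 15), and the third dimension of each cell is a 1024-dimensional vector.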
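The grid arithmetic in the paragraph above can be checked with a short sketch. It assumes a total network stride of 32 pixels at inception_5b/output and that each cell's 64x64 region is centered on the cell center. The function names are hypothetical, and the exact box convention used by the paper's code may differ.

```python
def grid_shape(height, width, stride=32):
    """Number of feature-map cells (rows, cols) for an input of the given size.

    Assumes a total network stride of 32 and input sides divisible by it.
    """
    return height // stride, width // stride


def cell_region(row, col, stride=32, region=64):
    """Pixel box (x0, y0, x1, y1) of the region assigned to grid cell (row, col).

    Assumption: the region is centered on the cell center, so neighboring
    regions are `stride` pixels apart and overlap by (region - stride) pixels.
    Boxes at the image border may extend past the image edge.
    """
    cx = col * stride + stride / 2
    cy = row * stride + stride / 2
    half = region / 2
    return (cx - half, cy - half, cx + half, cy + half)


rows, cols = grid_shape(480, 640)
print(rows, cols)                  # 15 20
print(cell_region(0, 1))           # (16.0, -16.0, 80.0, 48.0)

# Horizontal overlap between two neighboring regions, as a fraction of 64 px.
a, b = cell_region(0, 0), cell_region(0, 1)
print((a[2] - b[0]) / 64)          # 0.5
```

With a stride of 32 and a region size of 64, every region shares half its width with each horizontal neighbor, which is where the 50% overlap comes from.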

My questions are:

(1) How can I obtain this 15x20x1024 array output?
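For question (1), here is a minimal sketch using pycaffe rather than the paper's own apollocaffe-based code. It assumes a GoogLeNet deploy definition whose input blob is named "data" and whose last inception block is a layer named "inception_5b/output", as in the BVLC GoogLeNet prototxt. Under those assumptions, the 15x20x1024 array is simply that blob's data after a forward pass. `net.blobs`, `Blob.reshape` and `Net.forward(end=...)` are pycaffe APIs. The helper `extract_grid_features` and the file names in the usage comment are hypothetical. Image preprocessing (mean subtraction, BGR channel order) is left to the caller.

```python
import numpy as np


def extract_grid_features(net, image_chw, layer="inception_5b/output"):
    """Run one preprocessed (C, H, W) image through a pycaffe-style net and
    return the feature map of `layer` as an (H', W', C') array,
    e.g. (15, 20, 1024) for a 640x480 input to GoogLeNet.

    Assumes `net` behaves like caffe.Net: an input blob named "data",
    blobs exposed via `net.blobs`, and `forward(end=...)`.
    """
    c, h, w = image_chw.shape
    # Resize the input blob to this image; Caffe reshapes the downstream
    # layers during the forward pass.
    net.blobs["data"].reshape(1, c, h, w)
    net.blobs["data"].data[...] = image_chw
    # Stop at the requested layer; later layers are not needed.
    net.forward(end=layer)
    feat = net.blobs[layer].data[0]                 # (C', H', W'), e.g. (1024, 15, 20)
    return np.transpose(feat, (1, 2, 0)).copy()     # (H', W', C')


# Example usage (file names are placeholders):
#   import caffe
#   net = caffe.Net("deploy.prototxt", "bvlc_googlenet.caffemodel", caffe.TEST)
#   grid = extract_grid_features(net, preprocessed_image)   # (15, 20, 1024)
#   grid[i, j] is the 1024-d vector of grid cell (i, j)
```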

(2) How does this 1x1x1024 data represent a 64x64 region of the image? The source code describes it as follows:

"""Takes the output from the decapitated googlenet and transforms the output 
    from a NxCxWxH to (NxWxH)xCx1x1 that is used as input for the lstm layers. 
    N = batch size, C = channels, W = grid width, H = grid height."""

That is, the transformation is implemented as:

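from apollocaffe.layers import Convolution, Filler, Power, Transpose

def generate_intermediate_layers(net):
    """Takes the output from the decapitated googlenet and transforms the output
    from a NxCxWxH to (NxWxH)xCx1x1 that is used as input for the lstm layers.
    N = batch size, C = channels, W = grid width, H = grid height."""

    # 1x1 convolution on top of the last inception block: mixes the 1024
    # channels of each grid cell and leaves the spatial grid unchanged.
    net.f(Convolution("post_fc7_conv", bottoms=["inception_5b/output"],
                      param_lr_mults=[1., 2.], param_decay_mults=[0., 0.],
                      num_output=1024, kernel_dim=(1, 1),
                      weight_filler=Filler("gaussian", 0.005),
                      bias_filler=Filler("constant", 0.)))
    # Scale the features by 0.01 before they reach the LSTM.
    net.f(Power("lstm_fc7_conv", scale=0.01, bottoms=["post_fc7_conv"]))
    # Rearrange so that every grid cell becomes its own Cx1x1 sample,
    # as described in the docstring.
    net.f(Transpose("lstm_input", bottoms=["lstm_fc7_conv"]))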
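The rearrangement described in the docstring can be reproduced in NumPy. This sketch covers only the reshaping; it is not apollocaffe's Transpose implementation, and the row ordering that layer uses may differ. Caffe blobs are stored as N x C x H x W (height before width), so the sketch uses that order. Each row of the result is the 1024-d vector of one grid cell, that is, of one 64x64 image region. The row index can be mapped back to (image, row, column). Both function names are hypothetical.

```python
import numpy as np


def to_lstm_input(features):
    """Rearrange an (N, C, H, W) feature map into (N*H*W, C, 1, 1).

    Row r of the result holds the C-dim feature vector of one grid cell.
    """
    n, c, h, w = features.shape
    # Move channels last, then flatten batch and spatial axes together.
    return features.transpose(0, 2, 3, 1).reshape(n * h * w, c, 1, 1)


def row_to_cell(r, h, w):
    """Map a row index of the rearranged array back to (image, row, col)."""
    n_idx, rest = divmod(r, h * w)
    i, j = divmod(rest, w)
    return n_idx, i, j


feats = np.random.rand(1, 1024, 15, 20).astype(np.float32)
x = to_lstm_input(feats)
print(x.shape)                                                 # (300, 1024, 1, 1)
n_idx, i, j = row_to_cell(47, 15, 20)
print(n_idx, i, j)                                             # 0 2 7
print(np.array_equal(x[47, :, 0, 0], feats[n_idx, :, i, j]))   # True
```

Each of the 300 rows therefore corresponds to exactly one cell of the 15x20 grid, and the cell position determines which image region the vector describes.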

The function is implemented in Python, but I can't follow how each 1x1x1024 vector represents a bounding-box rectangle of that size.


2017-02-21 batuman

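Since you are looking at a 1x1 cell of a grid very deep in the network, its effective receptive field is very large: it may well be (and probably is) 64x64 pixels in the original image.
That is, every feature in "inception_5b/output" is influenced by a 64x64-pixel patch of the input image.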
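The receptive-field argument can be made concrete with the standard recurrence for a stack of convolution and pooling layers. For every layer with kernel size k, the receptive field grows by (k - 1) times the current jump, that is, the pixel distance between neighboring outputs, and the jump is then multiplied by the layer's stride. The layer list below is a hypothetical toy stack with five stride-2 stages, not GoogLeNet's actual definition. It only shows how five stride-2 stages produce the 32-pixel cell spacing. The value computed this way is the theoretical receptive field, an upper bound on which input pixels can influence a feature; how strongly the outer pixels actually contribute is a separate question.

```python
def receptive_field(layers):
    """Theoretical receptive field and jump (output stride) of a layer stack.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    Padding shifts where the field sits but does not change its size.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field
        jump *= stride              # spacing between neighboring outputs
    return rf, jump


# Hypothetical toy stack (NOT GoogLeNet's real layer list).
toy = [(7, 2), (3, 2), (3, 1), (3, 2), (3, 2), (3, 2)]
print(receptive_field(toy))         # (75, 32)
```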


2017-02-21 08:00:11 Shai

So the layer output can be 15x20x1024, and because of the receptive field, each 1x1x1024 vector represents a 64x64 rectangle of the image? – batuman

Is there a good reference for this explanation? Thanks for the Quora reference, but is there a more in-depth discussion than that? – batuman

@batuman That is the "typical" reference I can think of at the moment. Feel free to edit the answer if you find a good link :) – Shai
