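What does tf.nn.conv2d do in TensorFlow?

I was looking at the TensorFlow docs about tf.nn.conv2d here, but I can't understand what it does or what it is trying to achieve. The docs say:

#1: Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].

Now what does that do? Is that element-wise multiplication or just plain matrix multiplication? I also could not understand the other two points mentioned in the docs. I have written them below:

#2: Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].

#3: For each patch, right-multiplies the filter matrix and the image patch vector.

It would be really helpful if anyone could give an example, ideally a piece of code, and explain what is going on there and why the operation works this way.

I tried coding a small portion and printing out the shape of the operation. Still, I can't understand it.

I tried something like this: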

op = tf.shape(tf.nn.conv2d(tf.random_normal([1,10,10,10]), 
       tf.random_normal([2,10,10,10]), 
       strides=[1, 2, 2, 1], padding='SAME')) 

with tf.Session() as sess: 
    result = sess.run(op) 
    print(result)

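I understand bits and pieces of convolutional neural networks. I studied them here. But the implementation in TensorFlow is not what I expected, which is why I am asking.

EDIT: So, I implemented a much simpler piece of code, but I can't figure out what is going on, that is, why the results come out like this. It would be extremely helpful if anyone could tell me what process yields this output.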

input = tf.Variable(tf.random_normal([1,2,2,1])) 
filter = tf.Variable(tf.random_normal([1,1,1,1])) 

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME') 
init = tf.initialize_all_variables() 
with tf.Session() as sess: 
    sess.run(init) 

    print("input") 
    print(input.eval()) 
    print("filter") 
    print(filter.eval()) 
    print("result") 
    result = sess.run(op) 
    print(result)

Output

input 
[[[[ 1.60314465] 
    [-0.55022103]] 

    [[ 0.00595062] 
    [-0.69889867]]]] 
filter 
[[[[-0.59594476]]]] 
result 
[[[[-0.95538563] 
    [ 0.32790133]] 

    [[-0.00354624] 
    [ 0.41650501]]]]

Source

2016-01-05 S_kar

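Actually, cuDNN is enabled by default on GPU in 'tf.nn.conv2d()', so the method in question is not used at all when we use TF with GPU support, unless 'use_cudnn_on_gpu=False' is specified explicitly. – gkcn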


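OK, I think this is about the simplest way to explain it all.

Your example is 1 image, size 2x2, with 1 channel. You have 1 filter, with size 1x1, and 1 channel (the filter size is height x width x channels x number of filters).

For this simple case, the resulting 2x2, 1-channel image (size 1x2x2x1, number of images x height x width x channels) is the result of multiplying each pixel of the image by the filter value.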
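As a quick check of that claim, the printed result can be reproduced with plain Python arithmetic. This is only a sketch: the numbers are copied from the output in the question, and the names `pixels`, `w`, and `scaled` are made up for illustration.

```python
# Values copied from the "input" and "filter" printouts in the question.
pixels = [[1.60314465, -0.55022103],
          [0.00595062, -0.69889867]]
w = -0.59594476  # the single weight of the 1x1x1x1 filter

# A 1x1 filter with one channel simply scales every pixel.
scaled = [[p * w for p in row] for row in pixels]

for row in scaled:
    print(["%.6f" % v for v in row])
```

Up to float32 rounding, the values agree with the "result" block printed in the question.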

Now let's try more channels:

input = tf.Variable(tf.random_normal([1,3,3,5])) 
filter = tf.Variable(tf.random_normal([1,1,5,1])) 

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')

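Here the 3x3 image and the 1x1 filter each have 5 channels. The resulting image will be 3x3 with 1 channel (size 1x3x3x1), where the value of each pixel is the dot product, across channels, of the filter with the corresponding pixel in the input image.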
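The per-pixel dot product can be sketched without TensorFlow. The code below is an illustration in plain Python, assuming a made-up random image stored as `image[row][col][channel]` and a made-up weight vector `weights`; none of these names come from TensorFlow.

```python
import random

random.seed(0)
channels = 5
# Hypothetical 3x3 image with 5 channels, laid out as image[row][col][channel].
image = [[[random.gauss(0, 1) for _ in range(channels)]
          for _ in range(3)] for _ in range(3)]
# Hypothetical 1x1 filter with 5 input channels and 1 output channel.
weights = [random.gauss(0, 1) for _ in range(channels)]


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Each output pixel is the dot product of that pixel's 5 channel values
# with the filter's 5 weights, so the output stays 3x3 with 1 channel.
output = [[dot(image[r][c], weights) for c in range(3)] for r in range(3)]
print(len(output), len(output[0]))
```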

Now with a 3x3 filter:

input = tf.Variable(tf.random_normal([1,3,3,5])) 
filter = tf.Variable(tf.random_normal([3,3,5,1])) 

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')

Here we get a 1x1 image with 1 channel (size 1x1x1x1). The value is the sum of 9 five-element dot products. But you could just call this a 45-element dot product.
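To see that the two descriptions give the same number, here is a small plain-Python sketch. The patch and filter values are random placeholders, and the variable names are invented for this example.

```python
import random

random.seed(1)
# Hypothetical 3x3x5 image patch and 3x3x5 filter, indexed [row][col][channel].
patch = [[[random.gauss(0, 1) for _ in range(5)] for _ in range(3)] for _ in range(3)]
filt = [[[random.gauss(0, 1) for _ in range(5)] for _ in range(3)] for _ in range(3)]

# View 1: nine 5-element dot products (one per filter position), summed.
nine_dots = sum(
    sum(p * f for p, f in zip(patch[r][c], filt[r][c]))
    for r in range(3) for c in range(3)
)

# View 2: flatten both to 45 numbers and take a single dot product.
flat_patch = [v for row in patch for px in row for v in px]
flat_filt = [v for row in filt for px in row for v in px]
one_dot = sum(p * f for p, f in zip(flat_patch, flat_filt))

print(len(flat_patch), abs(nine_dots - one_dot) < 1e-9)
```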

Now with a bigger image:

input = tf.Variable(tf.random_normal([1,5,5,5])) 
filter = tf.Variable(tf.random_normal([3,3,5,1])) 

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')

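The output is now a 3x3, 1-channel image (size 1x3x3x1). Each of these values is a sum of 9 five-element dot products.

Each output is made by centering the filter on one of the 9 center pixels of the input image, so that none of the filter sticks out. The x's below represent the filter centers for each output pixel.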

..... 
.xxx. 
.xxx. 
.xxx. 
.....
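The diagram above can be regenerated with a few lines of Python. This is a sketch that assumes an odd filter width; `valid_centers` is a helper invented here, not a TensorFlow function.

```python
def valid_centers(size, k):
    """Indices along one axis where a k-wide filter fits entirely inside."""
    half = k // 2  # assumes an odd filter width
    return list(range(half, size - half))


rows = valid_centers(5, 3)
cols = valid_centers(5, 3)
for r in range(5):
    print("".join("x" if r in rows and c in cols else "." for c in range(5)))
```

With a 5-pixel axis and a 3-wide filter only indices 1, 2, and 3 qualify, which is why the 'VALID' output is 3x3.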

Now with "SAME" padding:

input = tf.Variable(tf.random_normal([1,5,5,5])) 
filter = tf.Variable(tf.random_normal([3,3,5,1])) 

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')

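This gives a 5x5 output image (size 1x5x5x1). This is done by centering the filter at each position on the image.

Any of the 5-element dot products where the filter sticks out past the edge of the image gets a value of zero.

So the corners are only sums of 4 five-element dot products.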
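A short sketch can count how many filter positions land on real pixels for a given center. It assumes the 5x5 image and 3x3 filter from the example; `taps_inside` is a hypothetical helper written for this illustration.

```python
def taps_inside(center_r, center_c, height=5, width=5, k=3):
    """Count the filter positions that land on real pixels (not zero padding)."""
    half = k // 2  # assumes an odd filter size
    count = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r, c = center_r + dr, center_c + dc
            if 0 <= r < height and 0 <= c < width:
                count += 1
    return count


print(taps_inside(0, 0))  # corner: 4
print(taps_inside(0, 2))  # edge: 6
print(taps_inside(2, 2))  # interior: 9
```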

Now with multiple filters:

input = tf.Variable(tf.random_normal([1,5,5,5])) 
filter = tf.Variable(tf.random_normal([3,3,5,7])) 

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')

This still gives a 5x5 output image, but with 7 channels (size 1x5x5x7), where each channel is produced by one of the filters in the set.
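To make "one output channel per filter" concrete, here is a deliberately slow plain-Python sketch of a stride-1, zero-padded convolution for a single image. It is an illustration, not TensorFlow's implementation; `conv_same` and the random data are assumptions made for this example, and like tf.nn.conv2d it does not flip the filter.

```python
import random

random.seed(2)
# Hypothetical 5x5 image with 5 channels: image[row][col][channel].
image = [[[random.gauss(0, 1) for _ in range(5)] for _ in range(5)] for _ in range(5)]
# filters[i][j][channel][f], mirroring the [3, 3, 5, 7] filter shape used above.
filters = [[[[random.gauss(0, 1) for _ in range(7)] for _ in range(5)]
            for _ in range(3)] for _ in range(3)]


def conv_same(image, filters):
    """Stride-1 convolution with zero padding, one output channel per filter."""
    height, width, in_ch = len(image), len(image[0]), len(image[0][0])
    k, n_filters = len(filters), len(filters[0][0][0])
    half = k // 2  # assumes an odd, square filter
    out = [[[0.0] * n_filters for _ in range(width)] for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for f in range(n_filters):
                total = 0.0
                for i in range(k):
                    for j in range(k):
                        rr, cc = r + i - half, c + j - half
                        # Positions outside the image contribute zero.
                        if 0 <= rr < height and 0 <= cc < width:
                            for ch in range(in_ch):
                                total += image[rr][cc][ch] * filters[i][j][ch][f]
                out[r][c][f] = total
    return out


out = conv_same(image, filters)
print(len(out), len(out[0]), len(out[0][0]))  # 5 5 7
```

Each of the 7 values at an output pixel uses the same 3x3x5 neighborhood of the input but a different 3x3x5 filter.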

Now with strides 2,2:

input = tf.Variable(tf.random_normal([1,5,5,5])) 
filter = tf.Variable(tf.random_normal([3,3,5,7])) 

op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')

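Now the result still has 7 channels, but is only 3x3 (size 1x3x3x7).

This is because instead of centering the filters at every point on the image, the filters are centered at every other point on the image, taking steps (strides) of width 2. The x's below represent the filter center for each output pixel, on the input image.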

x.x.x 
..... 
x.x.x 
..... 
x.x.x
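The output sizes in all of the examples so far follow the rules given in the TensorFlow documentation: ceil(in / stride) for 'SAME' and ceil((in - filter + 1) / stride) for 'VALID'. A small sketch, with `output_size` as a helper name invented here:

```python
import math


def output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial axis for 'SAME' or 'VALID' padding."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(output_size(3, 3, 1, "VALID"))  # 1
print(output_size(5, 3, 1, "VALID"))  # 3
print(output_size(5, 3, 1, "SAME"))   # 5
print(output_size(5, 3, 2, "SAME"))   # 3
```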

And of course the first dimension of the input is the number of images, so you can apply it over a batch of 10 images, for example:

input = tf.Variable(tf.random_normal([10,5,5,5])) 
filter = tf.Variable(tf.random_normal([3,3,5,7])) 

op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')

This performs the same operation for each image independently, giving a stack of 10 images as the result (size 10x3x3x7).

Source

2016-01-09 19:45:15 mdaoust

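@ZijunLost No, the docs state that the first and last elements must be 1: 'Must have strides[0] = strides[3] = 1. For the most common case of the same horizontal and vertices strides, strides = [1, stride, stride, 1].' – JohnAllen

Is this a [Toeplitz matrix](https://en.wikipedia.org/wiki/Toeplitz_matrix#Discrete_convolution) based implementation of convolution? – gkcn

Regarding this: "This still gives a 5x5 output image, but with 7 channels (size 1x5x5x7), where each channel is produced by one of the filters in the set." I still have difficulty understanding where the 7 channels come from. What do you mean by "filters in the set"? Thanks. – derek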

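2D convolution is computed in a similar way to 1D convolution: you slide your kernel over the input, calculate the element-wise multiplications, and sum them up. But instead of your kernel and input being arrays, here they are matrices.

In the most basic example there is no padding and stride=1. Let's assume your input and kernel are:

input:
4 3 1 0
2 1 0 1
1 2 4 1
3 1 0 2

kernel:
1 0 1
2 1 0
0 0 1

When you apply the kernel you will receive the following output:

14 6
6 12

which is calculated in the following way: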

14 = 4 * 1 + 3 * 0 + 1 * 1 + 2 * 2 + 1 * 1 + 0 * 0 + 1 * 0 + 2 * 0 + 4 * 1
6 = 3 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 0 * 1 + 1 * 0 + 2 * 0 + 4 * 0 + 1 * 1
6 = 2 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 2 * 1 + 4 * 0 + 3 * 0 + 1 * 0 + 0 * 1
12 = 1 * 1 + 0 * 0 + 1 * 1 + 2 * 2 + 4 * 1 + 1 * 0 + 1 * 0 + 0 * 0 + 2 * 1
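The four sums above can be reproduced with a short plain-Python sketch. The input and kernel values are the ones used in this answer's TensorFlow snippet; `conv2d_valid` is a name made up for this illustration, and like tf.nn.conv2d it does not flip the kernel.

```python
i = [[4, 3, 1, 0],
     [2, 1, 0, 1],
     [1, 2, 4, 1],
     [3, 1, 0, 2]]
k = [[1, 0, 1],
     [2, 1, 0],
     [0, 0, 1]]


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + a][c + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


print(conv2d_valid(i, k))  # [[14, 6], [6, 12]]
```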

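TF's conv2d function calculates convolutions in batches and uses a slightly different format. For an input it is [batch, in_height, in_width, in_channels]; for the kernel it is [filter_height, filter_width, in_channels, out_channels]. So we need to provide the data in the correct format: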

import tensorflow as tf 
k = tf.constant([ 
    [1, 0, 1], 
    [2, 1, 0], 
    [0, 0, 1] 
], dtype=tf.float32, name='k') 
i = tf.constant([ 
    [4, 3, 1, 0], 
    [2, 1, 0, 1], 
    [1, 2, 4, 1], 
    [3, 1, 0, 2] 
], dtype=tf.float32, name='i') 
kernel = tf.reshape(k, [3, 3, 1, 1], name='kernel') 
image = tf.reshape(i, [1, 4, 4, 1], name='image')

Afterwards the convolution can be computed with:

res = tf.squeeze(tf.nn.conv2d(image, kernel, [1, 1, 1, 1], "VALID")) 
# VALID means no padding 
with tf.Session() as sess: 
    print(sess.run(res))

The result will be equivalent to the one we calculated by hand.

For examples with padding/strides, take a look here.

Source

2017-05-22 00:59:24

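Nice example, however some links are broken. – silgon

@silgon Sadly, this is because of the decision not to support the documentation feature that was created and advertised in the first place. –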

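Just to add to the other answers, you should think of the parameters in

filter = tf.Variable(tf.random_normal([3,3,5,7]))

as follows: '5' corresponds to the number of channels in each filter. Each filter is a 3D cube with a depth of 5. Your filter depth must correspond to your input image's depth. The last parameter, 7, should be thought of as the number of filters in the batch. Forget about this being 4D, and instead imagine that you have a set or batch of 7 filters. What you do is create 7 filter cubes with dimensions (3,3,5).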

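It is a lot easier to visualize in the Fourier domain, since convolution becomes point-wise multiplication. For an input image of dimensions (100,100,3) you can rewrite the filter dimensions as

filter = tf.Variable(tf.random_normal([100,100,3,7]))

In order to obtain one of the 7 output feature maps, we simply perform the point-wise multiplication of the filter cube with the image cube, then sum the results across the channel/depth dimension (here it is 3), collapsing to a 2D (100,100) feature map. Do this with each filter cube, and you get 7 2D feature maps.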
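The Fourier-domain remark can be checked on a tiny 1D case. The sketch below is an illustration under stated assumptions: it uses a naive DFT written with `cmath`, made-up signal and kernel values, and circular convolution, which is the form for which the point-wise product identity holds exactly. It is not how tf.nn.conv2d computes its result.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), enough for a small demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out


def circular_conv(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(t - m) % n] for m in range(n)) for t in range(n)]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.0, -1.0, 0.0]  # zero-padded to the signal's length

direct = circular_conv(signal, kernel)
# Multiply the two spectra point-wise, then transform back.
via_fourier = dft([p * q for p, q in zip(dft(signal), dft(kernel))], inverse=True)

print(direct)
print([round(v.real, 6) for v in via_fourier])
```

Both lines should print the same numbers up to rounding, which is the sense in which convolution "becomes" point-wise multiplication.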

Source

2017-07-13 11:35:31 Val9265

I tried to implement conv2d (for my own learning). Here is what I wrote:

def conv(ix, w): 
    # filter shape: [filter_height, filter_width, in_channels, out_channels] 
    # flatten filters 
    filter_height = int(w.shape[0]) 
    filter_width = int(w.shape[1]) 
    in_channels = int(w.shape[2]) 
    out_channels = int(w.shape[3]) 
    ix_height = int(ix.shape[1]) 
    ix_width = int(ix.shape[2]) 
    ix_channels = int(ix.shape[3]) 
    filter_shape = [filter_height, filter_width, in_channels, out_channels] 
    flat_w = tf.reshape(w, [filter_height * filter_width * in_channels, out_channels]) 
    patches = tf.extract_image_patches(
        ix,
        ksizes=[1, filter_height, filter_width, 1],
        strides=[1, 1, 1, 1],
        rates=[1, 1, 1, 1],
        padding='SAME'
    )
    patches_reshaped = tf.reshape(patches, [-1, ix_height, ix_width, filter_height * filter_width * ix_channels])
    feature_maps = []
    for i in range(out_channels):
        feature_map = tf.reduce_sum(tf.multiply(flat_w[:, i], patches_reshaped), axis=3, keep_dims=True)
        feature_maps.append(feature_map)
    features = tf.concat(feature_maps, axis=3) 
    return features

Hopefully I did it properly. I checked it on MNIST and got very close results (but this implementation is slower). I hope this helps you.
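The same three steps from the docs (flatten the filter, extract patches, multiply) can also be sketched in plain Python on a tiny single-channel example. This is an illustration only, assuming 'VALID' padding and stride 1; the matrices are the small input and kernel used in an earlier answer, and `vec_mat` is a helper invented here.

```python
image = [[4, 3, 1, 0],
         [2, 1, 0, 1],
         [1, 2, 4, 1],
         [3, 1, 0, 2]]
kernel = [[1, 0, 1],
          [2, 1, 0],
          [0, 0, 1]]
kh, kw = 3, 3

# Step 1: flatten the filter to [filter_height * filter_width * in_channels, out_channels].
# Here in_channels = out_channels = 1, so it is a 9x1 matrix.
flat_filter = [[kernel[a][b]] for a in range(kh) for b in range(kw)]

# Step 2: extract one flattened patch per output position (VALID padding, stride 1).
out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
patches = [[[image[r + a][c + b] for a in range(kh) for b in range(kw)]
            for c in range(out_w)]
           for r in range(out_h)]


def vec_mat(vec, mat):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [sum(v * row[col] for v, row in zip(vec, mat))
            for col in range(len(mat[0]))]


# Step 3: multiply each patch vector (1x9) by the filter matrix (9x1).
result = [[vec_mat(patches[r][c], flat_filter) for c in range(out_w)]
          for r in range(out_h)]
print(result)  # [[[14], [6]], [[6], [12]]]
```

So step 3 is an ordinary matrix multiplication per patch, not an element-wise product of two same-shaped tensors.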

Source

2017-09-27 18:32:08
