2017-02-19

I am currently developing an audio classifier with TensorFlow's Python API, using the UrbanSound8K dataset, taking exactly 176400 data points from each file and trying to distinguish between 10 mutually exclusive classes. What causes the logits to end up with this unexpected shape?

I adapted the convolutional neural network example code from: https://www.tensorflow.org/get_started/mnist/pros

Unfortunately, I am getting the following error:

Traceback (most recent call last): 
    ... 
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [7000,10] and labels shape [10] 
    [[Node: xent/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"](read/add, _recv_y_0/_9)]] 

During handling of the above exception, another exception occurred: 

Traceback (most recent call last): 
    File "urban-cnn.py", line 124, in <module> 
    sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, keep_prob: .5}) 
    ... 
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [7000,10] and labels shape [10] 
    [[Node: xent/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"](read/add, _recv_y_0/_9)]] 

Caused by op 'xent/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits', defined at: 
    File "urban-cnn.py", line 102, in <module> 
    xent = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=y_conv), name="xent") 
    ... 

InvalidArgumentError (see above for traceback): logits and labels must have the same first dimension, got logits shape [7000,10] and labels shape [10] 
    [[Node: xent/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"](read/add, _recv_y_0/_9)]] 

Here is a slightly modified version of the code:

import tensorflow as tf 
import soundfile as sfx 
import numpy as np 
import math 
import glob 

batch_size = 10 
n_epochs = 10 

input_width = 176400 

n_labels = 10 

widths = [5, 5, 7] 
channels = [1, 8, 64, 512, n_labels] 

learning_rate = 1e-4 

def load_data(): 
    data_x = [] 
    data_y = [] 

    for path in glob.glob("./UrbanSound8K/audio/fold1/*.wav"): 
        name = path.split("/")[-1].split(".")[0] 
        x, sample_rate = sfx.read(path, frames=input_width, fill_value=0.) 
        y = int(name.split("-")[1]) 

        if x.ndim > 1: 
            x = x.take(0, axis=1) 

        data_x.append(x) 
        data_y.append(y) 

    return data_x, data_y 

data_x, data_y = load_data() 
data_split = int(len(data_x) * .9) 

train_x = data_x[:data_split] 
train_y = data_y[:data_split] 

test_x = data_x[data_split:] 
test_y = data_y[data_split:] 

x = tf.placeholder(tf.float32, [None, input_width], name="x") 
y = tf.placeholder(tf.int64, [None], name="y") 

x_reshaped = tf.reshape(x, [-1, 1, input_width, channels[0]], name="x_reshaped") 

def weights_x(shape, name): 
    w = tf.Variable(tf.truncated_normal(shape, stddev=0.1), name=name) 
    tf.summary.histogram("weights", w) 
    return w 

def weights(layer, name): 
    return weights_x([1, widths[layer], channels[layer], channels[layer+1]], name) 

def biases(layer, name): 
    b = tf.Variable(tf.constant(0.1, shape=[channels[layer+1]]), name=name) 
    tf.summary.histogram("biases", b) 
    return b 

def convolution(p, w, b, name): 
    c = tf.nn.relu(tf.nn.conv2d(p, w, strides=[1, 1, 1, 1], padding="SAME") + b, name=name) 
    tf.summary.histogram("convolution", c) 
    return c 

def pooling(c, name): 
    p = tf.nn.max_pool(c, ksize=[1, 1, 6, 1], strides=[1, 1, 6, 1], padding="SAME", name=name) 
    tf.summary.histogram("pooling", p) 
    return p 

with tf.name_scope("conv1"): 
    w1 = weights(0, "w1") 
    b1 = biases(0, "b1") 
    c1 = convolution(x_reshaped, w1, b1, "c1") 
    p1 = pooling(c1, "p1") 

with tf.name_scope("conv2"): 
    w2 = weights(1, "w2") 
    b2 = biases(1, "b2") 
    c2 = convolution(p1, w2, b2, "c2") 
    p2 = pooling(c2, "p2") 

with tf.name_scope("dens"): 
    n_edges = widths[2] * channels[2] 
    wf1 = weights_x([n_edges, channels[3]], "wf1") 
    bf1 = biases(2, "bf1") 
    pf1 = tf.reshape(p2, [-1, n_edges], name="pf1") 
    f1 = tf.nn.relu(tf.matmul(pf1, wf1) + bf1, name="f1") 

with tf.name_scope("drop"): 
    keep_prob = tf.placeholder(tf.float32, name="keep_prob") 
    dropout = tf.nn.dropout(f1, keep_prob) 

with tf.name_scope("read"): 
    wf2 = weights_x([channels[3], channels[4]], "wf2") 
    bf2 = biases(3, "bf2") 
    y_conv = tf.matmul(dropout, wf2) + bf2 

with tf.name_scope("xent"): 
    xent = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=y_conv), name="xent") 
    tf.summary.scalar("xent", xent) 

with tf.name_scope("optimizer"): 
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(xent) 

with tf.name_scope("accuracy"): 
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), y, name="correct_prediction") 
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy") 
    tf.summary.scalar("accuracy", accuracy) 

with tf.Session() as sess: 
    sess.run(tf.global_variables_initializer()) 
    print("Initialized Global Variables") 

    for epoch in range(n_epochs): 
        n_itr = len(train_x)//batch_size 

        for itr in range(n_itr): 
            left, right = itr*batch_size, (itr+1)*batch_size 
            batch_x, batch_y = train_x[left:right], train_y[left:right] 

            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, keep_prob: .5}) 
        print("epoch: ", epoch + 1) 

    print("accuracy: ", sess.run(accuracy, feed_dict={x: test_x, y: test_y, keep_prob: 1.})) 

Everything looks as expected when I inspect the tensor shapes before calling sess.run(...).

So why does logits have shape [7000, n_labels] instead of [batch_size, n_labels]?
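For reference, the shape check I mean is along these lines (a minimal sketch, not part of the script above):

# Print the static shapes TensorFlow infers for the main tensors before
# ever calling sess.run(...); the batch dimension shows up as None.
for tensor in (x_reshaped, p1, p2, pf1, f1, y_conv):
    print(tensor.name, tensor.get_shape().as_list())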


It is not [7000, n_classes] but rather [7000, batch_size] (or [7000, last_channel_size]), since in your code there is **one** class and 10 is the batch size, not n_classes. In general, you seem to be storing your data in a very odd way. – lejlot


@lejlot Sorry, in the code it is actually called n_labels (and it is 10) – Androbin


@lejlot It is not one-hot encoded, if that is what you mean; I am using **sparse_**softmax_cross_entropy_with_logits – Androbin

Answer


Your network is incorrectly structured; the key problem is here:

with tf.name_scope("dens"): 
    n_edges = widths[2] * channels[2] 
    wf1 = weights_x([n_edges, channels[3]], "wf1") 
    bf1 = biases(2, "bf1") 
    pf1 = tf.reshape(p2, [-1, n_edges], name="pf1") 
    f1 = tf.nn.relu(tf.matmul(pf1, wf1) + bf1, name="f1") 

p2 has shape [10, 1, 4900, 64], and n_edges is not 4900 × 64 = 313600 but 448 (far too small a layer!). If you set n_edges = 313600 everything is fine, though whether that is the architecture you had in mind is up to you. It looks like you have merged two incompatible things: you use the shape of the convolution kernel to compute the size of the layer when flattening it. That is not how convolution works, however; the shape of a layer depends on the size of the input, the kernel and the padding. It is therefore generally much larger, and in this case the fully connected layer should actually have over 300k input neurons, not just 448 as in your code. The crucial point is that this fully connected layer operates on the output of the convolution, not on its parameters.
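To see where the 4900 × 64 comes from, here is the shape arithmetic spelled out (a sketch using the hyper-parameters from the question; plain Python, no TensorFlow needed):

# "SAME" convolutions with stride 1 keep the width, and each 1x6 max-pool
# with stride 6 divides it by 6.
width_after_pool1 = 176400 // 6              # 29400
width_after_pool2 = width_after_pool1 // 6   # 4900

# p2 therefore has shape [batch_size, 1, 4900, 64], so flattening one
# example yields
flat_per_example = width_after_pool2 * 64    # 313600

# whereas the question computes n_edges from the kernel shape instead:
n_edges_in_question = 7 * 64                 # 448 (widths[2] * channels[2])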

The 7000 is simply a result of the reshape in pf1: batch_size * (4900 * 64) / n_edges = 10 * 313600 / 448 = 7000.
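This reshape behaviour is easy to reproduce in isolation (a small NumPy sketch, only to illustrate where the 7000 rows come from; it is not part of the model):

import numpy as np

# Dummy array with the same shape as p2 for a batch of 10 examples.
p2_like = np.zeros((10, 1, 4900, 64), dtype=np.float32)

# Reshaping with -1 and the (too small) n_edges = 448 does not fail;
# it silently spreads every example over 700 rows instead.
flattened = p2_like.reshape(-1, 448)
print(flattened.shape)   # (7000, 448) -> logits of shape [7000, 10]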

A more generic fix is to do

p2s = p2.get_shape() 
n_edges = int(p2s[1] * p2s[2] * p2s[3]) 

since at this point all of p2's dimensions (except the 0th, the batch dimension) are known, so they can be read and used to build the rest of the network.
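Applied to the code in the question, the "dens" block would then look roughly like this (a sketch; note that wf1 becomes correspondingly large as a result):

with tf.name_scope("dens"): 
    # Derive the flattened size from p2's static shape rather than from the
    # kernel shape; only the batch (0th) dimension is unknown here.
    p2s = p2.get_shape() 
    n_edges = int(p2s[1] * p2s[2] * p2s[3])   # 1 * 4900 * 64 = 313600 

    wf1 = weights_x([n_edges, channels[3]], "wf1") 
    bf1 = biases(2, "bf1") 
    pf1 = tf.reshape(p2, [-1, n_edges], name="pf1") 
    f1 = tf.nn.relu(tf.matmul(pf1, wf1) + bf1, name="f1") 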


Follow-up: do you have any advice on how to shrink an unnecessarily large CNN? (I am asking because of memory errors...) – Androbin


CNNs usually do not need that much memory (apart from storing the activations). Are you sure it is not caused by the feed-forward layer? It has 313600 * 512 ≈ 160M parameters. If it really is about the CNN part, increase the pooling stride to reduce the data resolution (or reduce the number of kernels; the 512 in particular looks like a lot). – lejlot
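For illustration, acting on that suggestion could look like the sketch below (the 1x12 pool size and the 128 dense units are arbitrary example values, not taken from the thread):

import tensorflow as tf

n_labels = 10

# Pool more aggressively (1x12 instead of 1x6), so the width after the second
# pooling layer drops from 4900 to 1225 and the flattened size from 313600 to
# 78400, and use fewer units in the fully connected layer (128 instead of 512).
def pooling(c, name):
    return tf.nn.max_pool(c, ksize=[1, 1, 12, 1], strides=[1, 1, 12, 1],
                          padding="SAME", name=name)

channels = [1, 8, 64, 128, n_labels]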
