
MNIST tensorflow - can't figure out what's wrong

I have been trying to figure out for hours why this doesn't work, but I'm getting nowhere. Would really appreciate some help.

It is basically a copy of the tutorial found on the tensorflow website, with a few tweaks to use a local dataset. But I only get about 10% accuracy, which is no better than guessing!

import numpy as np 
import pandas as pd 
from sklearn.model_selection import train_test_split 
import tensorflow as tf 

df = pd.read_csv('train.csv') 
yi = df['label'] 
df = df.drop('label',1) 

labels=[] 
for i in range(len(yi)): 
    #convert to one hot 
    label = [0,0,0,0,0,0,0,0,0,0] 
    label[yi[i]]= 1 
    labels.append(label) 

labels = np.array(labels) 
df = df.as_matrix() 

df_train, df_test, y_train, y_test = train_test_split(df,labels) 




x = tf.placeholder('float', [None, 784]) 
W = tf.Variable(tf.zeros([784, 10])) 
b = tf.Variable(tf.zeros([10])) 
y = tf.nn.softmax(tf.matmul(x, W) + b) 
y_ = tf.placeholder('float', [None, 10]) 

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1])) 
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy) 



sess = tf.Session() 

init = tf.global_variables_initializer() 
sess.run(init) 

def next_batch(num, data, labels): 

    #get batches for training 

    idx = np.arange(0 , len(data)) 
    np.random.shuffle(idx) 
    idx = idx[:num] 
    data_shuffle = [data[ i] for i in idx] 
    labels_shuffle = [labels[ i] for i in idx] 

    return np.asarray(data_shuffle), np.asarray(labels_shuffle) 

for _ in range(1000): 
    df_train0, y_train0 = next_batch(100, df_train, y_train) 
    sess.run(train_step, feed_dict={ x: df_train0, y_: y_train0}) 

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) 
accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float')) 
print(sess.run(accuracy, feed_dict={x:df_test, y_:y_test})) 
+0

You are not using any hidden layers? So this is just a linear model? What accuracy do you expect? –

+0

I expected at least 90. I know there are more layers I could add, but I need to get this working first! – ElkanaTheGreat

+0

Without access to your training data there's no way to run this program and try to debug it. Could you upload the data file you are running? –

Answers

1

Your problem is that you initialize W to 0, so there is nothing for the gradients to modify and all the logits will be 0:

W = tf.Variable(tf.zeros([784, 10])) 

You should initialize it randomly in order to break the symmetry:

W = tf.Variable(tf.random_normal([784, 10])) 

EDIT: Randomizing is not necessary here, since the targets will break the symmetry of the logits. It would be necessary, though, if there were hidden layers. The real problem seems to lie in the scale of the inputs. Dividing by 255 should solve the problem.
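
For reference, a minimal sketch of that scaling fix, reusing the variable names from the question (assuming df_train and df_test hold raw 0-255 pixel intensities):

import numpy as np

# Rescale pixels from [0, 255] to [0, 1] before feeding the placeholders;
# large raw inputs saturate the softmax and make tf.log(y) blow up.
df_train = df_train.astype(np.float32) / 255.0
df_test = df_test.astype(np.float32) / 255.0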

+0

I thought the gradient was the gradient of the entropy equation rather than of the regression. Either way, it didn't help. – ElkanaTheGreat

+0

The gradients are backpropagated to the W variable. You also need to scale down and center your inputs, e.g. by dividing by 255. –

+0

@ManoloSantos If initializing the weights with zeros is the problem, why does the official [MNIST For ML Beginners](https://www.tensorflow.org/get_started/mnist/beginners) tutorial state "Since we are going to learn W and b, it doesn't matter very much what they initially are"? – Jarad

1

I don't know why this helped improve the accuracy, so if anyone can give a better answer, please do!

I changed:

y = tf.nn.softmax(tf.matmul(x, W) + b) 
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1])) 

to:

y = tf.matmul(x, W) + b 
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)) 
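
Note that `tf.nn.softmax_cross_entropy_with_logits` expects unscaled logits and applies the softmax internally, which is why the explicit `tf.nn.softmax` is dropped from `y`. The accuracy check is unaffected: softmax is monotonic, so `tf.argmax(y, 1)` picks the same class for logits as for probabilities.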

Full code example:

import numpy as np 
import pandas as pd 
from sklearn.model_selection import train_test_split 
from sklearn.preprocessing import MultiLabelBinarizer 
import tensorflow as tf 

def next_batch(num, data, labels): 
    '''get batches for training''' 

    idx = np.arange(0 , len(data)) 
    np.random.shuffle(idx) 
    idx = idx[:num] 
    data_shuffle = [data[ i] for i in idx] 
    labels_shuffle = [labels[ i] for i in idx] 

    return np.asarray(data_shuffle), np.asarray(labels_shuffle) 

df = pd.read_csv('train.csv') 
df_X = df.iloc[:, 1:] 
df_y = df['label'] 

y_one_hot = MultiLabelBinarizer().fit_transform(df_y.values.reshape(-1, 1)) 

df_train, df_test, y_train, y_test = train_test_split(df_X.values, y_one_hot) 

x = tf.placeholder('float', [None, 784]) 
W = tf.Variable(tf.zeros([784, 10])) 
b = tf.Variable(tf.zeros([10])) 
y = tf.matmul(x, W) + b  # raw logits; softmax is applied inside the loss below
y_ = tf.placeholder('float', [None, 10]) 

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy) 

sess = tf.Session() 

init = tf.global_variables_initializer() 
sess.run(init) 

for _ in range(1000): 
    df_train0, y_train0 = next_batch(100, df_train, y_train) 
    sess.run(train_step, feed_dict={ x: df_train0, y_: y_train0}) 

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) 
accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float')) 
print(sess.run(accuracy, feed_dict={x:df_test, y_:y_test})) 

Resulting accuracy: ~0.88

+0

Does anyone know why this works? – ElkanaTheGreat

+1

The problem with the original version is that `tf.log(big_number)` is `inf`, so it is numerically unstable. The second version prevents this by scaling down the logits to avoid the instability. –

+1

**Correction**: actually, to prevent the instability, the second version uses the identity `log(e^x) == x`. –
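
A quick numpy sketch (with made-up logit values) of both the overflow and the `log(e^x) == x` identity, written as the usual max-shifted log-sum-exp:

import numpy as np

logits = np.array([1000.0, 0.0, -1000.0])

# Naive log(softmax(x)): np.exp(1000.0) overflows to inf, yielding nan/-inf.
naive = np.log(np.exp(logits) / np.sum(np.exp(logits)))

# Stable form: log(softmax(x)) == x - logsumexp(x). Shifting by max(x) first
# keeps every exponent <= 0, so nothing overflows.
shifted = logits - logits.max()
stable = shifted - np.log(np.sum(np.exp(shifted)))

print(naive)   # [ nan -inf -inf], plus overflow warnings
print(stable)  # [    0. -1000. -2000.]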