3

Setting up an MLP for binary classification with TensorFlow

I'm having some trouble trying to set up a multilayer perceptron for binary classification using TensorFlow.

I have a very large dataset (about 1.5 * 10^6 examples), each with a binary (0/1) label and 100 features. What I need to do is set up a simple MLP and then try varying the learning rate and the initialization pattern and document the results (it's an assignment). I am getting strange results, though: my MLP seems to get stuck early at a low, but not great, cost and never comes down from it, and with fairly low learning rates the cost goes up almost immediately. I don't know whether the problem is in how I structured the MLP (I made several attempts, I'm posting the last one) or whether I'm missing something in my TensorFlow implementation.

CODE

import tensorflow as tf 
import numpy as np 
import scipy.io 

# Import and transform dataset 
print("Importing dataset.") 
dataset = scipy.io.mmread('tfidf_tsvd.mtx') 

with open('labels.txt') as f: 
    all_labels = f.readlines() 

all_labels = np.asarray(all_labels) 
all_labels = all_labels.reshape((1498271,1)) 

# Split dataset into training (66%) and test (33%) set 
training_set = dataset[0:1000000] 
training_labels = all_labels[0:1000000] 
test_set  = dataset[1000000:1498272] 
test_labels  = all_labels[1000000:1498272] 

print("Dataset ready.") 

# Parameters 
learning_rate = 0.01 #argv 
mini_batch_size = 100 
training_epochs = 10000 
display_step = 500 

# Network Parameters 
n_hidden_1 = 64 # 1st hidden layer of neurons 
n_hidden_2 = 32 # 2nd hidden layer of neurons 
n_hidden_3 = 16 # 3rd hidden layer of neurons 
n_input  = 100 # number of features after LSA 

# Tensorflow Graph input 
x = tf.placeholder(tf.float64, shape=[None, n_input], name="x-data") 
y = tf.placeholder(tf.float64, shape=[None, 1], name="y-labels") 

print("Creating model.") 

# Create model 
def multilayer_perceptron(x, weights): 
    # First hidden layer with SIGMOID activation 
    layer_1 = tf.matmul(x, weights['h1']) 
    layer_1 = tf.nn.sigmoid(layer_1) 
    # Second hidden layer with SIGMOID activation 
    layer_2 = tf.matmul(layer_1, weights['h2']) 
    layer_2 = tf.nn.sigmoid(layer_2) 
    # Third hidden layer with SIGMOID activation 
    layer_3 = tf.matmul(layer_2, weights['h3']) 
    layer_3 = tf.nn.sigmoid(layer_3) 
    # Output layer with SIGMOID activation 
    out_layer = tf.matmul(layer_2, weights['out']) 
    return out_layer 

# Layer weights, should change them to see results 
weights = { 
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], dtype=np.float64)),  
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], dtype=np.float64)), 
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3],dtype=np.float64)), 
    'out': tf.Variable(tf.random_normal([n_hidden_2, 1], dtype=np.float64)) 
} 

# Construct model 
pred = multilayer_perceptron(x, weights) 

# Define loss and optimizer 
cost = tf.nn.l2_loss(pred-y,name="squared_error_cost") 
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost) 

# Initializing the variables 
init = tf.initialize_all_variables() 

print("Model ready.") 

# Launch the graph 
with tf.Session() as sess: 
    sess.run(init) 

    print("Starting Training.") 

    # Training cycle 
    for epoch in range(training_epochs): 
     #avg_cost = 0. 
     # minibatch loading 
     minibatch_x = training_set[mini_batch_size*epoch:mini_batch_size*(epoch+1)] 
     minibatch_y = training_labels[mini_batch_size*epoch:mini_batch_size*(epoch+1)] 
     # Run optimization op (backprop) and cost op 
     _, c = sess.run([optimizer, cost], feed_dict={x: minibatch_x, y: minibatch_y}) 

     # Compute average loss 
     avg_cost = c/(minibatch_x.shape[0]) 

     # Display logs per epoch 
     if (epoch) % display_step == 0: 
     print("Epoch:", '%05d' % (epoch), "Training error=", "{:.9f}".format(avg_cost)) 

    print("Optimization Finished!") 

    # Test model 
    # Calculate accuracy 
    test_error = tf.nn.l2_loss(pred-y,name="squared_error_test_cost")/test_set.shape[0] 
    print("Test Error:", test_error.eval({x: test_set, y: test_labels})) 

Output

python nn.py 
Importing dataset. 
Dataset ready. 
Creating model. 
Model ready. 
Starting Training. 
Epoch: 00000 Training error= 0.331874878 
Epoch: 00500 Training error= 0.121587482 
Epoch: 01000 Training error= 0.112870921 
Epoch: 01500 Training error= 0.110293652 
Epoch: 02000 Training error= 0.122655269 
Epoch: 02500 Training error= 0.124971940 
Epoch: 03000 Training error= 0.125407845 
Epoch: 03500 Training error= 0.131942481 
Epoch: 04000 Training error= 0.121696954 
Epoch: 04500 Training error= 0.116669835 
Epoch: 05000 Training error= 0.129558477 
Epoch: 05500 Training error= 0.122952110 
Epoch: 06000 Training error= 0.124655344 
Epoch: 06500 Training error= 0.119827300 
Epoch: 07000 Training error= 0.125183779 
Epoch: 07500 Training error= 0.156429254 
Epoch: 08000 Training error= 0.085632880 
Epoch: 08500 Training error= 0.133913128 
Epoch: 09000 Training error= 0.114762624 
Epoch: 09500 Training error= 0.115107805 
Optimization Finished! 
Test Error: 0.116647016708 

This is MMN's suggestion:

weights = { 
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], stddev=0, dtype=np.float64)),  
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], stddev=0.01, dtype=np.float64)), 
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3], stddev=0.01, dtype=np.float64)), 
    'out': tf.Variable(tf.random_normal([n_hidden_2, 1], dtype=np.float64)) 
} 

And this is the output:

Epoch: 00000 Training error= 0.107566668 
Epoch: 00500 Training error= 0.289380907 
Epoch: 01000 Training error= 0.339091784 
Epoch: 01500 Training error= 0.358559815 
Epoch: 02000 Training error= 0.122639698 
Epoch: 02500 Training error= 0.125160135 
Epoch: 03000 Training error= 0.126219718 
Epoch: 03500 Training error= 0.132500418 
Epoch: 04000 Training error= 0.121795254 
Epoch: 04500 Training error= 0.116499476 
Epoch: 05000 Training error= 0.124532673 
Epoch: 05500 Training error= 0.124484790 
Epoch: 06000 Training error= 0.118491177 
Epoch: 06500 Training error= 0.119977633 
Epoch: 07000 Training error= 0.127532511 
Epoch: 07500 Training error= 0.159053519 
Epoch: 08000 Training error= 0.083876224 
Epoch: 08500 Training error= 0.131488483 
Epoch: 09000 Training error= 0.123161189 
Epoch: 09500 Training error= 0.125011362 
Optimization Finished! 
Test Error: 0.129284643093 

Connected the third hidden layer, thanks to MMN

There was an error in my code: I had two hidden layers instead of three. I corrected it by doing:

'out': tf.Variable(tf.random_normal([n_hidden_3, 1], dtype=np.float64)) 

out_layer = tf.matmul(layer_3, weights['out']) 

I went back to the old stddev value, though, since it seems to cause smaller fluctuations in the cost function.

The output is still troubling:

Epoch: 00000 Training error= 0.477673073 
Epoch: 00500 Training error= 0.121848744 
Epoch: 01000 Training error= 0.112854530 
Epoch: 01500 Training error= 0.110597624 
Epoch: 02000 Training error= 0.122603499 
Epoch: 02500 Training error= 0.125051472 
Epoch: 03000 Training error= 0.125400717 
Epoch: 03500 Training error= 0.131999354 
Epoch: 04000 Training error= 0.121850889 
Epoch: 04500 Training error= 0.116551533 
Epoch: 05000 Training error= 0.129749704 
Epoch: 05500 Training error= 0.124600464 
Epoch: 06000 Training error= 0.121600218 
Epoch: 06500 Training error= 0.121249676 
Epoch: 07000 Training error= 0.132656938 
Epoch: 07500 Training error= 0.161801757 
Epoch: 08000 Training error= 0.084197352 
Epoch: 08500 Training error= 0.132197409 
Epoch: 09000 Training error= 0.123249055 
Epoch: 09500 Training error= 0.126602369 
Optimization Finished! 
Test Error: 0.129230736355 

Two more changes, thanks to Steven. Steven suggested replacing the sigmoid activation function with ReLU, so I tried it. In the meantime, I noticed that I hadn't set an activation function for the output node, so I did that as well (it should be easy to see what I changed).
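Roughly, this is a reconstruction (not the exact code I ran) of what the reworked model function could look like after those two edits, with ReLU on the hidden layers; the output activation isn't shown above, so a sigmoid is assumed here for the 0/1 label:

def multilayer_perceptron(x, weights): 
    # Hidden layers with ReLU activation (previously sigmoid) 
    layer_1 = tf.nn.relu(tf.matmul(x, weights['h1'])) 
    layer_2 = tf.nn.relu(tf.matmul(layer_1, weights['h2'])) 
    layer_3 = tf.nn.relu(tf.matmul(layer_2, weights['h3'])) 
    # Output node now gets an activation too (a sigmoid is assumed here) 
    out_layer = tf.nn.sigmoid(tf.matmul(layer_3, weights['out'])) 
    return out_layer 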

Starting Training. 
Epoch: 00000 Training error= 293.245977809 
Epoch: 00500 Training error= 0.290000000 
Epoch: 01000 Training error= 0.340000000 
Epoch: 01500 Training error= 0.360000000 
Epoch: 02000 Training error= 0.285000000 
Epoch: 02500 Training error= 0.250000000 
Epoch: 03000 Training error= 0.245000000 
Epoch: 03500 Training error= 0.260000000 
Epoch: 04000 Training error= 0.290000000 
Epoch: 04500 Training error= 0.315000000 
Epoch: 05000 Training error= 0.285000000 
Epoch: 05500 Training error= 0.265000000 
Epoch: 06000 Training error= 0.340000000 
Epoch: 06500 Training error= 0.180000000 
Epoch: 07000 Training error= 0.370000000 
Epoch: 07500 Training error= 0.175000000 
Epoch: 08000 Training error= 0.105000000 
Epoch: 08500 Training error= 0.295000000 
Epoch: 09000 Training error= 0.280000000 
Epoch: 09500 Training error= 0.285000000 
Optimization Finished! 
Test Error: 0.220196439287 

This is how it does with the sigmoid activation function on every node, output included:

Epoch: 00000 Training error= 0.110878121 
Epoch: 00500 Training error= 0.119393080 
Epoch: 01000 Training error= 0.109229532 
Epoch: 01500 Training error= 0.100436962 
Epoch: 02000 Training error= 0.113160662 
Epoch: 02500 Training error= 0.114200962 
Epoch: 03000 Training error= 0.109777990 
Epoch: 03500 Training error= 0.108218725 
Epoch: 04000 Training error= 0.103001394 
Epoch: 04500 Training error= 0.084145737 
Epoch: 05000 Training error= 0.119173495 
Epoch: 05500 Training error= 0.095796251 
Epoch: 06000 Training error= 0.093336573 
Epoch: 06500 Training error= 0.085062860 
Epoch: 07000 Training error= 0.104251661 
Epoch: 07500 Training error= 0.105910949 
Epoch: 08000 Training error= 0.090347288 
Epoch: 08500 Training error= 0.124480612 
Epoch: 09000 Training error= 0.109250224 
Epoch: 09500 Training error= 0.100245836 
Optimization Finished! 
Test Error: 0.110234139674 

I find these numbers very strange. In the first case, it gets stuck at a higher cost than with the sigmoid, even though the sigmoid should saturate very early. In the second case, it starts with a training error that is almost the final one... so it basically converges within a single mini-batch. I'm starting to think that I'm not computing the cost correctly in this line: avg_cost = c/(minibatch_x.shape[0])
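For reference, a quick sanity check on that line (just an illustration, not code from my script): tf.nn.l2_loss(t) computes sum(t**2)/2, so c/(minibatch_x.shape[0]) is half the mean squared error over the mini-batch. An equivalent, more explicit per-example cost would be:

cost = tf.reduce_mean(tf.square(pred - y), name="mean_squared_error_cost") 
# sess.run([optimizer, cost], ...) then already returns the per-example 
# average, so no further division by minibatch_x.shape[0] is needed. 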

+0

Have you tried changing your line 'cost = tf.nn.l2_loss(pred-y, name="squared_error_cost")' to 'cost = tf.nn.square(tf.sub(pred, y))'? – Kashyap

+0

Can you print the accuracy (the percentage of correctly classified samples) during training? –

+0

@Kashyap: when printing that cost I get a "non-empty format string passed to object.__format__" error, and I can't seem to fix it. – Darkobra

Answers

1

So it could be one of a couple of things:

  1. You might be saturating the sigmoid units (as MMN mentioned); I would suggest trying ReLU units instead.

Replace:

tf.nn.sigmoid(layer_n) 

with:

tf.nn.relu(layer_n) 
  2. Your model may not be expressive enough to actually learn your data, i.e. it may need to be deeper.
  3. You could also try a different optimizer such as Adam(), like this:

     Replace:

    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost) 
    

    with:

    optimizer = tf.train.AdamOptimizer().minimize(cost) 
    

    Some other points:

    1. You should add bias terms to your weights.

    Like this:

    biases = { 
    'b1': tf.Variable(tf.random_normal([n_hidden_1], dtype=np.float64)),  
    'b2': tf.Variable(tf.random_normal([n_hidden_2], dtype=np.float64)), 
    'b3': tf.Variable(tf.random_normal([n_hidden_3],dtype=np.float64)), 
    'bout': tf.Variable(tf.random_normal([1], dtype=np.float64)) 
    } 
    
    def multilayer_perceptron(x, weights, biases): 
        # First hidden layer with SIGMOID activation 
        layer_1 = tf.matmul(x, weights['h1']) + biases['b1'] 
        layer_1 = tf.nn.sigmoid(layer_1) 
        # Second hidden layer with SIGMOID activation 
        layer_2 = tf.matmul(layer_1, weights['h2']) + biases['b2'] 
        layer_2 = tf.nn.sigmoid(layer_2) 
        # Third hidden layer with SIGMOID activation 
        layer_3 = tf.matmul(layer_2, weights['h3']) + biases['b3'] 
        layer_3 = tf.nn.sigmoid(layer_3) 
        # Output layer (linear, no activation applied here) 
        out_layer = tf.matmul(layer_3, weights['out']) + biases['bout'] 
        return out_layer 
    
    2. Also, you can update the learning rate over time, like this:

    learning_rate = tf.train.exponential_decay(INITIAL_LEARNING_RATE, 
                  global_step, 
                  decay_steps, 
                  LEARNING_RATE_DECAY_FACTOR, 
                  staircase=True) 
    

    You just need to define the decay steps (i.e. when to decay) and LEARNING_RATE_DECAY_FACTOR (i.e. by how much to decay).
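    For example, a minimal sketch of wiring the decayed rate into the optimizer used above (global_step is a non-trainable counter that must be created before the exponential_decay call above):

    global_step = tf.Variable(0, trainable=False, name="global_step") 
    # Passing global_step to minimize() makes it increment on every training 
    # step, which is what actually drives the decay schedule. 
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost, global_step=global_step) 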

    +0

    I've edited the question with your suggestions. Note that: 1. ReLU gives very strange values, you can read about it in the edited question. 2. I made the model deeper: before it had 2 hidden layers because of my mistake, now it has 3. 3. I can't really use the Adam optimizer, since it would defeat the purpose of my assignment, which is to play with the learning rate and some initialization parameters. Do you think I'm computing the cost correctly after each mini_batch? – Darkobra

    +0

    There are different cost functions, so it really depends on your task. I can't really answer that without knowing the task, that is, whether l2 loss is right, or cross entropy, or something else. You are currently using l2 loss. – Steven

    +0

    Another simple thing that is "obvious" but sometimes goes unnoticed: make sure your labels match up correctly with your training inputs. – Steven

    1

    Your weights are initialized with a stddev of 1, so the output of layer 1 will have a stddev of around 10. That is probably pushing the sigmoid to the point where most gradients are 0.

    Could you try initializing the hidden weights with a stddev of .01?
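    For example, something along these lines, reusing the weight names from the question (just a sketch, with the output weights left at the default):

    weights = { 
        'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], stddev=0.01, dtype=np.float64)), 
        'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], stddev=0.01, dtype=np.float64)), 
        'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3], stddev=0.01, dtype=np.float64)), 
        'out': tf.Variable(tf.random_normal([n_hidden_3, 1], dtype=np.float64)) 
    } 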

    +0

    It looks like this: 00000 Tr err = 0.107566, 00500 Tr err = 0.289380, 01000 Tr err = 0.339091, 01500 Tr err = 0.358559, 02000 Tr err = 0.122639, 02500 Tr err = 0.125160, 03000 Tr err = 0.126219, 03500 Tr err = 0.132500, 04000 Tr err = 0.121795, 04500 Tr err = 0.116499, 05000 Tr err = 0.124532, 05500 Tr err = 0.124484, 06000 Tr err = 0.118491, 06500 Tr err = 0.119977, 07000 Tr err = 0.127532, 07500 Tr err = 0.159053, 08000 Tr err = 0.083876, 08500 Tr err = 0.131488, 09000 Tr err = 0.123161, 09500 Tr err = 0.125011, Test error: 0.129284643 – Darkobra

    +0

    Hmm, I can't format it properly in a comment, but I can tell you it didn't solve my problem. – Darkobra

    +0

    Hmm, maybe that's the best you're going to get with a two-layer network? Or did you mean to not use h3? – MMN

    1

    In addition to the answers above, I would suggest you try the cost function tf.nn.sigmoid_cross_entropy_with_logits(logits, targets, name=None)

    Since it is binary classification, you should try the sigmoid_cross_entropy_with_logits cost function.
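    For a 0/1 label, a minimal sketch of what that could look like with the variables from the question (assuming the older TensorFlow signature quoted above, and that pred is the raw, pre-sigmoid output of the last layer):

    # logits must be un-squashed: no sigmoid on the output layer here 
    cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(pred, y), name="cross_entropy_cost") 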

    I would also suggest plotting a line graph of training and test accuracy against the number of epochs, i.e. checking whether the model is overfitting.

    If it is not overfitting, try to make your neural net more complex, by increasing the number of neurons and the number of layers. You will reach a point beyond which the training accuracy keeps increasing but the validation accuracy does not; that gives you the best model.
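    For the accuracy curves mentioned above, a small sketch of an accuracy node (assuming the network output has been squashed through a sigmoid, so a 0.5 threshold gives the predicted class):

    predicted_class = tf.cast(tf.greater(pred, 0.5), tf.float64) 
    accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted_class, y), tf.float64)) 
    # accuracy.eval({x: test_set, y: test_labels}) gives the fraction of 
    # correctly classified examples; evaluating it on training batches each 
    # epoch gives the training curve for the plot. 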

    +0

    Hey Pramod, thanks for your reply. I've been reading about the cost function you mentioned, but the description suggests it is best suited to cases where the labels are not mutually exclusive - whereas in my model they are. I'm now tuning my network with the help of TensorBoard, and I will definitely try to make my network more complex. – Darkobra

    +0

    As per the question, "I have a very large dataset (about 1.5 * 10^6 examples), each with a binary (0/1) label." It is binary-class classification: each instance is either true (1) or false (0). What do you mean by mutually exclusive? I don't get it. –

    +0

    I think you are referring to this part of the description: "Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive." Going by your description, I think your labels are independent and not mutually exclusive. Take a look at this: http://stats.stackexchange.com/questions/107768 (difference between multi-label and multi-class classification) –