ReLU performing worse than sigmoid?

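I used sigmoid for all layers and for the output and got a final error rate of 0.00012, but when I use ReLU, which is theoretically better, I get the worst possible results. Can anyone explain why this happens? I am using a very simple 2-layer implementation that is available on a hundred websites, but I am including it below anyway: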

import numpy as np 
#test 
#avg(nonlin(np.dot(nonlin(np.dot([0,0,1],syn0)),syn1))) 
#returns list >> [predicted_output, confidence] 
def nonlin(x,deriv=False):#Sigmoid 
    if(deriv==True): 
     return x*(1-x) 

    return 1/(1+np.exp(-x)) 

def relu(x, deriv=False):#RELU 
    if (deriv == True): 
     for i in range(0, len(x)): 
      for k in range(len(x[i])): 
       if x[i][k] > 0: 
        x[i][k] = 1 
       else: 
        x[i][k] = 0 
     return x 
    for i in range(0, len(x)): 
     for k in range(0, len(x[i])): 
      if x[i][k] > 0: 
       pass # do nothing since it would be effectively replacing x with x 
      else: 
       x[i][k] = 0 
    return x 

X = np.array([[0,0,1], 
      [0,0,0], 
      [0,1,1], 
      [1,0,1], 
      [1,0,0], 
      [0,1,0]]) 

y = np.array([[0],[1],[0],[0],[1],[1]]) 

np.random.seed(1) 

# randomly initialize our weights with mean 0 
syn0 = 2*np.random.random((3,4)) - 1 
syn1 = 2*np.random.random((4,1)) - 1 

def avg(i): 
     if i > 0.5: 
      confidence = i 
      return [1,float(confidence)] 
     else: 
      confidence=1.0-float(i) 
      return [0,confidence] 
for j in xrange(500000): 

    # Feed forward through layers 0, 1, and 2 
    l0 = X 
    l1 = nonlin(np.dot(l0,syn0)) 
    l2 = nonlin(np.dot(l1,syn1)) 
    #print 'this is',l2,'\n' 
    # how much did we miss the target value? 
    l2_error = y - l2 
    #print l2_error,'\n' 
    if (j% 100000) == 0: 
     print "Error:" + str(np.mean(np.abs(l2_error))) 
     print syn1 

    # in what direction is the target value? 
    # were we really sure? if so, don't change too much. 
    l2_delta = l2_error*nonlin(l2,deriv=True) 

    # how much did each l1 value contribute to the l2 error (according to the weights)? 
    l1_error = l2_delta.dot(syn1.T) 

    # in what direction is the target l1? 
    # were we really sure? if so, don't change too much. 
    l1_delta = l1_error * nonlin(l1,deriv=True) 

    syn1 += l1.T.dot(l2_delta) 
    syn0 += l0.T.dot(l1_delta) 
print "Final Error:" + str(np.mean(np.abs(l2_error))) 
def p(l): 
     return avg(nonlin(np.dot(nonlin(np.dot(l,syn0)),syn1)))

So p(x) is the prediction function to use after training, where x is a 1×3 matrix of input values.

Source

2017-06-04 Ubdus Samad

What is the "worst possible result"?

If you want a more detailed answer, please post the code that uses ReLU.

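Why do you say it is theoretically better? ReLU has been shown to work better in most applications, but that does not make it universally better. Your example is very simple, and the inputs are scaled to [0,1], the same as the outputs. That is exactly where I would expect sigmoid to perform well. In practice you rarely see sigmoids in hidden layers, because of the vanishing gradient problem and other issues in large networks, but that is hardly a concern for you.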
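To make the vanishing gradient remark concrete, here is a small sketch (the helper names `sigmoid_grad` and `relu_grad` are mine, not from the question). The sigmoid derivative never exceeds 0.25, so the product of activation derivatives across many stacked sigmoid layers shrinks quickly, while the ReLU derivative is exactly 1 for every active unit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative of the sigmoid with respect to its input
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # derivative of ReLU: 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)

xs = np.linspace(-10.0, 10.0, 2001)

# The sigmoid derivative peaks at x = 0 with value 0.25.
max_sigmoid_grad = sigmoid_grad(xs).max()

# Upper bound on the product of activation derivatives through
# 10 stacked sigmoid layers (weights ignored for simplicity).
bound_10_layers = max_sigmoid_grad ** 10

print(max_sigmoid_grad)
print(bound_10_layers)
```

In a 2-layer network like the one in the question, this shrinkage has little room to accumulate, which is consistent with sigmoid doing well here.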

Also, if you are by any chance using the ReLU derivative, you are missing an 'else' in your code. Your derivative will simply be overwritten.

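As a refresher, here is the definition of ReLU:

f(x) = max(0, x)

...which means it can blow up your activations to infinity. You want to avoid having ReLU on the last (output) layer.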
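A minimal sketch of that advice, reusing the data from the question: ReLU in the hidden layer and sigmoid on the output. Several things here are my assumptions and not part of the original code: the learning rate `lr`, the iteration count, the helper `relu_grad`, and the constant-1 bias column in `X_b`. The bias column is there because, without a bias, the all-zero input row makes every ReLU hidden unit output 0, so the sigmoid output for that row is exactly 0.5 regardless of the weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # 1 where the unit is active, 0 elsewhere; does not modify x
    return (x > 0).astype(float)

X = np.array([[0, 0, 1], [0, 0, 0], [0, 1, 1],
              [1, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
y = np.array([[0], [1], [0], [0], [1], [1]], dtype=float)

# Assumption: append a constant-1 column so the hidden layer has a bias.
X_b = np.hstack([X, np.ones((X.shape[0], 1))])

rng = np.random.RandomState(1)
syn0 = 2 * rng.random_sample((4, 4)) - 1  # (3 inputs + bias) x 4 hidden
syn1 = 2 * rng.random_sample((4, 1)) - 1  # 4 hidden x 1 output

# Without the bias row, a zero input yields zero hidden activations,
# so the output is sigmoid(0) = 0.5 whatever the weights are.
zero_row_output = sigmoid(relu(np.zeros((1, 3)).dot(syn0[:3])).dot(syn1))

lr = 0.1  # assumed step size; the original code uses an implicit 1.0
for j in range(10000):
    z1 = X_b.dot(syn0)            # hidden pre-activations
    l1 = relu(z1)                 # ReLU in the hidden layer
    l2 = sigmoid(l1.dot(syn1))    # sigmoid keeps the output in (0, 1)

    l2_error = y - l2
    l2_delta = l2_error * l2 * (1 - l2)
    l1_delta = l2_delta.dot(syn1.T) * relu_grad(z1)

    syn1 += lr * l1.T.dot(l2_delta)
    syn0 += lr * X_b.T.dot(l1_delta)

final_error = np.mean(np.abs(l2_error))
print("Final Error:" + str(final_error))
```

I have not tuned this, so treat the resulting error as something to check on your side, not as a benchmark.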

On a side note, you should take advantage of vectorized operations whenever possible:

def relu(x, deriv=False):#RELU 
    if (deriv == True): 
     mask = x > 0 
     x[mask] = 1 
     x[~mask] = 0 
     return x # the derivative branch has to return its result (note: x is overwritten in place) 
    else: # HERE YOU WERE MISSING "ELSE" 
     return np.maximum(0,x)

And yes, this is much faster than the if/else statements you were using.
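One thing to watch with any in-place version: if the derivative call overwrites the array it is given, then passing a layer's activations (such as `l1` in the question) destroys them before the weight update uses them. Below is a sketch of a variant that leaves its input untouched, checked against a plain loop; `relu_loop` is a hypothetical reference helper added only for the comparison.

```python
import numpy as np

def relu(x, deriv=False):
    x = np.asarray(x, dtype=float)
    if deriv:
        # new array of 0s and 1s; the caller's array is not modified
        return (x > 0).astype(float)
    return np.maximum(0, x)

def relu_loop(x):
    # element-by-element reference implementation, for comparison only
    return np.array([[v if v > 0 else 0.0 for v in row] for row in x])

a = np.array([[-1.5, 0.0, 2.0],
              [3.0, -0.1, 0.5]])

forward = relu(a)           # same values as relu_loop(a)
grad = relu(a, deriv=True)  # 1.0 where a > 0, else 0.0
```

Because ReLU output is positive exactly where its input is positive, the derivative can be computed from either the pre-activation or the activation, which matches how `deriv=True` is used in the question's code.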

Source

2017-06-04 10:28:43

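np.maximum(0,1)???? That would be one every time. And thanks for the updated function, it reduced my error rate considerably, but it is still far from sigmoid.

I tried using relu in all layers (even the last one), and in all layers except the last one, but I still get an error rate of 0.1, i.e. 10% error!

Thanks for pointing that out, fixed now. I will run your code later. Did you run the code again after fixing the derivative computation for ReLU?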
