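Optical character recognition using a neural network in Python

This code is for OCR using an ANN. The network has one hidden layer, and the input is a 28x28 image. The code runs without any errors, but the output is still inaccurate even after providing more than 5000 images for training. I am using the MNIST dataset in JPG image format. Please tell me what is wrong with my logic.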

      import numpy as np
      from PIL import Image 
      import random 
      from random import randint 
      y = [[0,0,0,0,0,0,0,0,0,0]] 
      W1 = [[ random.uniform(-1, 1) for q in range(40)] for p in range(784)] 
      W2 = [[ random.uniform(-1, 1) for q in range(10)] for p in range(40)] 
      def sigmoid(x): 
       global b 
       return (1.0/(1.0 + np.exp(-x))) 

      #run the neural net forward 

      def run(X, W): 
       return sigmoid(np.matmul(X,W)) #1x2 * 2x2 = 1x1 matrix 

      #cost function 

      def cost(X, y, W): 
       nn_output = run(X, W) 
       return ((nn_output - y)) 

      def gradient_Descent(X,y,W1,W2): 
       alpha = 0.12 #learning rate 
       epochs = 15000 #num iterations 
       for i in range(epochs): 
        Z2=sigmoid(np.matmul(run(X,W1),W2)) #final activation function(1X10)) 
        Z1=run(X,W1) #first activation function(1X40) 
        phi1=Z1*(1-Z1) #differentiation of Z1 
        phi2=Z2*(1-Z2) #differentiation of Z2 
        delta2 = phi2*cost(Z1,y,W2) #delta for outer layer(1X10) 
        delta1 = np.transpose(np.transpose(phi1)*np.matmul(W2,np.transpose(delta2))) 
        deltaW2 = alpha*(np.matmul(np.transpose(Z1),delta2)) 
        deltaW1 = alpha*(np.matmul(np.transpose(X),delta1)) 
        W1=W1+deltaW1 
        W2=W2+deltaW2 

      def Training(): 

       for j in range(8): 
        y[0][j]=1 
        k=1 
        while k<=15: #5421 
         print(k) 
         q=0 
         img = Image.open('mnist_jpgfiles/train/mnist_'+str(j)+'_'+str(k)+'.jpg') 
         iar = np.array(img)  #image array 
         ar=np.reshape(iar,(1,np.product(iar.shape))) 
         ar=np.array(ar,dtype=float) 
         X = ar 
         ''' 
         for p in range(784): 

          if X[0][p]>0: 

           X[0][p]=1 

          else: 

           X[0][p]=0 
         '''   
         k+=1 
         gradient_Descent(X,y,W1,W2) 
         print(np.argmin(cost(run(X,W1),y,W2))) 
         #print(W1) 

        y[0][j]=0 
      Training() 

      def test(): 
       global W1,W2 
       for j in range(3): 
        k=1 
        while k<=5: #890 
         img = Image.open('mnist_jpgfiles/test/mnist_'+str(j)+'_'+str(k)+'.jpg') 
         iar = np.array(img)  #image array 
         ar=np.reshape(iar,(1,np.product(iar.shape))) 
         ar=np.array(ar,dtype=float) 
         X = ar/256 
         ''' 
         for p in range(784): 

          if X[0][p]>0: 

           X[0][p]=1 

          else: 

           X[0][p]=0 
         '''  
         k+=1 
         print("Should be "+str(j)) 
         print((run(run(X,W1),W2))) 
         print((np.argmax(run(run(X,W1),W2)))) 
      print("Testing.....") 
      test()

Source

2017-05-31 Yugank Trivedi

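I have not worked with ANNs, but when I used gradient descent on regression problems, as in Andrew Ng's machine learning course on Coursera, I found it helpful to keep the learning rate alpha below 0.05 and the number of iterations above 100000. Try adjusting your learning rate, and then build a confusion matrix, which will help you understand how accurate the system is.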
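
As a minimal sketch of the confusion-matrix suggestion, assume the true digits and the network's `np.argmax` predictions have been collected into two integer sequences. The names `confusion_matrix`, `y_true` and `y_pred` are illustrative and do not come from the question's code, and the labels are made up:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    """Rows index the true digit, columns the predicted digit."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when (true, pred) pairs repeat
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Made-up labels purely for illustration
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
print(cm)
print("accuracy:", accuracy)
```

Off-diagonal entries show which digits get mistaken for which, which a single accuracy number hides.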

Source

2017-05-31 02:51:18

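In my experience, there are many things that can go wrong with an artificial neural network. I will list some possible mistakes for you to consider.

Suppose the classification accuracy does not increase at all after training.

Something is wrong with the training or test set.
A learning rate that is too high can sometimes keep the algorithm from converging at all. Try setting it very small, for example 0.01 or 0.001. If it still does not converge, the problem probably lies somewhere other than gradient descent.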
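
To illustrate the learning-rate point, here is a toy sketch on the one-dimensional function f(w) = w², not on the network from the question. The function `descend` and both step sizes are chosen only for illustration:

```python
def descend(alpha, steps=20, w=1.0):
    """Plain gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w = w - alpha * 2.0 * w
    return w

small = descend(alpha=0.01)  # shrinks slowly toward the minimum at 0
large = descend(alpha=1.5)   # every step overshoots; |w| doubles each iteration
print(small, large)
```

With the small step the iterate moves steadily toward 0, while with the large step it flips sign and grows without bound.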

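Suppose training does improve the accuracy, but it is worse than expected.

The normalization step is not implemented correctly. For images, zero mean and unit variance is recommended.
The learning rate is too low or too high.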
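
A minimal sketch of the zero-mean, unit-variance normalization mentioned above, assuming the images have been flattened into the rows of an array with raw pixel values from 0 to 255. The helper `standardize` and the random stand-in data are illustrative. The statistics are computed on the training set and reused unchanged for the test set:

```python
import numpy as np

def standardize(X, mean=None, std=None, eps=1e-8):
    """Scale X to zero mean and unit variance using global pixel statistics."""
    X = np.asarray(X, dtype=float)
    if mean is None:
        mean = X.mean()
    if std is None:
        std = X.std()
    return (X - mean) / (std + eps), mean, std

rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(5, 784))  # stand-in for flattened 28x28 images
X_test = rng.integers(0, 256, size=(2, 784))

X_train_n, mean, std = standardize(X_train)
X_test_n, _, _ = standardize(X_test, mean, std)  # reuse the training statistics
print(X_train_n.mean(), X_train_n.std())
```

Whatever scaling is chosen, the same transformation has to be applied at training time and at test time.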

Source

2017-05-31 03:38:54

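There is a problem with your cost function: you simply compute the difference between the hypothesis output and the actual output. That makes your cost function a straight line, so it is strictly increasing (or strictly decreasing) and cannot be optimized. You need a cross-entropy cost function (since you use sigmoid as the activation function). Also, gradient descent alone cannot optimize the ANN cost function; you should use backpropagation together with gradient descent to optimize it.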
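
To make this suggestion concrete, here is a small sketch of backpropagation with a cross-entropy cost for a network with one sigmoid hidden layer and sigmoid outputs. It is not the asker's code: the toy data, the layer sizes, the function name `train` and the absence of bias terms are all assumptions made for illustration. With sigmoid outputs and cross-entropy, the output-layer error simplifies to `A2 - Y`, and the weights are moved against the gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(A, Y, eps=1e-12):
    """Binary cross-entropy summed over output units, averaged over samples."""
    A = np.clip(A, eps, 1 - eps)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / len(Y)

def train(X, Y, hidden=8, alpha=0.5, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-1, 1, size=(X.shape[1], hidden))
    W2 = rng.uniform(-1, 1, size=(hidden, Y.shape[1]))
    costs = []
    for _ in range(epochs):
        # forward pass
        A1 = sigmoid(X @ W1)
        A2 = sigmoid(A1 @ W2)
        costs.append(cross_entropy(A2, Y))
        # backward pass: sigmoid + cross-entropy gives delta2 = A2 - Y
        delta2 = A2 - Y
        delta1 = (delta2 @ W2.T) * A1 * (1 - A1)
        # step against the gradient, averaged over the batch
        W2 -= alpha * (A1.T @ delta2) / len(X)
        W1 -= alpha * (X.T @ delta1) / len(X)
    return W1, W2, costs

# Toy problem (assumed): 3 classes, inputs are noisy copies of the one-hot label
rng = np.random.default_rng(1)
labels = rng.integers(0, 3, size=60)
Y = np.eye(3)[labels]
X = Y + 0.1 * rng.standard_normal((60, 3))

W1, W2, costs = train(X, Y)
pred = np.argmax(sigmoid(sigmoid(X @ W1) @ W2), axis=1)
print("first cost:", costs[0], "last cost:", costs[-1])
print("training accuracy:", np.mean(pred == labels))
```

Unlike the loop in the question, `train` returns the updated weights, so the caller sees what was learned.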

Source

2017-05-31 04:28:57

As you suggested, I used 0.5 * np.square(nn_output - y) as the cost function. Running it now, gradient descent lowers the cost nicely, but I still do not get the correct output: np.argmax(run(run(X, W1), W2)) returns a different index, one whose cost is higher. [[3.370e-05 4.999e-01 4.999e-01 4.999e-01 4.999e-01 4.999e-01 4.999e-01 4.999e-01 4.999e-01 4.999e-01]] [[0.99178966 0.99999857 0.99999872 0.99999861 0.99999876 0.99999857 0.99999842 0.99999868 0.99999857 0.99999874]]

In the comment above, the cost output is: [[3.370e-05 4.999e-01 4.999e-01 4.999e-01 4.999e-01 4.999e-01 4.999e-01 4.999e-01 4.999e-01 4.999e-01]] and the run output is: [[0.99178966 0.99999857 0.99999872 0.99999861 0.99999876 0.99999857 0.99999842 0.99999868 0.99999857 0.99999874]]
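
The numbers in this comment can be checked directly. The sketch below assumes the target was one-hot at index 0, which is consistent with the reported cost values. The names `output`, `target` and `cost` are illustrative:

```python
import numpy as np

# Network output copied from the comment above
output = np.array([[0.99178966, 0.99999857, 0.99999872, 0.99999861, 0.99999876,
                    0.99999857, 0.99999842, 0.99999868, 0.99999857, 0.99999874]])
# Assumed target: one-hot at index 0
target = np.zeros((1, 10))
target[0, 0] = 1.0

cost = 0.5 * np.square(output - target)
print(cost)               # element 0 is tiny, the rest are close to 0.5
print(np.argmin(cost))    # 0: the unit with the lowest squared error
print(np.argmax(output))  # 4: the largest activation, not the target class
```

Every output unit is saturated near 1, so under this assumption the unit for the target class is actually the smallest activation and `np.argmax` picks another index. One thing that may be worth checking is that the training loop in the question feeds raw pixel values, while `test()` divides them by 256.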
