Deep Q-learning algorithm not working

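I have been trying to implement a deep Q-learning algorithm, but it is not working: after 100,000 game plays, and using 1000 iterations to train each step (I have also tried lower numbers), it still has not learned anything. The network and the game are shown in the linked image, http://imgur.com/a/hATfB. Here is what happens in each training step: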

double maxQval; 
double[] inputvec; 
int MaxQ = GetRandDir(state, out maxQval, out inputvec);//input vec is board 
double[] QtarVec = new double[] { 0, 0, 0, 0 }; 
double r = GetR((int)state[0], (int)state[1]); // GetR is reward 
QtarVec[MaxQ] = Qtar(r, maxQval); // backprop vector of 0's except Qtar replaces a value 

associator.Train(50, new double[][] { inputvec }, new double[][] { QtarVec });
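
For comparison, here is a minimal sketch of how a Q-learning training target is commonly built. It is not the code from the question: `QTargetSketch`, `BuildTarget`, and the discount `Gamma = 0.9` are hypothetical names and values chosen for illustration. Two details differ from the snippet above and may be worth checking. In the textbook update, the maximum is taken over the network's outputs for the next state (the one reached after the move). And the entries for the actions that were not taken are copied from the network's current prediction instead of being set to 0, so only the chosen action contributes an error.

```csharp
using System;
using System.Linq;

public static class QTargetSketch
{
    // Hypothetical discount factor; the question does not state its value.
    public const double Gamma = 0.9;

    // Q target for the action that was taken: r + gamma * max over Q(s', a').
    // At a terminal state there is no future value, so the target is just r.
    public static double QTarget(double reward, double[] nextQ, bool terminal)
    {
        return terminal ? reward : reward + Gamma * nextQ.Max();
    }

    // Copy the current predictions and overwrite only the taken action,
    // so the untaken actions produce zero error during backprop.
    public static double[] BuildTarget(double[] currentQ, int action,
                                       double reward, double[] nextQ, bool terminal)
    {
        double[] target = (double[])currentQ.Clone();
        target[action] = QTarget(reward, nextQ, terminal);
        return target;
    }
}
```

With current outputs { 0.2, 0.5, 0.1, 0.3 }, action 2, reward 0, and next-state outputs { 1, 2, 0, 0 }, only entry 2 changes, to 0 + 0.9 * 2 = 1.8.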

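The training data pair for backprop is (input i in the linked image, QTarget = r + gamma * MaxQ), where MaxQ is either the maximum activation of the network's output layer or a random one (epsilon-greedy). r is the reward obtained from each move: -10 for an obstacle and 10 for the goal (although I have also tried just 10 for the goal and 0 for everything else). Here is the training code.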

public void Train(int nTrails)
{
    double[] state = new double[] { 1, 1 }; // initial position
    int its = 0;
    for (int i = 0; i < nTrails; i++)
    {
        // while on the board and not at the goal position
        while (((state[0] < 4) && (state[1] < 4)) && ((state[0] * 100 > 0) && (state[1] * 100 > 0)) && (state[0] != 3 && state[1] != 3))
        {
            double temp = r.NextDouble();
            int next = -1;
            lines.Add(new Vector2((float)(state[0] * 100), (float)(state[1] * 100)));
            if (temp < epsilon)
            {
                next = TrainRandIt(state); // move in a random direction, backprop
            }
            else
            {
                next = TrainMaxIt(state); // move in the max-activation direction, backprop
            }
            if (next == 0) // updating position
            {
                state[0]++;
            }
            else if (next == 1)
            {
                state[0]--;
            }
            else if (next == 2)
            {
                state[1]++;
            }
            else if (next == 3)
            {
                state[1]--;
            }
        }
    }
    state[0] = 1;
    state[1] = 1; // resetting game
}
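
One way the episode loop could be organised is sketched below. This is an illustration under assumptions, not a corrected version of the program above: the valid coordinates 1 to 3 are read off the loop condition, the goal at (3, 3) is a guess, and `EpisodeLoopSketch`, the `chooseAction` delegate, and the step cap are hypothetical stand-ins (the delegate stands in for the epsilon-greedy choice made through `TrainRandIt` and `TrainMaxIt`). It may be worth checking two things in the loop above: the condition `state[0] != 3 && state[1] != 3` becomes false as soon as either coordinate equals 3, and the reset runs only after the `for` loop has finished. The sketch resets the position at the start of every episode and ends an episode only when both coordinates match the goal.

```csharp
using System;

public static class EpisodeLoopSketch
{
    // Assumed from the loop condition in the question: valid coordinates are 1..3.
    public static bool OnBoard(int x, int y)
    {
        return x > 0 && x < 4 && y > 0 && y < 4;
    }

    // Assumed goal cell; the episode ends only when BOTH coordinates match.
    public static bool AtGoal(int x, int y)
    {
        return x == 3 && y == 3;
    }

    // Runs nTrials episodes and returns how many of them reached the goal.
    // chooseAction is a hypothetical stand-in for the epsilon-greedy move
    // selection plus training step; it returns 0..3 like `next` above.
    public static int Run(int nTrials, int maxSteps, Func<int, int, int> chooseAction)
    {
        int goalsReached = 0;
        for (int i = 0; i < nTrials; i++)
        {
            int x = 1, y = 1; // reset at the start of every episode
            int steps = 0;
            while (OnBoard(x, y) && !AtGoal(x, y) && steps < maxSteps)
            {
                int next = chooseAction(x, y);
                if (next == 0) x++;
                else if (next == 1) x--;
                else if (next == 2) y++;
                else if (next == 3) y--;
                steps++;
            }
            if (OnBoard(x, y) && AtGoal(x, y)) goalsReached++;
        }
        return goalsReached;
    }
}
```

The step cap is there because an untrained policy can wander between cells indefinitely without leaving the board or reaching the goal.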

Any help is appreciated.

Source

2016-12-09 Sam Smith

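Indentation makes the code easier to read.

Thank you.

Judging from the image you linked, this looks like a maze game in which the input is the player's position and the output is the direction the player should move in (up, down, left, or right).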

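Here is a machine learning engine that is able to solve exactly this kind of problem: (RLM). RLM takes a different approach from the typical machine learning engines you may have tried, so I suggest you follow the link I provided to learn more about it and how it differs.

It is written in C#, and we have a maze game example much like the one you are trying to build. You can browse it on our Github page, or clone/download the source code together with the example applications.

For documentation, refer to the Documentations files provided, or to the github wiki.

RLM is also available via Nuget.

Source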

2017-04-18 07:18:12 Randolph
