
I am writing a program that teaches two players to play a simple board game using reinforcement learning with afterstate-based temporal-difference learning, TD(λ). Learning is done by training a neural network (I use Sutton's NonLinear TD/Backprop neural network). I would really appreciate your take on the dilemma below. The basic algorithm/pseudocode for playing one turn between the two opponents looks like this:

WHITE.CHOOSE_ACTION(GAME_STATE); //White player decides on its next move by evaluating the current game state (TD(λ) learning) 

GAME_STATE = WORLD.APPLY(WHITE_PLAYERS_ACTION); //We apply the chosen action of the player to the environment and a new game state emerges 

IF (GAME_STATE != FINAL){ // If the new state is not final (not a winning state for White), do the same for the Black player 

    BLACK.CHOOSE_ACTION(GAME_STATE) 

GAME_STATE = WORLD.APPLY(BLACK_PLAYERS_ACTION) // We apply the chosen action of the black player to the environment and a new game state emerges. 
} 
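
For context, here is a minimal sketch of what a PLAYER.LEARN(GAME_STATE) call could do under afterstate TD(λ). It uses a linear value function over state features rather than Sutton's NonLinear TD/Backprop network, and the class name, feature encoding and hyperparameters are assumptions for illustration only; the eligibility-trace update is the part that carries over.

import numpy as np

class AfterstateTDLearner:
    # Hypothetical afterstate TD(lambda) learner with a LINEAR value function;
    # the question uses Sutton's nonlinear TD/Backprop network instead, but
    # the trace-based update has the same shape.
    def __init__(self, n_features, alpha=0.1, lam=0.7, gamma=1.0):
        self.w = np.zeros(n_features)          # value-function weights
        self.e = np.zeros(n_features)          # eligibility traces
        self.alpha, self.lam, self.gamma = alpha, lam, gamma
        self.prev_features = None              # features of the previous afterstate

    def value(self, features):
        return float(np.dot(self.w, features))

    def learn(self, features, reward=0.0, terminal=False):
        # TD(lambda) update from the previously stored afterstate to the new one.
        features = np.asarray(features, dtype=float)
        if self.prev_features is not None:
            v_prev = self.value(self.prev_features)
            v_next = 0.0 if terminal else self.value(features)
            delta = reward + self.gamma * v_next - v_prev        # TD error
            self.e = self.gamma * self.lam * self.e + self.prev_features
            self.w += self.alpha * delta * self.e                # trace-weighted step
        self.prev_features = None if terminal else features

    def reset_episode(self):
        self.e[:] = 0.0
        self.prev_features = None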

When should each player invoke its learning method, PLAYER.LEARN(GAME_STATE)? This is the dilemma.

Option A. Immediately after each player's move, once the new afterstate has emerged, as follows:

WHITE.CHOOSE_ACTION(GAME_STATE); 
GAME_STATE = WORLD.APPLY(WHITE_PLAYERS_ACTION); 
WHITE.LEARN(GAME_STATE) // White learns from the afterstate that emerged right after his action 
IF (GAME_STATE != FINAL){ 
    BLACK.CHOOSE_ACTION(GAME_STATE) 
    GAME_STATE = WORLD.APPLY(BLACK_PLAYERS_ACTION) 
    BLACK.LEARN(GAME_STATE) // Black learns from the afterstate that emerged right after his action 
}

Option B. Immediately after each player's own move, once the new afterstate has emerged, and additionally after the opponent's move if the opponent's move produced a win:

WHITE.CHOOSE_ACTION(GAME_STATE); 
GAME_STATE = WORLD.APPLY(WHITE_PLAYERS_ACTION); 
WHITE.LEARN(GAME_STATE) 
IF (GAME_STATE == FINAL) //If white player won 
    BLACK.LEARN(GAME_STATE) // Make the Black player learn from the White player's winning afterstate 
IF (GAME_STATE != FINAL){ // If White's move did not produce a winning/final afterstate 
    BLACK.CHOOSE_ACTION(GAME_STATE) 
    GAME_STATE = WORLD.APPLY(BLACK_PLAYERS_ACTION) 
    BLACK.LEARN(GAME_STATE) 
    IF (GAME_STATE == FINAL) // If Black player won 
        WHITE.LEARN(GAME_STATE) // Make White learn from Black's winning afterstate 
}

I believe Option B makes more sense.
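
To make the comparison concrete, here is a minimal sketch of Option B as a runnable episode loop. The world and player objects (reset, apply, is_final, choose_action, learn) are hypothetical stand-ins for the pseudocode above, not an existing API.

def play_episode_option_b(world, white, black):
    game_state = world.reset()
    while True:
        # White moves and learns from its own afterstate.
        game_state = world.apply(white.choose_action(game_state))
        white.learn(game_state)
        if world.is_final(game_state):
            black.learn(game_state)   # Black also learns from White's winning afterstate
            break
        # Black moves and learns from its own afterstate.
        game_state = world.apply(black.choose_action(game_state))
        black.learn(game_state)
        if world.is_final(game_state):
            white.learn(game_state)   # White also learns from Black's winning afterstate
            break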

Answer


Typically, with TD learning, the agent will have 3 functions:

  • start(observation) → action
  • step(observation, reward) → action
  • finish(reward)

Acting is combined with learning, and more learning also happens when the game ends; a sketch of this interface follows.
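
A minimal sketch of how such an agent interface could drive self-play; the world methods beyond start/step/finish (reset, apply, is_final) and the +1/-1 terminal rewards are illustrative assumptions.

def self_play_episode(world, white, black):
    state = world.reset()
    # Assumes the game cannot end before each side has moved once.
    state = world.apply(white.start(state))        # first moves: act only, no learning yet
    state = world.apply(black.start(state))
    mover, waiter = white, black                   # White is next to move
    while not world.is_final(state):
        action = mover.step(state, reward=0.0)     # learn from the transition, then act
        state = world.apply(action)
        mover, waiter = waiter, mover
    # The player who made the last move (now `waiter`) produced the win.
    waiter.finish(reward=+1.0)                     # winner's final learning update
    mover.finish(reward=-1.0)                      # loser's final learning update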