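Monte Carlo tree search not working

I am currently writing an AI for the board game Hex. I want to use Monte Carlo tree search for this and have already tried to implement it. However, the AI makes incredibly stupid (random) moves, and I can't figure out why it isn't working.
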
import java.util.ArrayList;
import java.util.Random;

/**
 * Created by Robin on 18.03.2017.
 */
public class TreeNode {
    private static final Random random = new Random();
    private static final double epsion = 10e-5;
    protected double nvisits;
    protected double totValue;
    protected int move = -1;
    private HexBoard board;
    protected ArrayList<TreeNode> children;

    public TreeNode(HexBoard board) {
        this.board = board;
    }

    // Copy-Constructor
    public TreeNode(TreeNode treeNode) {
        this.nvisits = treeNode.nvisits;
        this.totValue = treeNode.totValue;
        this.move = treeNode.move;
        this.board = new HexBoard(treeNode.board);
    }

    public void update(double value) {
        totValue += value * board.color;
        nvisits++;
    }

    public void expand() {
        assert (children == null);
        children = new ArrayList<>(121 - board.moveCount);
        for (int i = 0; i < 121; i++) {
            if (board.board[i] != HexBoard.EMPTY)
                continue;
            TreeNode newNode = new TreeNode(board);
            newNode.move = i;
            children.add(newNode);
        }
    }

    public void calculateIteration() {
        ArrayList<TreeNode> visited = new ArrayList<>();
        TreeNode current = this;
        visited.add(current);
        while (!current.isLeafNode()) {
            current = current.select();
            board.makeMove(current.move);
            visited.add(current);
        }
        // Found a leaf node
        double value;
        if (current.board.getWinner() == 0) {
            current.expand();
            TreeNode newNode = current.select();
            value = playOut(newNode.board);
        } else {
            value = current.board.getWinner();
        }
        // update all the nodes
        for (int i = 1; i < visited.size(); i++) {
            visited.get(i).update(value);
            board.undoMove(visited.get(i).move);
        }
        visited.get(0).update(value);
    }

    public static int playOut(HexBoard board) {
        int winner = 0;
        if (board.moveCount == 121) {
            winner = board.getWinner();
            return winner;
        }
        // Checking-Movecount vs actual stones on the board
        final double left = 121 - board.moveCount;
        double probibility = 1 / left;
        double summe = 0;
        double p = random.nextDouble();
        int randomMove = 0;
        for (int i = 0; i < 121; i++) {
            if (board.board[i] != HexBoard.EMPTY)
                continue;
            summe += probibility;
            if (p <= summe && probibility != 0) {
                randomMove = i;
                break;
            }
        }
        board.makeMove(randomMove);
        winner = playOut(board);
        board.undoMove(randomMove);
        return winner;
    }

    public TreeNode select() {
        TreeNode bestNode = null;
        double bestValue = -10000000;
        for (TreeNode node : children) {
            double uctvalue = (node.nvisits == 0)
                    ? 100000
                    : (node.totValue / (node.nvisits) + Math.sqrt((Math.log(this.nvisits)) / (2 * node.nvisits)));
            uctvalue += epsion * random.nextDouble();
            if (uctvalue > bestValue) {
                bestValue = uctvalue;
                bestNode = node;
            }
        }
        return bestNode;
    }

    public boolean isLeafNode() {
        return (children == null);
    }
}

Is my implementation in the method calculateIteration() correct?
I know this may not be a very appealing problem to look at, but I would appreciate any help.

This is too broad. Please do some debugging to narrow the problem down and make it simpler, and provide a [minimal test case](https://stackoverflow.com/help/mcve).
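Are you actually keeping track of which player makes which moves? Are you alternating turns within an iteration? To me it looks like you just let the current player fill up the whole board in your simulation, as if there were no opponent. Or am I missing something? Also, it would be useful to tell us how many simulations you are running and how you finally decide which move to play in the "real" game.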
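
One way to make the question in this comment concrete is to record, in every node, which player made the move leading to it, and to credit the playout result from that player's point of view during backpropagation. The following is only a minimal sketch under assumptions: `BackpropSketch`, `Node`, `playerJustMoved`, `backpropagate` and `alternatingPath` are hypothetical names that do not appear in the posted code, the two players are encoded as +1 and -1, and the playout result is the winner's code.

```java
import java.util.ArrayList;
import java.util.List;

class BackpropSketch {
    // Hypothetical minimal node; not the TreeNode class from the question.
    static final class Node {
        final int playerJustMoved; // +1 or -1: the player whose move led to this node
        double visits;
        double totalValue;

        Node(int playerJustMoved) {
            this.playerJustMoved = playerJustMoved;
        }
    }

    // Credits one playout result to every node on the path.
    // winner is +1 or -1; a node gains +1 if the player who moved into it won, else -1.
    static void backpropagate(List<Node> path, int winner) {
        for (Node node : path) {
            node.visits++;
            node.totalValue += (winner == node.playerJustMoved) ? 1.0 : -1.0;
        }
    }

    // Builds a path whose nodes alternate between the two players (assumption: strict alternation).
    static List<Node> alternatingPath(int firstPlayer, int length) {
        List<Node> path = new ArrayList<>();
        int player = firstPlayer;
        for (int i = 0; i < length; i++) {
            path.add(new Node(player));
            player = -player;
        }
        return path;
    }

    public static void main(String[] args) {
        List<Node> path = alternatingPath(1, 3);
        backpropagate(path, 1);
        for (Node node : path) {
            System.out.println(node.playerJustMoved + " -> " + node.totalValue);
        }
    }
}
```

With this layout the sign of the reward depends only on data stored in the node, not on the state of a shared board at the moment the update runs.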
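Sorry, I should have clarified that. The board.makeMove() function alternates between the two players. I tried everything from 100 to 50000 simulations and the results were almost the same (bad, random moves). The "best" child of the root node is the one with the highest UCT value, and that is the one played by the AI – CheckersGuy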
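
On the last point, a common alternative to picking the root child with the highest UCT value is to pick the child that was visited most often. The UCT value includes the exploration term, and in the posted `select()` an unvisited child gets a fixed value of 100000, so the highest UCT value does not necessarily mark the best-evaluated move. A minimal sketch, assuming hypothetical names (`FinalMoveSketch`, `Child`, `mostVisitedMove`) that are not part of the posted code:

```java
import java.util.ArrayList;
import java.util.List;

class FinalMoveSketch {
    // Hypothetical stand-in for a root child: the move it represents and its visit count.
    static final class Child {
        final int move;
        final double visits;

        Child(int move, double visits) {
            this.move = move;
            this.visits = visits;
        }
    }

    // Returns the move of the most visited child, or -1 if there are no children.
    static int mostVisitedMove(List<Child> children) {
        int bestMove = -1;
        double bestVisits = -1;
        for (Child child : children) {
            if (child.visits > bestVisits) {
                bestVisits = child.visits;
                bestMove = child.move;
            }
        }
        return bestMove;
    }

    public static void main(String[] args) {
        List<Child> children = new ArrayList<>();
        children.add(new Child(12, 40));
        children.add(new Child(60, 900));
        children.add(new Child(61, 0)); // never visited, so never chosen here
        System.out.println(mostVisitedMove(children));
    }
}
```

The numbers in `main` are made up for illustration; they are not measurements from the program in the question.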