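Using minimax search for card games with imperfect information
I want to use minimax search (with alpha-beta pruning), or rather negamax search, to make a computer program play a card game.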
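The card game is actually played by 4 players. To be able to use minimax at all, I simplify the game to "me" against the "others". After each "move", the evaluation of the current state can be read objectively from the game itself: once all 4 players have placed a card, the highest card wins the trick, and the values of the cards in it are counted. Since you don't know how the remaining cards are distributed among the other 3 players, I think you have to simulate every possible distribution ("world") of the cards that are not yours. You hold 12 cards, and the other 3 players hold 36 cards in total.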
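To get a feel for how many "worlds" that is, here is a minimal sketch that counts the ways to split the 36 unseen cards into three hands of 12 and, as an alternative to enumerating them, draws one random world. The class and method names (WorldSampling, countWorlds, sampleWorld) are hypothetical and not part of the code in this question; the sketch also assumes the number of unseen cards is divisible by 3.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class WorldSampling {
    /** Binomial coefficient C(n, k), computed exactly. */
    static BigInteger binomial(int n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // After step i the value equals C(n - k + i, i), so the division is exact.
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    /** Ways to split `unseen` cards into three distinguishable hands of `handSize` cards. */
    static BigInteger countWorlds(int unseen, int handSize) {
        return binomial(unseen, handSize)
                .multiply(binomial(unseen - handSize, handSize));
    }

    /** Draws one random world: shuffle the unseen cards, then cut them into three hands. */
    static <C> List<List<C>> sampleWorld(List<C> unseenCards, Random rng) {
        List<C> shuffled = new ArrayList<>(unseenCards);
        Collections.shuffle(shuffled, rng);
        int handSize = shuffled.size() / 3; // assumption: size is divisible by 3
        List<List<C>> hands = new ArrayList<>();
        for (int p = 0; p < 3; p++) {
            hands.add(new ArrayList<>(shuffled.subList(p * handSize, (p + 1) * handSize)));
        }
        return hands;
    }
}
```

For 36 unseen cards and hands of 12, countWorlds gives C(36,12) · C(24,12), roughly 3.4 × 10^15, which suggests that sampling a limited number of worlds is more practical than simulating all of them.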
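So my approach is the following algorithm, where player is a number between 1 and 3 that identifies one of the three computer players the program may have to find a move for, and -player stands for the opponents, i.e. all three other players together.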
    private Card computerPickCard(GameState state, ArrayList<Card> cards) {
        int bestScore = Integer.MIN_VALUE;
        Card bestMove = null;
        int nCards = cards.size();
        for (int i = 0; i < nCards; i++) {
            if (state.moveIsLegal(cards.get(i))) { // if you are allowed to place this card
                int score;
                GameState futureState = state.testMove(cards.get(i)); // a move is the placing of a card (which returns a new game state)
                score = negamaxSearch(-state.getPlayersTurn(), futureState, 1, Integer.MIN_VALUE, Integer.MAX_VALUE);
                if (score > bestScore) {
                    bestScore = score;
                    bestMove = cards.get(i);
                }
            }
        }
        // now bestMove is the card to place
        return bestMove;
    }
    private int negamaxSearch(int player, GameState state, int depthLeft, int alpha, int beta) {
        ArrayList<Card> cards;
        if (player >= 1 && player <= 3) {
            cards = state.getCards(player);
        }
        else {
            if (player == -1) {
                cards = state.getCards(0);
                cards.addAll(state.getCards(2));
                cards.addAll(state.getCards(3));
            }
            else if (player == -2) {
                cards = state.getCards(0);
                cards.addAll(state.getCards(1));
                cards.addAll(state.getCards(3));
            }
            else {
                cards = state.getCards(0);
                cards.addAll(state.getCards(1));
                cards.addAll(state.getCards(2));
            }
        }
        if (depthLeft <= 0 || state.isEnd()) { // end of recursion as the game is finished or max depth is reached
            if (player >= 1 && player <= 3) {
                return state.getCurrentPoints(player); // player's points as a positive value (for self)
            }
            else {
                return -state.getCurrentPoints(-player); // player's points as a negative value (for others)
            }
        }
        else {
            int score;
            int nCards = cards.size();
            if (player > 0) { // make one move (it's player's turn)
                for (int i = 0; i < nCards; i++) {
                    GameState futureState = state.testMove(cards.get(i));
                    if (futureState != null) { // if move is valid
                        score = negamaxSearch(-player, futureState, depthLeft-1, -beta, -alpha);
                        if (score >= beta) {
                            return score;
                        }
                        if (score > alpha) {
                            alpha = score; // alpha acts like max
                        }
                    }
                }
                return alpha;
            }
            else { // make three moves (it's the others' turn)
                for (int i = 0; i < nCards; i++) {
                    GameState futureState = state.testMove(cards.get(i));
                    if (futureState != null) { // if move is valid
                        for (int k = 0; k < nCards; k++) {
                            if (k != i) {
                                GameState futureStateLevel2 = futureState.testMove(cards.get(k));
                                if (futureStateLevel2 != null) { // if move is valid
                                    for (int m = 0; m < nCards; m++) {
                                        if (m != i && m != k) {
                                            GameState futureStateLevel3 = futureStateLevel2.testMove(cards.get(m));
                                            if (futureStateLevel3 != null) { // if move is valid
                                                score = negamaxSearch(-player, futureStateLevel3, depthLeft-1, -beta, -alpha);
                                                if (score >= beta) {
                                                    return score;
                                                }
                                                if (score > alpha) {
                                                    alpha = score; // alpha acts like max
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
                return alpha;
            }
        }
    }
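For comparison, here is a minimal sketch of the textbook negamax form with alpha-beta pruning on a hypothetical toy tree (the Node class and the INF constant are illustrative assumptions, not part of the game code). It differs from the code in this question in two ways that may be worth checking: the value returned by the recursive call is negated, because it is expressed from the opponent's viewpoint, and the search window uses a finite bound, because in Java -Integer.MIN_VALUE overflows back to Integer.MIN_VALUE.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class NegamaxSketch {
    /** Hypothetical game-tree node: a leaf carries a value, an inner node carries children. */
    static final class Node {
        final int value;           // leaf value from the viewpoint of the side to move at the leaf
        final List<Node> children; // empty for leaves

        Node(int value) {
            this.value = value;
            this.children = Collections.emptyList();
        }

        Node(Node... children) {
            this.value = 0;
            this.children = Arrays.asList(children);
        }
    }

    // Finite stand-in for infinity that can be negated without overflow.
    static final int INF = 1_000_000;

    static int negamax(Node node, int depthLeft, int alpha, int beta) {
        if (depthLeft <= 0 || node.children.isEmpty()) {
            return node.value;
        }
        int best = -INF;
        for (Node child : node.children) {
            // The child's score is from the opponent's viewpoint, so negate it.
            int score = -negamax(child, depthLeft - 1, -beta, -alpha);
            best = Math.max(best, score);
            alpha = Math.max(alpha, score);
            if (alpha >= beta) {
                break; // beta cutoff: the opponent will avoid this line anyway
            }
        }
        return best;
    }
}
```

With a root that has two replies, one leading to leaves 3 and 5 and the other to leaves 2 and 9, the call negamax(root, 2, -INF, INF) returns 3, the same value plain minimax gives, and the leaf 9 is never visited.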
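This seems to work fine, but already at a depth of 1 (depthLeft=1) the program has to evaluate about 50,000 moves (placed cards) on average. That is far too many, of course!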
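As a rough sanity check on that figure, assume the opponents' pooled hand still holds all n = 36 cards and ignore the legality filter. The three nested loops then visit every ordered triple of distinct cards once for each card the root player tries:

```latex
\[
  n(n-1)(n-2) \;=\; 36 \cdot 35 \cdot 34 \;=\; 42{,}840
\]
```

This upper bound is of the same order of magnitude as the roughly 50,000 moves reported here. Legality checks remove some of these triples, while repeating the loops for every card in the root player's hand multiplies the total.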
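So my questions are:
- Is the implementation correct at all? Can a game like this be simulated this way, especially with regard to the imperfect information?
- How can the algorithm be improved in terms of speed and workload?
- For example, can I reduce the set of possible moves to a random 50% subset to gain speed while still getting good results?
- I found the UCT algorithm, which may be a good solution. Do you know this algorithm? Can you help me implement it?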
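On the last question: UCT applies the UCB1 bandit rule at every node of the search tree to decide which move to explore next. A minimal sketch of just that selection step follows, assuming per-move reward sums and visit counts are kept in plain arrays. The names Ucb1Sketch, ucb1 and selectChild are hypothetical, and a full UCT implementation would also need expansion, playouts and backpropagation.

```java
final class Ucb1Sketch {
    /** UCB1 score of one move: mean reward plus an exploration bonus. */
    static double ucb1(double totalReward, int visits, int parentVisits, double c) {
        if (visits == 0) {
            return Double.POSITIVE_INFINITY; // unvisited moves are tried first
        }
        double mean = totalReward / visits;
        return mean + c * Math.sqrt(Math.log(parentVisits) / visits);
    }

    /** Index of the move with the highest UCB1 score (the first one wins ties). */
    static int selectChild(double[] totalReward, int[] visits, double c) {
        int parentVisits = 0;
        for (int v : visits) {
            parentVisits += v; // assumption: the parent was visited once per child visit
        }
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < visits.length; i++) {
            double score = ucb1(totalReward[i], visits[i], parentVisits, c);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

The constant c trades exploration against exploitation. A common starting point is c = Math.sqrt(2) for rewards scaled to [0, 1]; with c = 0 the rule degenerates to always picking the move with the best mean so far.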
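Well, about minimaxing near the end of the game: at that point you know you need x tricks to win. Any world in which you can't win your x tricks can be ignored, because if that world is the real one, you have already lost. If you base your probabilities only on the worlds that lead to a win (basically wishful thinking), you can prune the search even further. – Cruncher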
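The pruning idea from this comment could be sketched as a filter over candidate worlds. This is only an illustration under stated assumptions: the world type W, the estimator maxTricksAchievable and the method name keepWinnableWorlds are all hypothetical, and computing a reliable upper bound on achievable tricks is the hard part that this sketch leaves open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

final class WishfulPruning {
    /**
     * Keeps only the worlds in which the required number of tricks is still reachable.
     * `maxTricksAchievable` is a caller-supplied (hypothetical) upper-bound estimator.
     */
    static <W> List<W> keepWinnableWorlds(List<W> worlds,
                                          ToIntFunction<W> maxTricksAchievable,
                                          int tricksNeeded) {
        List<W> kept = new ArrayList<>();
        for (W world : worlds) {
            if (maxTricksAchievable.applyAsInt(world) >= tricksNeeded) {
                kept.add(world); // still winnable, so worth searching
            }
        }
        return kept;
    }
}
```

Only the worlds that survive the filter would then be passed to the search, so the effort is spent where a win is still possible.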