Using minimax search for card games with imperfect information

I want to use minimax search (with alpha-beta pruning), or more precisely negamax search, to make a computer program play a card game.

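The card game actually has 4 players. So to be able to use minimax etc., I simplify the game to "me" against "the others". After each "move", you can objectively read the evaluation of the current state from the game itself. When all 4 players have placed a card, the highest card wins the trick, and the values of those cards are counted. Because you don't know how the cards are distributed among the other 3 players, I think you have to simulate all possible distributions ("worlds") of the cards that are not yours. You have 12 cards; the other 3 players have 36 cards in total.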
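
To get a feel for how many "worlds" that is, here is a small Java sketch that counts the deals. It assumes, as described above, that the 36 unseen cards are split 12/12/12 among the three other players, and it ignores any information already revealed by play. The class name WorldCount is only illustrative.

```java
import java.math.BigInteger;

/** Counts the ways distinct unseen cards can be dealt into equally sized, labeled hands. */
final class WorldCount {
    /** n! computed exactly. */
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    /** Number of deals of (players * handSize) distinct cards: (p*h)! / (h!)^p. */
    static BigInteger deals(int players, int handSize) {
        BigInteger denominator = factorial(handSize).pow(players);
        return factorial(players * handSize).divide(denominator);
    }

    public static void main(String[] args) {
        // 36 unseen cards, 12 for each of the 3 other players
        System.out.println(deals(3, 12)); // prints 3384731762521200
    }
}
```

That is roughly 3.4 × 10^15 worlds, which is why enumerating every distribution is not feasible.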

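So my approach is the following algorithm, where player is a number between 1 and 3 standing for the three computer players the program may need to find a move for, and -player stands for the opponents, i.e. all the other three players together.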

private Card computerPickCard(GameState state, ArrayList<Card> cards) {
    int bestScore = Integer.MIN_VALUE;
    Card bestMove = null;
    for (Card card : cards) {
        if (state.moveIsLegal(card)) { // if you are allowed to place this card
            // a move is the placing of a card (which returns a new game state)
            GameState futureState = state.testMove(card);
            // negamax: the opponents' best score, negated, is my score.
            // -Integer.MAX_VALUE is used because -Integer.MIN_VALUE overflows back to MIN_VALUE.
            int score = -negamaxSearch(-state.getPlayersTurn(), futureState, 1, -Integer.MAX_VALUE, Integer.MAX_VALUE);
            if (score > bestScore) {
                bestScore = score;
                bestMove = card;
            }
        }
    }
    return bestMove; // the card to place
}

private int negamaxSearch(int player, GameState state, int depthLeft, int alpha, int beta) {
    // Cards the side to move may play: one computer player's own hand (player = 1..3),
    // or the combined hands of the other three players (player = -1..-3).
    // Copy into a new list so that addAll() does not modify a list owned by the game state.
    ArrayList<Card> cards = new ArrayList<>();
    if (player >= 1 && player <= 3) {
        cards.addAll(state.getCards(player));
    }
    else {
        for (int p = 0; p <= 3; p++) {
            if (p != -player) {
                cards.addAll(state.getCards(p));
            }
        }
    }
    if (depthLeft <= 0 || state.isEnd()) { // end of recursion: the game is finished or max depth is reached
        if (player >= 1 && player <= 3) {
            return state.getCurrentPoints(player);   // player's points as a positive value (for self)
        }
        else {
            return -state.getCurrentPoints(-player); // player's points as a negative value (for the others)
        }
    }
    int nCards = cards.size();
    if (player > 0) { // make one move (it's this player's turn)
        for (int i = 0; i < nCards; i++) {
            GameState futureState = state.testMove(cards.get(i));
            if (futureState != null) { // if the move is valid
                // negamax: negate the child's score and pass the negated, swapped window
                int score = -negamaxSearch(-player, futureState, depthLeft - 1, -beta, -alpha);
                if (score >= beta) {
                    return score; // beta cutoff
                }
                if (score > alpha) {
                    alpha = score; // alpha acts like max
                }
            }
        }
        return alpha;
    }
    else { // make three moves (it's the others' turn)
        for (int i = 0; i < nCards; i++) {
            GameState futureState = state.testMove(cards.get(i));
            if (futureState == null) continue; // first card is not a valid move
            for (int k = 0; k < nCards; k++) {
                if (k == i) continue;
                GameState futureStateLevel2 = futureState.testMove(cards.get(k));
                if (futureStateLevel2 == null) continue; // second card is not a valid move
                for (int m = 0; m < nCards; m++) {
                    if (m == i || m == k) continue;
                    GameState futureStateLevel3 = futureStateLevel2.testMove(cards.get(m));
                    if (futureStateLevel3 == null) continue; // third card is not a valid move
                    int score = -negamaxSearch(-player, futureStateLevel3, depthLeft - 1, -beta, -alpha);
                    if (score >= beta) {
                        return score; // beta cutoff
                    }
                    if (score > alpha) {
                        alpha = score; // alpha acts like max
                    }
                }
            }
        }
        return alpha;
    }
}

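This seems to work, but even for a depth of 1 (depthLeft=1) the program already has to compute 50,000 moves (card placements) on average. That is far too many, of course!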
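
A quick count shows where that cost comes from. At depthLeft = 1, the "others" branch of negamaxSearch tries every ordered choice of three distinct cards from the pooled 36 cards of the other hands. The sketch below computes that upper bound, before the legality checks prune anything (they can only lower it); the class and method names are illustrative.

```java
/** Upper bound on the work done by the three nested loops of the "others" turn. */
final class BranchCount {
    /** Ordered picks of 3 distinct cards from a pool of nCards: n * (n-1) * (n-2). */
    static long orderedTriples(int nCards) {
        if (nCards < 3) return 0;
        return (long) nCards * (nCards - 1) * (nCards - 2);
    }

    /** Leaf evaluations at depthLeft = 1: each of my legal moves times every ordered triple. */
    static long leavesAtDepthOne(int myLegalMoves, int othersPool) {
        return myLegalMoves * orderedTriples(othersPool);
    }

    public static void main(String[] args) {
        System.out.println(orderedTriples(36));       // prints 42840
        System.out.println(leavesAtDepthOne(12, 36)); // prints 514080
    }
}
```

So a single move of mine can already lead to up to 42,840 leaf evaluations, the same order of magnitude as the 50,000 observed.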

So my questions are:

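Is the implementation correct? Can you simulate a game like this, especially with regard to the imperfect information?
How can I reduce the running time and the amount of work the algorithm does?
For example, could I reduce the set of possible moves to a random 50% subset to gain speed while still getting good results?
I found the UCT algorithm, which might be a good solution. Do you know this algorithm? Could you help me implement it?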


2012-09-30 caw

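Minimax search, as you have implemented it, is the wrong approach for games with this much uncertainty. Since you don't know how the cards are distributed among the other players, your search will spend an exponential amount of time exploring games that cannot happen given the actual distribution of the cards.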

I think a better approach is to start with good rules of play that need little or no information about the other players' hands. Things like:

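If you play first in a round, play your lowest card, since you have very little chance of winning the round.
If you play last in a round, play the lowest card that wins the round. If you can't win the round, play your lowest card.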
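
The two rules above can be sketched in Java roughly as follows. This assumes a simplified model that is not part of the answer: cards are plain int ranks, the highest card in the trick wins, and there are no suit-following rules. The answer gives no rule for the middle positions, so the sketch simply plays the lowest card there as a placeholder; RuleBasedPlayer and chooseCard are hypothetical names.

```java
import java.util.Collections;
import java.util.List;

/** Rule-based card choice under a simplified model: int ranks, highest card wins, no suits. */
final class RuleBasedPlayer {
    /**
     * @param hand            ranks I may legally play (must be non-empty)
     * @param trick           ranks already played in this trick, in order
     * @param playersPerTrick number of players per trick (4 in the question)
     */
    static int chooseCard(List<Integer> hand, List<Integer> trick, int playersPerTrick) {
        int lowest = Collections.min(hand);
        if (trick.isEmpty()) {
            // Leading: little chance of winning the round, so play the lowest card.
            return lowest;
        }
        if (trick.size() == playersPerTrick - 1) {
            // Playing last: win as cheaply as possible, otherwise dump the lowest card.
            int highestSoFar = Collections.max(trick);
            Integer cheapestWinner = null;
            for (int card : hand) {
                if (card > highestSoFar && (cheapestWinner == null || card < cheapestWinner)) {
                    cheapestWinner = card;
                }
            }
            return cheapestWinner != null ? cheapestWinner : lowest;
        }
        // Middle positions: no rule given in the answer; lowest card is an assumed placeholder.
        return lowest;
    }
}
```

For example, with hand [5, 9, 12] and a trick of [3, 8, 4] so far, chooseCard returns 9, the cheapest card that beats the 8.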

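Have your program initially not bother with search and just play by these rules, and have it assume that all the other players use these heuristics too. As the program observes what the first and last players of each round play, it can build up a table of information about the cards each player is likely to hold. For example, a 9 would have won this round, but player 3 didn't play one, so he must not hold any card of 9 or higher. As information about each player's hand accumulates, the search space will eventually be narrowed to the point where a minimax search over the possible games can yield useful information about which card to play next.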
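
The information table described above could look something like this minimal Java sketch. It assumes the same simplified model as before (distinct int ranks, highest card wins) and relies on the answer's assumption that opponents follow the "win as cheaply as possible when playing last" rule; HandInference and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Tracks which unseen cards each opponent could still hold (simplified int-rank model). */
final class HandInference {
    private final Map<Integer, Set<Integer>> possible = new HashMap<>();

    HandInference(List<Integer> opponents, Set<Integer> unseenCards) {
        for (int p : opponents) {
            possible.put(p, new HashSet<>(unseenCards)); // initially anyone may hold any unseen card
        }
    }

    /** A card has been played: nobody can hold it any more. */
    void cardPlayed(int card) {
        for (Set<Integer> cards : possible.values()) {
            cards.remove(card);
        }
    }

    /**
     * The player played last and did not beat highestSoFar. If he follows the heuristic,
     * he holds no card above it.
     */
    void couldNotBeat(int player, int highestSoFar) {
        possible.get(player).removeIf(card -> card > highestSoFar);
    }

    Set<Integer> possibleCards(int player) {
        return possible.get(player);
    }
}
```

A world sampler or minimax search can then restrict each opponent's hand to possibleCards(player).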


2012-10-02 04:15:58

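Hmm, about minimaxing near the end of the game: at that point you know you need x tricks to win. Any world in which you cannot win can be ignored, because if that world is the real one, you have already lost. If you base your probabilities only on the worlds that lead to a win (essentially wishful thinking), you can prune the search even further – Cruncher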
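
The pruning suggested in this comment amounts to a filter over candidate worlds. In this Java sketch, maxTricksFor is a hypothetical hook that must return an optimistic upper bound on the tricks you could still take in a world, so only worlds that are certainly lost are discarded; the world type W is left generic.

```java
import java.util.List;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

/** Keeps only the worlds in which winning is still possible. */
final class WorldFilter {
    /**
     * @param worlds       candidate deals of the unseen cards
     * @param maxTricksFor hypothetical optimistic estimate of the tricks I can still take in a world
     * @param tricksNeeded tricks I still need to win the game
     */
    static <W> List<W> winnableWorlds(List<W> worlds, ToIntFunction<W> maxTricksFor, int tricksNeeded) {
        return worlds.stream()
                .filter(w -> maxTricksFor.applyAsInt(w) >= tricksNeeded)
                .collect(Collectors.toList());
    }
}
```

If no world survives the filter, the game is lost in every candidate world, so a caller would typically fall back to the unfiltered list rather than search nothing.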

I want to clarify some details that the accepted answer doesn't really go into.

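In many card games, you can sample the unknown cards your opponents might hold instead of generating all of them. When sampling, you can take into account information such as short suits and the probability of holding certain cards given the play so far, to weight the likelihood of each hand (each hand is a possible world that we will solve independently). Then you solve each hand using perfect-information search. The best move across all of these worlds is often the best move overall, with some caveats.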
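
The sampling approach described above (often called Perfect Information Monte Carlo, PIMC) can be sketched in Java as below. The three hooks are hypothetical and game-specific: sampleWorld deals the unseen cards consistently with what is known, weight gives the relative likelihood of a world (1.0 for uniform), and solveAfterMove returns the perfect-information value of playing a move in that world, for example from a negamax search on the fully known deal. Since my own hand and the current trick are known, my legal moves are the same in every sampled world.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;
import java.util.function.ToDoubleBiFunction;
import java.util.function.ToDoubleFunction;

/** PIMC move selection sketch: W = a fully specified deal ("world"), M = a move. */
final class PimcChooser<W, M> {
    private final Function<Random, W> sampleWorld;         // deal unseen cards consistent with what we know
    private final ToDoubleFunction<W> weight;              // relative likelihood of a world
    private final ToDoubleBiFunction<W, M> solveAfterMove; // perfect-information value of M in W

    PimcChooser(Function<Random, W> sampleWorld, ToDoubleFunction<W> weight,
                ToDoubleBiFunction<W, M> solveAfterMove) {
        this.sampleWorld = sampleWorld;
        this.weight = weight;
        this.solveAfterMove = solveAfterMove;
    }

    /** Sums the weighted solved value of each move over the sampled worlds and returns the best move. */
    M choose(List<M> legalMoves, int samples, Random rng) {
        Map<M, Double> total = new HashMap<>();
        for (M move : legalMoves) total.put(move, 0.0);
        for (int s = 0; s < samples; s++) {
            W world = sampleWorld.apply(rng);
            double w = weight.applyAsDouble(world);
            for (M move : legalMoves) {
                total.merge(move, w * solveAfterMove.applyAsDouble(world, move), Double::sum);
            }
        }
        M best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (M move : legalMoves) {
            double score = total.get(move);
            if (score > bestScore) {
                bestScore = score;
                best = move;
            }
        }
        return best;
    }
}
```

Summing the per-world values is the simplest aggregation; it is exactly where the "caveats" come from, because each world is solved as if everyone could see all the cards.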

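In games like poker this won't work well; the game is all about hidden information. You have to balance your actions precisely to hide the information about your hand.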

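But in games such as trick-taking card games, it works very well, especially because new information is revealed all the time. Really good players have a good idea of what everyone holds anyway. So reasonably strong Skat and Bridge programs have been based on these ideas.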

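If you can completely solve the underlying worlds, that's best, but if you can't, you can use minimax or UCT to choose the best move in each world. There are also hybrid algorithms (ISMCTS) that try to blend this process together. Be careful with the claims here: simple sampling approaches are easier to code, and you should try the simpler approach before the more complex ones.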
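
Since the question asks about UCT: its core is the UCB1 rule used to pick which child of a tree node to descend into, balancing a move's average reward against how rarely it has been tried. The Java sketch below shows only that selection step over plain arrays of child statistics; tree construction, rollouts and backpropagation are omitted. ISMCTS, roughly speaking, keeps this selection step but samples a world on each iteration and only considers moves legal in it. The class name Ucb1 is illustrative, and sqrt(2) is merely a common default exploration constant for rewards in [0, 1].

```java
/** UCB1 child selection as used in UCT (selection step only). */
final class Ucb1 {
    /**
     * @param totalReward  sum of rewards backed up through each child
     * @param visits       visit count of each child
     * @param parentVisits visit count of the parent node
     * @param c            exploration constant
     * @return index of the child to descend into
     */
    static int select(double[] totalReward, int[] visits, int parentVisits, double c) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < visits.length; i++) {
            if (visits[i] == 0) return i; // try every move once before applying the formula
            double mean = totalReward[i] / visits[i];
            double bonus = c * Math.sqrt(Math.log(parentVisits) / visits[i]);
            if (mean + bonus > bestValue) {
                bestValue = mean + bonus;
                best = i;
            }
        }
        return best;
    }
}
```

The exploration bonus shrinks as a child is visited more often, so with enough iterations the search concentrates on the moves with the best average result.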

Here are some research papers that give more information on when sampling approaches to imperfect information have worked well:

Understanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search (This paper analyzes when the sampling approach is likely to work.)

Improving State Evaluation, Inference, and Search in Trick-Based Card Games (This paper describes the use of sampling in Skat.)

Imperfect information in a computationally challenging game (This paper describes sampling in Bridge.)

Information Set Monte Carlo Tree Search (This paper combines sampling and UCT/Monte Carlo Tree Search to avoid the problems described in the first reference.)

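The problem with rule-based approaches like the one in the accepted answer is that they cannot take advantage of computational resources beyond what was needed to create the initial rules. Furthermore, a rule-based approach is limited by the strength of the rules you can write. Search-based approaches can use the power of combinatorial search to play much more strongly than the program's author.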


2015-05-04 18:09:19