Python Negamax算法

我有一個儘可能簡單的negamax算法，用於評估Tic Tac Toe中的位置。遊戲的狀態以數組的形式存儲在numpy中，其中X的塊表示爲1，O的塊表示爲4。Python Negamax算法

剛纔我在測試這一點，並發現：

a = np.zeros(9).reshape(3,3) 
negaMax(a, 6, 1) # Returned zero as it should 
negaMax(a, 7, 1) # Returns 100

意思是說我的算法認爲他們已經找到了X到七個層中贏得了比賽井字，這顯然是不可能的方式反對體面的遊戲。我無法弄清楚如何讓它打印出它找到的最佳動作，所以在調試時遇到了麻煩。我究竟做錯了什麼？

def winCheck(state): 
     """Takes a position, and returns the outcome of that game""" 
     # Sums which correspond to a line across a column 
     winNums = list(state.sum(axis=0)) 
     # Sums which correspond to a line across a row 
     winNums.extend(list(state.sum(axis=1))) 
     # Sums which correspond to a line across the main diagonal 
     winNums.append(state.trace()) 
     # Sums which correspond to a line across the off diagonal 
     winNums.append(np.flipud(state).trace()) 

     if Square.m in winNums: 
       return 'X' 
     elif (Square.m**2 + Square.m) in winNums: 
       return 'O' 
     elif np.count_nonzero(state) == Square.m**2: 
       return 'D' 
     else: 
       return None 

def moveFind(state): 
     """Takes a position as an nparray and determines the legal moves""" 
     moveChoices = [] 

     # Iterate over state, to determine which squares are empty 
     it = np.nditer(state, flags=['multi_index']) 
     while not it.finished: 
      if it[0] == 0: 
        moveChoices.append(it.multi_index) 
      it.iternext() 
     return moveChoices 

def moveSim(state, move, player): 
     """Create the state of the player having moved without interfering with the board""" 
     simState = state.copy() 
     if player == 1: 
       simState[move] = 1 
     else: 
       simState[move] = gamecfg.n + 1 
     return simState 

def positionScore(state): 
     """The game is either won or lost""" 
     if winCheck(state) == 'X': 
       return 100 
     elif winCheck(state) == 'O': 
       return -100 
     else: 
       return 0 

def negaMax(state, depth, colour): 
     """Recursively find the best move via a negamax search""" 
     if depth == 0: 
       return positionScore(state) * colour 

     highScore = -100 

     moveList = moveFind(state) 
     for move in moveList: 
       score = -negaMax(moveSim(state, move, colour), depth -1, colour * -1) 
       highScore = max(score, highScore) 

     return highScore

來源

2017-07-02 Qiri

對不起@JacobIRR，我種種原因錯過了線在複製過程中間，我編輯了這個問題。 – Qiri

我想知道當有人贏了時你不停止比賽是否重要？如果你不介意另一個人首先得到3個O的線，也許可以總是得到3個X的線。 –

@PeterdeRivaz這不是我想到的，但我已經向我自己證明，如果你在中心開始，並且不擔心首先失敗，那麼很有可能在七層中連續強制三個。 – Qiri

你的代碼並不認爲遊戲在製作3行符號時會停止。

這意味着它玩井字棋的變體，其中X贏，如果他犯了線的3 O作出了線的3

即使對於這種變型，該方案具有正確發現X有可能總是獲勝！

（我碰到相同的情況下帶着我做了一個國際象棋程序在電腦的樂於奉獻的王是否會達到擊破稍後...）

來源

2017-07-02 20:24:11

Python Negamax算法

回答

相關問題