在alpha-beta搜索中出現意外的路徑依賴？

我正在爲舊遊戲的北歐tafl家族編寫人工智能（project here,source file at issue here）。他們已經足夠接近國際象棋的知識，在這裏申請國際象棋。（所討論的變體是在7x7棋盤上以徑向對稱的起始位置進行播放，白色從中間開始，黑色從邊緣開始。）我遇到了一個奇怪的問題，那就是我如何實施alpha-beta搜索：搜索到固定深度的結果，除了alpha-beta修剪之外沒有啓用優化，根據探索節點的順序而變化。在alpha-beta搜索中出現意外的路徑依賴？

在討論的文件中，重要的方法是'explore'，'exploreChildren'，'handleEvaluationResults'和'generateSuccessorMoves'。 'explore'檢查是否存在換位表命中（在此測試的其他位置禁用），評估狀態（如果它是勝利或葉節點）還是調用exploreChildren。 exploreChildren在子節點上進行遞歸搜索。 generateSuccessorMoves生成（並可選地排序）退出當前狀態的移動。 handleEvaluationResults確定兒童評估是否導致截斷。因此，我寫了一個最小的測試用例：generateSuccessorMoves首先不排序，然後簡單地洗牌移動列表而不是排序。搜索的結果是沒有結果等同，也不在結果考慮對稱，也不值：

MAIN SEARCH 
# cutoffs/avg. to 1st a/b a/b 
Depth 0: 0/0 0/0 
Depth 1: 0/22 0/1 
Depth 2: 42/0 3/0 
Finding best state... 
Best move: d3-g3 with path... 
    d3-g3 
    e1-f1 
    e4-e1xf1 
End of best path scored -477 
Observed/effective branching factor: 23.00/9.63 
Thought for: 72msec. Tree sizes: main search 893 nodes, continuation search: 0 nodes, horizon search: 0 nodes 
Overall speed: 12402.77777777778 nodes/sec 
Transposition table stats: Table hits/queries: 0/0 Table inserts/attempts: 0/0 
1. move: d3-g3 value: -477 
Using 5000msec, desired 9223372036854775807 
Depth 3 explored 1093 states in 0.037 sec at 29540.54/sec 
MAIN SEARCH 
# cutoffs/avg. to 1st a/b a/b 
Depth 0: 0/0 0/0 
Depth 1: 0/21 0/2 
Depth 2: 104/0 2/0 
Finding best state... 
Best move: d3-f3 with path... 
    d3-f3 
    d2-c2 
    d5-f5xf4 
End of best path scored -521 
Observed/effective branching factor: 23.00/10.30 
Thought for: 37msec. Tree sizes: main search 1093 nodes, continuation search: 0 nodes, horizon search: 0 nodes 
Overall speed: 29540.540540540544 nodes/sec 
Transposition table stats: Table hits/queries: 0/0 Table inserts/attempts: 0/0 
7. move: d3-f3 value: -521

這是一個極端的例子，很明顯，但它是我的理解是α-β在這種情況下（即，除了'alpha-beta修剪'以外沒有任何特性）應該是穩定的，不管搜索順序如何 - 至少應該返回一個相同值的節點。我錯了嗎？難道我做錯了什麼？

首先編輯：雖然我認爲從這個問題的描述中可以明顯看出，但在我的alpha-beta實現中有一些尚未知的bug。進一步的測試表明，它不提供與純極小極小相同的結果。

第二次編輯：這是在上面鏈接的文件中實現的alpha-beta搜索的僞代碼版本。

explore(depth, maxDepth, alpha, beta) 
    // some tafl variants feature rules where one player moves more than once in a turn 
    // each game tree node knows whether it's maximizing or minimizing 
    var isMaximizing = this.isMaximizing() 
    var value = NO_VALUE 

    if(isTerminal(depth, maxDepth)) 
     value = eval() 
    else 
     for(move in successorMoves) 
      if(cutoff) break 

      nodeValue = nextNode(move).explore(depth + 1, maxDepth, alpha, beta) 
      if(value == NO_VALUE) value = nodeValue 

      if(isMaximizing) 
       value = max(value, nodeValue) 
       alpha = max(alpha, value) 
       if(beta <= alpha) break 
      else 
       value = min(value, nodeValue) 
       beta = min(beta, value) 
       if(beta <= alpha) break 


rootNode.explore(0, 5, -infinity, infinity)

來源

2016-08-20 Fishbreath

我掃描了鏈接的代碼。我沒有看到深度（currentMaxDepth，mCurrentMaxDepth，overallMaxDepth）在遞歸（探索）中增加的任何地方。你可以解釋嗎？ –

迭代加深發生在AiWorkspace.explore（）中。在狀態進行遞歸調用時，GameTreeState.exploreChildren（）會在給定的深化步驟中搜索的深度增加。 – Fishbreath

原來這是我的錯。我有一些遞歸地重新評估某個節點上方的節點的代碼，用於擴展搜索，並且我在錯誤的地方調用了它（在探索了任何節點的所有子節點之後）。早期的後向傳播導致了不正確的α和β值，因此導致了早期截止。

來源

2016-08-25 13:39:49 Fishbreath

在alpha-beta搜索中出現意外的路徑依賴？

回答

相關問題