Pandas idxmax() not working as expected
Suppose I have the following DataFrame Q_df:
(0, 0) (0, 1) (0, 2) (1, 0) (1, 1) (1, 2) (2, 0) (2, 1) (2, 2)
(0, 0) 0.000 0.00 0.0 0.64 0.000 0.0 0.512 0.000 0.0
(0, 1) 0.000 0.00 0.8 0.00 0.512 0.0 0.000 0.512 0.0
(0, 2) 0.000 0.64 0.0 0.00 0.000 0.8 0.000 0.000 1.0
(1, 0) 0.512 0.00 0.0 0.00 0.000 0.8 0.512 0.000 0.0
(1, 1) 0.000 0.64 0.0 0.00 0.000 0.0 0.000 0.512 0.0
(1, 2) 0.000 0.00 0.8 0.64 0.000 0.0 0.000 0.000 1.0
(2, 0) 0.512 0.00 0.0 0.64 0.000 0.0 0.000 0.512 0.0
(2, 1) 0.000 0.64 0.0 0.00 0.512 0.0 0.512 0.000 0.0
(2, 2) 0.000 0.00 0.8 0.00 0.000 0.8 0.000 0.000 0.0
It was generated with the following code:
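import itertools  # needed for itertools.product below
import numpy as np
import pandas as pd
# All (i, j) pairs with i, j in {0, 1, 2}, used as both row and column labels
states = list(itertools.product(range(3), repeat=2))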
Q = np.array([[0.000,0.000,0.000,0.640,0.000,0.000,0.512,0.000,0.000],
[0.000,0.000,0.800,0.000,0.512,0.000,0.000,0.512,0.000],
[0.000,0.640,0.000,0.000,0.000,0.800,0.000,0.000,1.000],
[0.512,0.000,0.000,0.000,0.000,0.800,0.512,0.000,0.000],
[0.000,0.640,0.000,0.000,0.000,0.000,0.000,0.512,0.000],
[0.000,0.000,0.800,0.640,0.000,0.000,0.000,0.000,1.000],
[0.512,0.000,0.000,0.640,0.000,0.000,0.000,0.512,0.000],
[0.000,0.640,0.000,0.000,0.512,0.000,0.512,0.000,0.000],
[0.000,0.000,0.800,0.000,0.000,0.800,0.000,0.000,0.000]])
Q_df = pd.DataFrame(index=states, columns=states, data=Q)
For each row of Q_df, I want to get the name of the column that holds the row's maximum value. If I try
policy = Q_df.idxmax()
then the resulting Series looks like this:
(0, 0) (1, 0)
(0, 1) (0, 2)
(0, 2) (0, 1)
(1, 0) (0, 0)
(1, 1) (0, 1)
(1, 2) (0, 2)
(2, 0) (0, 0)
(2, 1) (0, 1)
(2, 2) (0, 2)
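The first row looks right: its largest element is 0.64, which occurs in column (1, 0). The second is right too. For the third row, however, the largest element is 1.0, which occurs in column (2, 2), so I expected the corresponding value in policy to be (2, 2), not (0, 1).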
Any idea what is going wrong here?
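
One thing that may be worth checking is the axis argument: DataFrame.idxmax() defaults to axis=0, which returns, for each column, the row label of that column's maximum, whereas axis=1 returns, for each row, the column label of that row's maximum. The sketch below shows the difference on a small toy frame. The frame and its labels (r0..r2, c0..c2) are hypothetical and are not the Q_df from the question.

```python
import pandas as pd

# Hypothetical 3x3 toy frame (assumed for illustration, not the Q_df above)
df = pd.DataFrame(
    [[0.0, 0.64, 0.0],
     [0.8, 0.0, 0.512],
     [0.0, 0.64, 1.0]],
    index=["r0", "r1", "r2"],
    columns=["c0", "c1", "c2"],
)

# Default axis=0: for each COLUMN, the row label where that column peaks
per_column = df.idxmax()

# axis=1: for each ROW, the column label where that row peaks
per_row = df.idxmax(axis=1)

print(per_column)
print(per_row)
```

If the same default applies here, Q_df.idxmax(axis=1) might be what the per-row lookup needs, though I have not verified it against the frame above.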