Python Pandas .where with more than 2 possible conditional inputs

I want to use the pandas DataFrame method .where, except that I have more than two possibilities (that is, I have if, elif, else rather than the default if/else behavior).

Consider the following DataFrame:

import numpy as np
import pandas as pd

a1 = np.random.rand(7, 2)
a2 = np.random.randint(0, 3, (7, 1))
grid = np.append(a1, a2, axis=1)
df = pd.DataFrame(grid)

I tried:

def test(x):
    if x[2] == 0:
        return 5
    if x[2] == 1:
        return 10
    if x[2] == 2:
        return 50

df.where(test)

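But I get the error message "The truth value of a Series is ambiguous". I suspect this is the right direction, but I am confused about how to implement it. The documentation says that if the condition is callable, its input is the whole df. But even so, it seems to treat x[2] as the entire column 2. Is there no way to do this as a vectorized operation? Can it only be done by iterating row by row, whether with iterrows or apply?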

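This is a toy example to keep things clear on the forum; I am not trying to solve a simple .map problem in my real-life case. Please keep "test" as a standalone function if your answer needs to pass one, because that is where my difficulty lies.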

Source

2017-04-19 jim basquiat

np.random.seed(100)
a1 = np.random.rand(7, 2)
a2 = np.random.randint(0, 3, (7, 1))
grid = np.append(a1, a2, axis=1)
df = pd.DataFrame(grid)
print(df)
          0         1    2
0  0.543405  0.278369  2.0
1  0.424518  0.844776  2.0
2  0.004719  0.121569  0.0
3  0.670749  0.825853  0.0
4  0.136707  0.575093  1.0
5  0.891322  0.209202  1.0
6  0.185328  0.108377  1.0

Solution with map:

d = {0: 5, 1: 10, 2: 50}
df['d'] = df[2].map(d)
print(df)
          0         1    2   d
0  0.543405  0.278369  2.0  50
1  0.424518  0.844776  2.0  50
2  0.004719  0.121569  0.0   5
3  0.670749  0.825853  0.0   5
4  0.136707  0.575093  1.0  10
5  0.891322  0.209202  1.0  10
6  0.185328  0.108377  1.0  10

Another solution with numpy.where:

df['d'] = np.where(df[2] == 0, 5,
          np.where(df[2] == 1, 10, 50))

print(df)
          0         1    2   d
0  0.543405  0.278369  2.0  50
1  0.424518  0.844776  2.0  50
2  0.004719  0.121569  0.0   5
3  0.670749  0.825853  0.0   5
4  0.136707  0.575093  1.0  10
5  0.891322  0.209202  1.0  10
6  0.185328  0.108377  1.0  10
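When there are more than two branches, nested numpy.where calls get hard to read. One alternative, not part of the original answer, is numpy.select, which takes a list of conditions and a matching list of choices. The sketch below assumes a sample Series standing in for column 2; the -1 default is a hypothetical sentinel for values that match no condition.

```python
import numpy as np
import pandas as pd

# Sample values mirroring column 2 of the example frame.
col = pd.Series([2.0, 2.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# One boolean mask per branch, evaluated in order (if / elif / elif).
conditions = [col == 0, col == 1, col == 2]
choices = [5, 10, 50]

# default is used where no condition is True; -1 is an arbitrary sentinel.
result = np.select(conditions, choices, default=-1)
print(result.tolist())
```

The first matching condition wins, so the order of the lists plays the role of the if/elif order.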

EDIT:

For a separate function, you can use apply with the parameter axis=1 to process df row by row:

def test(x):
    # print(x)
    if x[2] == 0:
        return 5
    if x[2] == 1:
        return 10
    if x[2] == 2:
        return 50

df['d'] = df.apply(test, axis=1)
print(df)
          0         1    2   d
0  0.543405  0.278369  2.0  50
1  0.424518  0.844776  2.0  50
2  0.004719  0.121569  0.0   5
3  0.670749  0.825853  0.0   5
4  0.136707  0.575093  1.0  10
5  0.891322  0.209202  1.0  10
6  0.185328  0.108377  1.0  10

But if you need a function, it can wrap numpy.where:

def test(x):
    return np.where(x == 0, 5, np.where(x == 1, 10, 50))

print(test(df[2]))
[50 50  5  5 10 10 10]

Source

2017-04-19 12:15:18 jezrael

Hi, thanks. Could you keep "test" in your answer as a separate function that is passed to map or where? That is what my real-life example looks like.

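OK, thanks: so I understand that I have to use apply or iterrows here, and there is no way to get the result with a vectorized operation, as I thought was possible? The doc for the where method mentions the possibility of using a callable, which is exactly what I am trying to do here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.where.html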

Yes, I think that should do it. I will time the tests, but I suspect where is faster than apply? In that case, I am looking for
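For anyone who wants to check the speed question themselves, here is a rough timing sketch using the standard timeit module. The frame size, the seed, and the names by_row and vectorized are all assumptions chosen for illustration, and the numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: one column labeled 2 holding 0.0, 1.0 or 2.0.
rng = np.random.default_rng(0)
big = pd.DataFrame({2: rng.integers(0, 3, size=10_000).astype(float)})

def by_row(row):
    # Row-wise version, called once per row by apply(axis=1).
    if row[2] == 0:
        return 5
    if row[2] == 1:
        return 10
    if row[2] == 2:
        return 50

def vectorized(col):
    # Column-wise version, called once on the whole column.
    return np.where(col == 0, 5, np.where(col == 1, 10, 50))

t_apply = timeit.timeit(lambda: big.apply(by_row, axis=1), number=3)
t_where = timeit.timeit(lambda: vectorized(big[2]), number=3)
print(f"apply(axis=1): {t_apply:.4f}s, np.where: {t_where:.4f}s")
```

Both versions should return the same values, so the comparison is only about how the work is done: one Python call per row versus a few array operations.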
