2014-02-13 83 views
1

How can I optimize this NumPy code?

The following code is the bottleneck in my Python program:

def get_payoff(self, actual, predicted):
    if abs(actual - 1.0) < 1e-5:  # if actual == 1
        if predicted < 0.5:
            return self.fn_payoff * (0.5 - predicted)
        elif predicted > 0.5:
            return self.tp_payoff * (predicted - 0.5)
        else:
            return 0
    else:
        if predicted < 0.5:
            return self.tn_payoff * (0.5 - predicted)
        elif predicted > 0.5:
            return self.fp_payoff * (predicted - 0.5)
        else:
            return 0

def get_total_payoff(self):
    total_payoff = 0
    for target_element, prediction_element in zip(np.nditer(self.target), np.nditer(predictions)):
        total_payoff += self.get_payoff(target_element, prediction_element)
    return total_payoff

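fn_payoff, tp_payoff, tn_payoff, and fp_payoff are all floats. self.target and self.predictions are both NumPy ndarrays.

I think there is some way to replace the for loop in get_total_payoff with NumPy vectorization, but I don't know how to handle the if/then statements so that the vectorized version is correct.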

+1

'def float get_payoff()' - uh, is that a typo, or are you using a Python variant with implicit static typing? – delnan

+0

Oops, I was converting the question from a Cythonized version to Python and forgot to remove that. I'll fix it. –

+0

Is 'predictions' supposed to be a global variable? – user2357112

Answers

1
import numpy

def _get_payoff(self, actual, predicted):
    # Distance of each prediction from the 0.5 threshold.
    pred_factor = numpy.abs(0.5 - predicted)
    # Selector: 0 -> fp, 1 -> tn, 2 -> tp, 3 -> fn
    payoff_selector = 2*numpy.isclose(actual, 1) + (predicted < 0.5)
    payoff = numpy.choose(payoff_selector,
                          [
                              self.fp_payoff,
                              self.tn_payoff,
                              self.tp_payoff,
                              self.fn_payoff,
                          ])
    return numpy.sum(payoff * pred_factor)

def get_total_payoff(self):
    # `predictions` is referenced exactly as in the question's code.
    return self._get_payoff(self.target, predictions)

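We use numpy.choose to build an array of the selected payoff factors, multiply it by the absolute difference between 0.5 and the predicted values, and sum the result. numpy.isclose tests whether the actual values are close to 1. We can ignore the predicted == 0.5 case, because multiplying by numpy.abs(0.5 - predicted) always gives the correct result of 0. If self.target and predictions are guaranteed to be 1D, numpy.dot may perform better than a separate multiply and sum.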
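As a rough sketch of the numpy.dot suggestion, the snippet below builds the payoff array with the same selector and then computes the total both ways. The payoff values and the two input arrays are made up for illustration and are not from the question; whether the dot product is actually faster would need to be measured on real data.

```python
import numpy as np

# Hypothetical payoff weights and data, chosen only for illustration.
fn_payoff, tp_payoff, tn_payoff, fp_payoff = 222.0, 888.0, 444.0, 777.0
target = np.array([0.3, 1.0, 2.0, 1.0])
predictions = np.array([0.4, 0.5, 1.7, 0.1])

pred_factor = np.abs(0.5 - predictions)
# Selector: 0 -> fp, 1 -> tn, 2 -> tp, 3 -> fn
payoff_selector = 2 * np.isclose(target, 1) + (predictions < 0.5)
payoff = np.choose(payoff_selector,
                   [fp_payoff, tn_payoff, tp_payoff, fn_payoff])

total_sum = np.sum(payoff * pred_factor)  # elementwise multiply, then sum
total_dot = np.dot(payoff, pred_factor)   # one dot product; requires 1D inputs
```

Both totals should agree up to floating-point rounding; the dot product simply avoids materializing the intermediate product array.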

+1

You can only ignore the 'predicted == 0.5' case if the value you are multiplying by is not NaN. Otherwise you get NaN instead of 0. – shx2

+1

@shx2: We're multiplying by one of the 4 payoff factors, though. I doubt those are NaN. If they are, it's unclear whether the output should contain NaN as well. – user2357112

+1

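I'm just pointing out that in that case your function behaves differently from the original. It's not clear that this is what the OP wants. – shx2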
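The difference discussed in these comments can be reproduced in a few lines. This is only an illustrative sketch: the NaN payoff factor and the two predictions are hypothetical values, not data from the question.

```python
import numpy as np

predictions = np.array([0.5, 0.7])
payoff = np.array([np.nan, 2.0])  # hypothetical: one payoff factor is NaN

# Multiplying by |0.5 - predicted| does not zero out a NaN factor,
# because NaN * 0.0 is NaN.
unmasked = payoff * np.abs(0.5 - predictions)

# Explicitly zeroing the predicted == 0.5 entries reproduces the
# loop's `return 0` branch even when the factor is NaN.
masked = unmasked.copy()
masked[predictions == 0.5] = 0
```

If the payoff factors can never be NaN, the two versions give the same result and the masking step is unnecessary.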

2

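The key to vectorizing a function that uses different expressions depending on a condition is np.choose. Also, in your case, predict-0.5 and 0.5-predict can both be replaced with abs(predict-0.5), plus special handling for predict==0.5 (I'm guessing the special handling is there so that NaN is handled correctly).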

import numpy as np 

class A(object):
    def __init__(self):
        self.fn_payoff = 222.
        self.tn_payoff = 444.
        self.fp_payoff = 777.
        self.tp_payoff = 888.
        self.target = np.array([ 0.3, 1., 2. ])
        self.predictions = np.array([ 0.4, 0.5, 1.7 ])

    def get_payoff(self, actual, predicted):
        if abs(actual - 1.0) < 1e-5:  # if actual == 1
            if predicted < 0.5:
                return self.fn_payoff * (0.5 - predicted)
            elif predicted > 0.5:
                return self.tp_payoff * (predicted - 0.5)
            else:
                return 0
        else:
            if predicted < 0.5:
                return self.tn_payoff * (0.5 - predicted)
            elif predicted > 0.5:
                return self.fp_payoff * (predicted - 0.5)
            else:
                return 0

    def get_total_payoff(self):
        total_payoff = 0
        for target_element, prediction_element in zip(np.nditer(self.target), np.nditer(self.predictions)):
            total_payoff += self.get_payoff(target_element, prediction_element)
        return total_payoff

    def get_total_payoff_VECTORIZED(self):
        actual_mask = np.abs(self.target - 1) < 1e-5
        predict_mask = self.predictions < 0.5
        payoff_n = np.choose(actual_mask, [ self.tn_payoff, self.fn_payoff ])
        payoff_p = np.choose(actual_mask, [ self.fp_payoff, self.tp_payoff ])
        payoff = np.choose(predict_mask, [ payoff_p, payoff_n ]) * abs(self.predictions-0.5)
        payoff[self.predictions==0.5] = 0
        return payoff.sum()

a = A()
print(a.get_total_payoff())
# => 976.8
print(a.get_total_payoff_VECTORIZED())
# => 976.8
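For comparison, the same branching can also be written with nested np.where calls instead of np.choose. This is a swapped-in alternative, not the method used in the answer above; the function names payoff_loop and payoff_where, the argument order, and the random test data are all assumptions made for this sketch. It checks the vectorized total against a plain loop that follows the question's logic.

```python
import numpy as np

def payoff_loop(target, predictions, fn, tp, tn, fp):
    """Reference implementation: element-by-element logic from the question."""
    total = 0.0
    for actual, predicted in zip(target.ravel(), predictions.ravel()):
        if abs(actual - 1.0) < 1e-5:
            low, high = fn, tp
        else:
            low, high = tn, fp
        if predicted < 0.5:
            total += low * (0.5 - predicted)
        elif predicted > 0.5:
            total += high * (predicted - 0.5)
        # predicted == 0.5 contributes 0
    return total

def payoff_where(target, predictions, fn, tp, tn, fp):
    """Vectorized sketch using nested np.where (hypothetical helper)."""
    is_one = np.abs(target - 1.0) < 1e-5
    below = predictions < 0.5
    # Pick the payoff factor for each element from the two masks.
    weight = np.where(is_one,
                      np.where(below, fn, tp),
                      np.where(below, tn, fp))
    payoff = weight * np.abs(predictions - 0.5)
    payoff[predictions == 0.5] = 0  # keep the explicit zero branch
    return payoff.sum()

# Made-up test data: random 0/1 targets and predictions in [0, 1).
rng = np.random.default_rng(0)
target = rng.integers(0, 2, size=1000).astype(float)
predictions = rng.random(1000)
args = (222.0, 888.0, 444.0, 777.0)  # fn, tp, tn, fp

loop_total = payoff_loop(target, predictions, *args)
where_total = payoff_where(target, predictions, *args)
```

The two totals should match up to floating-point rounding, so a comparison such as np.isclose(loop_total, where_total) is more appropriate than exact equality.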