2013-08-21
One class

How can the predicted probabilities be limited to one class when using something like this:

clf = KNeighborsClassifier(n_neighbors=3) 
clf.fit(X,y) 
predictions = clf.predict_proba(X_test) 

How can the prediction be restricted to just one class? This is needed for performance reasons: for example, when I have a thousand classes but only care whether one particular class has a high probability.

Answer


Sklearn does not implement this, so you will have to write some kind of wrapper. For example, you can extend the KNeighborsClassifier class and override its predict_proba method.

That is, following the source code, change the per-class loop into a computation for the single class you care about:

def predict_proba(self, X): 
    """Return probability estimates for the test data X. 

    Parameters 
    ---------- 
    X : array, shape = (n_samples, n_features) 
        A 2-D array representing the test points. 

    Returns 
    ------- 
    p : array of shape = [n_samples, n_classes], or a list of n_outputs 
        of such arrays if n_outputs > 1. 
        The class probabilities of the input samples. Classes are ordered 
        by lexicographic order. 
    """ 
    X = atleast2d_or_csr(X) 

    neigh_dist, neigh_ind = self.kneighbors(X) 

    classes_ = self.classes_ 
    _y = self._y 
    if not self.outputs_2d_: 
        _y = self._y.reshape((-1, 1)) 
        classes_ = [self.classes_] 

    n_samples = X.shape[0] 

    weights = _get_weights(neigh_dist, self.weights) 
    if weights is None: 
        weights = np.ones_like(neigh_ind) 

    all_rows = np.arange(X.shape[0]) 
    probabilities = [] 
    for k, classes_k in enumerate(classes_): 
        pred_labels = _y[:, k][neigh_ind] 
        proba_k = np.zeros((n_samples, classes_k.size)) 

        # a simple ':' index doesn't work right 
        for i, idx in enumerate(pred_labels.T):  # loop is O(n_neighbors) 
            proba_k[all_rows, idx] += weights[:, i] 

        # normalize 'votes' into real [0,1] probabilities 
        normalizer = proba_k.sum(axis=1)[:, np.newaxis] 
        normalizer[normalizer == 0.0] = 1.0 
        proba_k /= normalizer 

        probabilities.append(proba_k) 

    if not self.outputs_2d_: 
        probabilities = probabilities[0] 

    return probabilities 

Just modify this code for the specific class you need.
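To make the idea concrete, here is a stripped-down sketch of the voting step above, restricted to a single class. It is plain Python, not part of sklearn (the name `single_class_proba` is mine): it takes the neighbor labels and weights that `kneighbors` would yield and returns only the target class's share of the votes.

```python
def single_class_proba(neigh_labels, neigh_weights, target_class):
    """For each test sample, return the (weighted) fraction of neighbor
    votes that went to target_class -- the single probability we care about.

    neigh_labels  : one row of k neighbor labels per test sample
    neigh_weights : matching row of k neighbor weights (uniform -> all 1.0)
    target_class  : the one class whose probability is wanted
    """
    probs = []
    for labels, weights in zip(neigh_labels, neigh_weights):
        total = sum(weights)
        vote = sum(w for lbl, w in zip(labels, weights) if lbl == target_class)
        # same guard as the original normalizer: avoid dividing by zero
        probs.append(vote / total if total else 0.0)
    return probs
```

For example, with 3 uniform-weight neighbors per sample, `single_class_proba([[1, 1, 2], [2, 2, 2]], [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], 2)` yields one vote out of three for the first sample and all three for the second. This skips allocating and filling the full `(n_samples, n_classes)` array, which is the whole point when there are a thousand classes.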

A hacky alternative would be to override the classes_ attribute so that it is a singleton containing only the class under consideration, and to restore it once you are done.
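The restore-afterwards part of that trick is easy to get wrong if an exception is raised in between. A small context manager (a generic helper of my own, not part of sklearn) makes the swap-and-restore safe:

```python
from contextlib import contextmanager

@contextmanager
def overridden_attr(obj, name, value):
    """Temporarily replace obj.<name> with value, restoring the original
    attribute even if the body raises."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        setattr(obj, name, original)
```

With it, the hack would read as `with overridden_attr(clf, 'classes_', np.array([the_class])): probs = clf.predict_proba(X_test)` — though whether the estimator's internals cooperate with a truncated classes_ is exactly the fragile part the word "hacky" warns about.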