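sklearn Ridge and sample_weight gives MemoryError

I am trying to run a simple sklearn Ridge regression with an array of sample weights. X_train is a 2D numpy array of roughly 200k rows by 100 columns. I get a memory error when I use the sample_weight option; without it, the fit works fine. To simplify things I reduced the number of features to 2, and sklearn still raises a memory error. Any ideas?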
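from sklearn import linear_model

# X_train, y_train and w_tr (the per-sample weights) are defined elsewhere
model = linear_model.Ridge()
model.fit(X_train, y_train, sample_weight=w_tr)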
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/g/anaconda/lib/python2.7/site-packages/sklearn/linear_model/ridge.py", line 449, in fit
    return super(Ridge, self).fit(X, y, sample_weight=sample_weight)
  File "/home/g/anaconda/lib/python2.7/site-packages/sklearn/linear_model/ridge.py", line 338, in fit
    solver=self.solver)
  File "/home/g/anaconda/lib/python2.7/site-packages/sklearn/linear_model/ridge.py", line 286, in ridge_regression
    K = safe_sparse_dot(X, X.T, dense_output=True)
  File "/home/g/anaconda/lib/python2.7/site-packages/sklearn/utils/extmath.py", line 83, in safe_sparse_dot
    return np.dot(a, b)
MemoryError
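Thanks to @ogrisel for pointing out to me that sklearn linear models center the data. – eickenberg
[This enhancement proposal](https://github.com/scikit-learn/scikit-learn/pull/3034) implements the functionality explained above. – eickenberg
Recent versions of scikit-learn now support sample weights in feature space. – eickenberg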
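For context on the comments above: the traceback shows the fit computing `X.dot(X.T)`, an n_samples by n_samples matrix. With about 200k rows that is roughly 200000 × 200000 float64 values, on the order of 320 GB, no matter how few features there are. A minimal sketch of the feature-space alternative follows, using plain NumPy. The helper name `weighted_ridge` and its signature are hypothetical and chosen for illustration; this is not scikit-learn's own implementation. It assumes a dense `X`, non-negative weights, and an unpenalised intercept handled by weighted centering.

```python
import numpy as np


def weighted_ridge(X, y, sample_weight, alpha=1.0):
    """Ridge regression with per-sample weights, solved in feature space.

    Hypothetical helper for illustration only, not scikit-learn's code.
    Returns (coef, intercept).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(sample_weight, dtype=float)

    # Center with weighted means so the intercept is not penalised.
    x_mean = np.average(X, axis=0, weights=w)
    y_mean = np.average(y, weights=w)

    # Scaling each row by sqrt(w_i) turns the weighted problem
    # into an ordinary (unweighted) ridge problem.
    sqrt_w = np.sqrt(w)
    Xs = (X - x_mean) * sqrt_w[:, np.newaxis]
    ys = (y - y_mean) * sqrt_w

    # The linear system is n_features x n_features, never n_samples x n_samples.
    A = Xs.T.dot(Xs) + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xs.T.dot(ys))
    intercept = y_mean - x_mean.dot(coef)
    return coef, intercept
```

With 100 features this solves a 100 × 100 system, so memory is dominated by `X` itself. On an older scikit-learn release, a similar effect can probably be had by passing the centered, sqrt-weighted arrays to `Ridge(fit_intercept=False)` without `sample_weight`, though upgrading is the simpler route.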