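SKlearn SGD partial fit
What am I doing wrong here? I have a large dataset that I want to train incrementally with scikit-learn's SGDClassifier using partial_fit. My code is below: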
from sklearn.linear_model import SGDClassifier
import pandas as pd

chunksize = 5
clf2 = SGDClassifier(loss='log', penalty="l2")
for train_df in pd.read_csv("train.csv", chunksize=chunksize, iterator=True):
    X = train_df[features_columns]
    Y = train_df["clicked"]
    clf2.partial_fit(X, Y)
and I get this error:
Traceback (most recent call last):
  File "/predict.py", line 48, in <module>
    sys.exit(0 if main() else 1)
  File "/predict.py", line 44, in main
    predict()
  File "/predict.py", line 38, in predict
    clf2.partial_fit(X, Y)
  File "/Users/anaconda/lib/python3.5/site-packages/sklearn/linear_model/stochastic_gradient.py", line 512, in partial_fit
    coef_init=None, intercept_init=None)
  File "/Users/anaconda/lib/python3.5/site-packages/sklearn/linear_model/stochastic_gradient.py", line 349, in _partial_fit
    _check_partial_fit_first_call(self, classes)
  File "/Users/anaconda/lib/python3.5/site-packages/sklearn/utils/multiclass.py", line 297, in _check_partial_fit_first_call
    raise ValueError("classes must be passed on the first call "
ValueError: classes must be passed on the first call to partial_fit.
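A minimal reproduction on small synthetic arrays (made-up data, not the asker's train.csv) shows where this error comes from: a fresh SGDClassifier cannot infer the full label set from a single batch, so the first partial_fit call must receive classes. The sketch uses the default hinge loss only to stay independent of the loss-name change in newer scikit-learn releases.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Two small synthetic batches standing in for CSV chunks (illustrative data only).
rng = np.random.RandomState(0)
X1, y1 = rng.randn(5, 3), np.array([0, 1, 0, 1, 0])
X2, y2 = rng.randn(5, 3), np.array([0, 0, 0, 0, 0])  # this batch contains only class 0

clf = SGDClassifier(penalty="l2")

# First call without `classes`: the estimator has no way to know every label yet.
try:
    clf.partial_fit(X1, y1)
    first_call_error = None
except ValueError as exc:
    first_call_error = str(exc)

# Passing `classes` on the first call works; later calls may omit it,
# and a later batch does not have to contain every label.
clf.partial_fit(X1, y1, classes=np.array([0, 1]))
clf.partial_fit(X2, y2)

print(first_call_error)
print(clf.classes_)
```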
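"classes: Classes across all calls to partial_fit. Can be obtained via np.unique(y_all), where y_all is the target vector of the entire dataset. This argument is required for the first call to partial_fit and can be omitted in the subsequent calls. Note that y doesn't need to contain all labels in classes." http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier.partial_fit – 2017-02-09 21:37:32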
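Following that comment, here is a sketch for the chunked-CSV case. It assumes the label values can be gathered in a separate first pass that reads only the label column (usecols), so the whole file never has to be in memory. collect_classes and train_in_chunks are hypothetical helper names; train.csv, "clicked" and the feature column list come from the question. The question's loss='log' is left out so the sketch also runs on recent scikit-learn releases, where that option is spelled 'log_loss'.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier


def collect_classes(csv_path, label_column, chunksize):
    """First pass (hypothetical helper): gather every distinct label chunk by chunk."""
    labels = set()
    for chunk in pd.read_csv(csv_path, usecols=[label_column], chunksize=chunksize):
        labels.update(chunk[label_column].unique())
    return np.array(sorted(labels))


def train_in_chunks(csv_path, feature_columns, label_column="clicked", chunksize=5):
    """Second pass (hypothetical helper): fit incrementally on each chunk."""
    classes = collect_classes(csv_path, label_column, chunksize)
    clf = SGDClassifier(penalty="l2")
    for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=chunksize)):
        X = chunk[feature_columns]
        y = chunk[label_column]
        if i == 0:
            # Required on the first call: the full label set across all chunks.
            clf.partial_fit(X, y, classes=classes)
        else:
            # Later calls may omit `classes`; a chunk need not contain every label.
            clf.partial_fit(X, y)
    return clf
```

Usage would look like `clf2 = train_in_chunks("train.csv", features_columns)`. If the label set is already known up front, the first pass can be skipped and that array passed directly as classes.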
@JackManey Please post your comment as an answer so that the asker can accept it and/or the question can be closed.