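I am using scikit-learn, and I have saved a logistic regression model trained on training set 1 with unigrams as features. Is it possible to load this model and then extend it with new data instances from a second training set (training set 2)? If so, how? The reason for doing this is that I use a different approach for each training set (the first involves feature corruption/regularization, the second involves self-training). In short: how do I load a previously saved model with scikit-learn and extend it with new training data?
I have added some simple example code for clarity: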
from sklearn.linear_model import LogisticRegression as log
from sklearn.feature_extraction.text import CountVectorizer as cv
import pickle
# Placeholders, assumed to be defined elsewhere:
# trainText1  - training set 1 text instances
# trainLabel1 - training set 1 labels
# trainText2  - training set 2 text instances
# trainLabel2 - training set 2 labels
clf = log()
# Count vectorizer used by the logistic regression classifier
vec = cv()
# Fit count vectorizer with training text data from training set 1
vec.fit(trainText1)
# Transform the text of training set 1 into count vectors
trainVec1 = vec.transform(trainText1)
# Fit the logistic regression classifier on the vectorized training set 1
clf.fit(trainVec1, trainLabel1)
# Saving logistic regression model from training set 1
with open('modelFromTrainingSet1', 'wb') as modelFileSave:
    pickle.dump(clf, modelFileSave)
# Loading logistic regression model from training set 1
# (the with block closes the file, which the original open() call never did)
with open('modelFromTrainingSet1', 'rb') as modelFileLoad:
    clf = pickle.load(modelFileLoad)
# I'm unsure how to continue from here....