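I have arrays like the following as input. In practice I read the data from a .csv file, but here I build the DataFrame from lists so that the problem can be reproduced. The goal is to train a logistic regression model with cross-validation, using scikit-learn's LogisticRegressionCV.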
indeps = ['M', 'F', 'M', 'F', 'M', 'M', 'F', 'M', 'M', 'F', 'F', 'F', 'F', 'F', 'M', 'F', 'F', 'F', 'F', 'F', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'F', 'M', 'F', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F', 'M', 'M', 'M', 'F', 'M', 'M', 'M', 'F', 'M', 'M', 'F', 'F']
dep = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
import pandas as pd

data = [indeps, dep]
cols = ['state', 'cat_bins']
data_dict = dict(zip(cols, data))
df = pd.DataFrame.from_dict(data_dict)
df.tail()
cat_bins state
45 0.0 F
46 0.0 M
47 0.0 M
48 0.0 F
49 0.0 F
# Use pandas to one-hot encode the independent variables.
# Note that this returns a sparse DataFrame.
def heat_it2(dataframe, lst_of_columns):
    dataframe_hot = pd.get_dummies(dataframe,
                                   prefix=lst_of_columns,
                                   columns=lst_of_columns, sparse=True)
    return dataframe_hot
train_set_hot = heat_it2(df, ['state'])
train_set_hot.head(2)
cat_bins state_F state_M
0 1.0 0 1
1 1.0 1 0
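For comparison, a minimal sketch of the same encoding that produces ordinary dense 0/1 columns instead of a sparse frame. It assumes a recent pandas: the `dtype` argument of `pd.get_dummies` may not exist in older releases such as 0.19. The two-row frame is illustrative only, not the data from the question.

```python
import pandas as pd

# Illustrative two-row frame using the same column names as in the question.
df = pd.DataFrame({'state': ['M', 'F'], 'cat_bins': [1.0, 1.0]})

# One-hot encode 'state' into state_F / state_M. sparse=False is the default,
# so the dummy columns are dense; dtype=int gives 0/1 instead of booleans.
train_set_hot = pd.get_dummies(df, columns=['state'], prefix=['state'], dtype=int)

print(train_set_hot)
```

Non-encoded columns such as `cat_bins` are kept first, followed by one dummy column per category.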
# Use the DataFrame to set up the prospective model inputs as NumPy arrays
indeps_hot = ['state_F', 'state_M']
X = train_set_hot[indeps_hot].values
y = train_set_hot['cat_bins'].values
print 'X-type:', X.shape, type(X)
print 'y-type:', y.shape, type(y)
print 'X has shape, is an array and has length:\n', hasattr(X, 'shape'), hasattr(X, '__array__'), hasattr(X, '__len__')
print 'y has shape, is an array and has length:\n', hasattr(y, 'shape'), hasattr(y, '__array__'), hasattr(y, '__len__')
print 'Does X have attribute fit:\n', hasattr(X, 'fit')
print 'Does y have attribute fit:\n', hasattr(y, 'fit')
X-type: (50, 2) <type 'numpy.ndarray'>
y-type: (50,) <type 'numpy.ndarray'>
X has shape, is an array and has length:
True True True
y has shape, is an array and has length:
True True True
Does X have attribute fit:
False
Does y have attribute fit:
False
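So the inputs to the regression appear to have the attributes the .fit method needs. They are NumPy arrays with the correct shapes: X is an array of shape [n_samples, n_features], and y is a vector of shape [n_samples,].
Here is the documentation: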
fit(X, y, sample_weight=None)
Fit the model according to the given training data.
Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features)
    Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape (n_samples,)
    Target vector relative to X.
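One way to confirm that arrays like these satisfy that contract is to run them through scikit-learn's own input validation and then fit with the default scoring. This is a sketch on a small illustrative stand-in dataset (not the question's CSV), assuming a recent scikit-learn; `Cs=[1.0]` and `cv=5` are choices made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.utils import check_X_y

# Small illustrative stand-in for the encoded data (not the asker's CSV):
# columns are [state_F, state_M]; y plays the role of cat_bins.
X = np.array([[1, 0], [0, 1]] * 10, dtype=float)   # shape (20, 2)
y = np.array([1.0, 0.0] * 8 + [0.0, 1.0] * 2)      # shape (20,), 10 of each class

# check_X_y applies scikit-learn's standard input checks: it raises if the
# shapes or types are unsuitable and returns the validated arrays otherwise.
X_checked, y_checked = check_X_y(X, y)

# With the default scoring (None), fit() accepts these arrays without error.
logmodel = LogisticRegressionCV(Cs=[1.0], cv=5)
logmodel.fit(X_checked, y_checked)
print(logmodel.predict(X_checked[:2]))
```

If this succeeds, the data itself is not what triggers the TypeError.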
....
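Now we try to fit the regression:
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import accuracy_score

logmodel = LogisticRegressionCV(Cs=1, dual=False, scoring=accuracy_score, penalty='l2')
logmodel.fit(X, y)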
...
TypeError: Expected sequence or array-like, got estimator LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False)
The error message appears to originate in scikit-learn's validation.py module.
The only part of that code that raises this message is the following function (excerpt):
def _num_samples(x):
    """Return number of samples in array-like x."""
    if hasattr(x, 'fit'):
        # Don't get num_samples from an ensembles length!
        raise TypeError('Expected sequence or array-like, got '
                        'estimator %s' % x)
    # etc.
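The guard only looks at the single object passed in as `x`, so the message means that some argument with a `fit` method reached it. A simplified re-creation makes that visible. `num_samples_sketch` is a hypothetical name used only for illustration; it is not a scikit-learn function, and it uses `len(x)` where the real function handles more cases.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def num_samples_sketch(x):
    """Hypothetical, simplified re-creation of the guard excerpted above."""
    if hasattr(x, 'fit'):
        # Anything with a fit method is treated as an estimator, not as data.
        raise TypeError('Expected sequence or array-like, got estimator %s' % x)
    return len(x)


print(num_samples_sketch(np.zeros((50, 2))))   # 50

estimator_rejected = False
try:
    num_samples_sketch(LogisticRegression())
except TypeError as exc:
    estimator_rejected = True
    print('TypeError:', exc)
```

A plain array passes, while an estimator object is rejected with the same kind of message as in the question.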
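Question: since the arguments we fit the model with (X and y) do not have the attribute "fit", why is this error message raised?
Canopy 1.7.4.3348 (64-bit), Python 2.7, scikit-learn 18.01-3 and pandas 0.19.2-2.
Thanks for your help :)
Thank you, in any case, for your suggestion on avoiding the error. Could you tell me which part of the source code the error message comes from? – user2738815
The source of the error is the same place you pointed to in the question. It is raised because the scoring function is given incorrect arguments; where those arguments come from is shown in the first code snippet of my answer. –
I appreciate you taking the time. Thanks. – user2738815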
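To illustrate that explanation: scikit-learn calls a callable passed as `scoring` as `scoring(estimator, X, y)`, whereas a metric such as `accuracy_score` expects `(y_true, y_pred)`. The sketch below, on the same kind of illustrative stand-in data (not the question's CSV) and assuming a recent scikit-learn, calls the bare metric the way a scorer would be called, then shows two ways to pass scoring correctly: the built-in scorer name `'accuracy'`, or wrapping the metric with `make_scorer`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.metrics import accuracy_score, make_scorer

# Illustrative stand-in data: columns [state_F, state_M], 10 samples per class.
X = np.array([[1, 0], [0, 1]] * 10, dtype=float)
y = np.array([1.0, 0.0] * 8 + [0.0, 1.0] * 2)

# Call the bare metric with the scorer signature (estimator, X, y).
# The fitted estimator ends up where y_true is expected, so a TypeError is raised;
# the exact message depends on the scikit-learn version.
est = LogisticRegression().fit(X, y)
bare_metric_failed = False
try:
    accuracy_score(est, X, y)
except TypeError as exc:
    bare_metric_failed = True
    print('TypeError:', exc)

# Fix 1: name a built-in scorer by string.
model_a = LogisticRegressionCV(Cs=[1.0], cv=5, scoring='accuracy').fit(X, y)

# Fix 2: wrap the metric so it follows the scorer signature.
model_b = LogisticRegressionCV(Cs=[1.0], cv=5,
                               scoring=make_scorer(accuracy_score)).fit(X, y)
```

The string form is the shorter choice for built-in metrics; `make_scorer` is the general route when a custom metric function needs to be used as a scorer.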