2017-07-31

I am trying to fit a scikit-learn estimator, but I get this error:

ValueError: unknown is not supported

This is my code:

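from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# df_features and df_train are the asker's pandas DataFrames (not shown)
X = df_features.values
X = X.reshape((len(X), len(df_features.columns)))
Y = df_train['action'].values
Y = Y.reshape((len(Y),))

pipeline = Pipeline([
    ('clf', RandomForestClassifier())
])

parameters = {
    'clf__max_depth': [5, 7, 9],
    'clf__max_features': [3, 4, 5],
    'clf__min_samples_leaf': [3, 4, 5, 6, 7],
    'clf__bootstrap': [True]
}

# Weighted F1 score as the grid-search metric
score_func = make_scorer(metrics.f1_score, average='weighted')

grid_search = GridSearchCV(pipeline, parameters, n_jobs=3,
                           verbose=1, scoring=score_func)

grid_search.fit(X, Y)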

Here is a sample of Y:

['NOTHING', 'NOTHING', 'SELL', 'SELL', 'NOTHING', 'NOTHING', 'NOTHING']

How can I fix this?
Thanks


You have to use a binarizer to binarize Y to 0 and 1. If you upload your data, I can provide an example. – sera


@sera That is not necessary. scikit-learn estimators handle the conversion of class labels automatically. –
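A quick way to check the claim in the comment above is to fit a classifier directly on string labels. This is a minimal sketch on made-up feature values (not the asker's data): the fitted estimator keeps the original labels in `classes_`, and `predict` returns those same strings, so no manual binarization step is involved.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: 6 samples with 2 numeric features
X = np.array([[0.1, 1.0], [0.2, 0.9], [0.9, 0.1],
              [1.0, 0.2], [0.15, 0.95], [0.95, 0.15]])
# String labels in the same format as the question's Y; no encoding step
y = np.array(['NOTHING', 'NOTHING', 'SELL', 'SELL', 'NOTHING', 'SELL'])

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# classes_ stores the original labels, sorted
print(clf.classes_)  # ['NOTHING' 'SELL']
# Predictions come back as the original strings, not as 0/1
print(clf.predict([[0.12, 0.97]]))
```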


What is the type of `Y`? Show `type(Y)`. Try `Y = Y.astype('str')` before fitting. –
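The word `unknown` in the error is the target type that scikit-learn's `type_of_target` helper infers for the labels. One plausible cause (an assumption, since the asker's data is not shown) is an object-dtype label array whose first element is not a string, for example a missing `action` value that pandas read in as `NaN`. The sketch below reproduces that case with hypothetical labels and shows the effect of `astype('str')`:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical labels: object dtype with a missing value (NaN) first,
# e.g. from a pandas column that has an empty cell
y_raw = np.array([np.nan, 'SELL', 'NOTHING', 'NOTHING'], dtype=object)
print(type(y_raw), y_raw.dtype)  # <class 'numpy.ndarray'> object
print(type_of_target(y_raw))     # unknown

# Casting to str gives a plain string array that sklearn recognises
y_str = y_raw.astype('str')
print(type_of_target(y_str))     # multiclass

# Caveat: NaN becomes the literal label 'nan' after the cast,
# so dropping rows with missing labels first is often the better fix
mask = np.array([isinstance(v, str) for v in y_raw])
y_clean = y_raw[mask].astype('str')
print(type_of_target(y_clean))   # binary
```

Note that `astype('str')` silences the error but turns any missing label into an extra class named `'nan'`; if such rows exist, removing them (and the matching rows of `X`) keeps the classes meaningful.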

Answer


Please check the types and shapes of x and y. Also, do you have enough samples for the max_depth and min_samples_leaf values you want to try?
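A small diagnostic along the lines of that suggestion. `describe_xy` is a hypothetical helper, not part of scikit-learn; it reports the shapes and dtypes of both arrays, whether they have the same number of samples, the target type scikit-learn infers, and the per-class counts (very small classes make cross-validation folds and large `min_samples_leaf` values hard to satisfy). The feature matrix here is random placeholder data.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

def describe_xy(X, y):
    """Hypothetical helper: summarise X and y before calling fit()."""
    X = np.asarray(X)
    y = np.asarray(y)
    # Count labels as strings so that NaN or mixed types cannot break np.unique
    labels, counts = np.unique(y.astype('str'), return_counts=True)
    return {
        'X_shape': X.shape,
        'X_dtype': str(X.dtype),
        'y_shape': y.shape,
        'y_dtype': str(y.dtype),
        'same_n_samples': X.shape[0] == y.shape[0],
        'y_target_type': type_of_target(y),
        'class_counts': dict(zip(labels.tolist(), counts.tolist())),
    }

# Placeholder data shaped like the question's sample of Y
X = np.random.RandomState(0).rand(7, 5)
y = ['NOTHING', 'NOTHING', 'SELL', 'SELL', 'NOTHING', 'NOTHING', 'NOTHING']
for key, value in describe_xy(X, y).items():
    print(key, value)
```

If `y_target_type` prints `unknown` or `y_dtype` is `object`, the labels need cleaning before the grid search.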

The following example seems to work fine. I used the iris data and, as an example, leave-one-out cross-validation.

from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
import numpy as np
from sklearn import metrics
from sklearn.model_selection import LeaveOneOut


loo = LeaveOneOut()
data = load_iris()

# Use the first 14 iris samples as the feature matrix
x = data.data
x = x[0:14, :]
print(x.shape)  # (14, 4)

# String labels in the same format as the question's Y
y = ['NOTHING', 'NOTHING', 'SELL', 'SELL', 'NOTHING', 'NOTHING', 'SELL', 'SELL', 'NOTHING', 'SELL', 'SELL', 'NOTHING', 'NOTHING', 'NOTHING']
y = np.asarray(y)
# Keep y 1-D with shape (14,), as scikit-learn expects for class labels
y = y.astype('str')


pipeline = Pipeline([('clf', RandomForestClassifier())])

parameters = {'clf__max_depth': [1, 2, 3], 'clf__max_features': [1, 2, 3], 'clf__min_samples_leaf': [1, 2, 3], 'clf__bootstrap': [True]}

score_func = make_scorer(metrics.f1_score, average='weighted')

grid_search = GridSearchCV(pipeline, parameters, n_jobs=1, verbose=1, scoring=score_func, cv=loo)

grid_search.fit(x, y)

Result

Fitting 14 folds for each of 45 candidates, totalling 630 fits 
[Parallel(n_jobs=1)]: Done 630 out of 630 | elapsed: 33.7s finished 

Hope this helps.


My code works fine on my local machine; when I deploy the system to an EC2 machine, I get this error. –


Are you using exactly the same library versions locally and on the EC2 machine? – sera
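One way to answer that is to run the same snippet on both machines and compare the output. A minimal sketch, assuming Python, NumPy, SciPy, scikit-learn and (if installed) pandas are the relevant packages; `environment_versions` is a hypothetical helper name. scikit-learn 0.20 and later also ship `sklearn.show_versions()` for a fuller report.

```python
import platform

import numpy
import scipy
import sklearn

def environment_versions():
    """Hypothetical helper: collect version strings to diff between machines."""
    versions = {
        'python': platform.python_version(),
        'numpy': numpy.__version__,
        'scipy': scipy.__version__,
        'scikit-learn': sklearn.__version__,
    }
    # pandas is used to build df_features/df_train, but it is optional here
    try:
        import pandas
        versions['pandas'] = pandas.__version__
    except ImportError:
        pass
    return versions

for name, version in environment_versions().items():
    print(f'{name}: {version}')
```

A different pandas or NumPy version can change how a column with missing values is converted by `.values`, so differences in these lines are worth checking first.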
