2015-09-19

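Getting "continuous is not supported" error with RandomForestRegressor

I'm just trying to build a simple RandomForestRegressor example, but when I test the accuracy I get this error: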

/Users/noppanit/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.pyc in accuracy_score(y_true, y_pred, normalize, sample_weight) 
    177 
    178  # Compute accuracy for each possible representation 
--> 179  y_type, y_true, y_pred = _check_targets(y_true, y_pred) 
    180  if y_type.startswith('multilabel'): 
    181   differing_labels = count_nonzero(y_true - y_pred, axis=1) 

/Users/noppanit/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.pyc in _check_targets(y_true, y_pred) 
    90  if (y_type not in ["binary", "multiclass", "multilabel-indicator", 
    91      "multilabel-sequences"]): 
---> 92   raise ValueError("{0} is not supported".format(y_type)) 
    93 
    94  if y_type in ["binary", "multiclass"]: 

ValueError: continuous is not supported 

Here is a sample of the data. I can't show the real data.

target, func_1, func_2, func_3, ... func_200 
float, float, float, float, ... float 

Here is my code.

import pandas as pd 
import numpy as np 
from sklearn.preprocessing import Imputer 
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor 
from sklearn.cross_validation import train_test_split 
from sklearn.metrics import accuracy_score 
from sklearn import tree 

train = pd.read_csv('data.txt', sep='\t') 

labels = train.target 
train.drop('target', axis=1, inplace=True) 
cat = ['cat'] 
train_cat = pd.get_dummies(train[cat]) 

train.drop(train[cat], axis=1, inplace=True) 
train = np.hstack((train, train_cat)) 

imp = Imputer(missing_values='NaN', strategy='mean', axis=0) 
imp.fit(train) 
train = imp.transform(train) 

x_train, x_test, y_train, y_test = train_test_split(train, labels.values, test_size = 0.2) 

clf = RandomForestRegressor(n_estimators=10) 

clf.fit(x_train, y_train) 
y_pred = clf.predict(x_test) 
accuracy_score(y_test, y_pred) # This is where I get the error. 

Answers


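This is because accuracy_score is only meant for classification tasks. For regression you should use something else, for example:

clf.score(x_test, y_test) 

where x_test holds the samples and y_test the corresponding ground-truth values. The method computes the predictions internally and returns the R-squared of those predictions.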
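To see why the error occurs and what works instead, here is a minimal sketch. It assumes a modern scikit-learn (where train_test_split lives in sklearn.model_selection rather than sklearn.cross_validation) and uses synthetic data from make_regression as a stand-in for the real data, which we can't see. It shows that float targets are classified as "continuous" by type_of_target, that accuracy_score rejects them with a ValueError, and that the estimator's own score method works.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in for the real data (assumption: 200 float features).
X, y = make_regression(n_samples=500, n_features=200, noise=0.1, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestRegressor(n_estimators=10, random_state=0)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

# Float targets are labelled "continuous"; classification metrics reject them.
target_kind = type_of_target(y_test)
print(target_kind)  # continuous

raised = False
try:
    accuracy_score(y_test, y_pred)
except ValueError as err:
    raised = True
    print("accuracy_score failed:", err)

# score() predicts x_test internally and returns R^2 (coefficient of determination).
r2 = clf.score(x_test, y_test)
print("R^2:", r2)
```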


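Does anyone know how to compare the predictions against the test values for regression, the way you would for classification? – Priyansh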
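One way to answer the comment: for regression you compare predictions to true values with error metrics rather than exact matches. The sketch below uses small hypothetical arrays (not real data) with scikit-learn's mean_absolute_error, mean_squared_error and r2_score. The "within tolerance" fraction at the end is only an illustrative analogue of accuracy, and the tolerance value is an assumption you would choose for your own target.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical ground truth and predictions, for illustration only.
y_test = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mae = mean_absolute_error(y_test, y_pred)           # average absolute error
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # in the same units as the target
r2 = r2_score(y_test, y_pred)                       # 1.0 would be a perfect fit

# Rough analogue of "accuracy": share of predictions within a chosen tolerance.
tol = 0.5  # assumption: pick a tolerance that is meaningful for your target
within_tol = np.mean(np.abs(y_test - y_pred) <= tol)

print(mae, rmse, r2, within_tol)  # 0.5, ~0.612, ~0.949, 0.75
```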


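Since you are doing a regression task, you should use the R-squared metric (coefficient of determination) instead of the accuracy score (the accuracy score is meant for classification).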
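For reference, the standard definition of R-squared (the one scikit-learn's r2_score and regressor score methods compute), with y_i the true values, \hat{y}_i the predictions and \bar{y} the mean of the true values. As a worked example with the illustrative values y = (3, -0.5, 2, 7) and \hat{y} = (2.5, 0, 2, 8), the residual sum of squares is 1.5 and the total sum of squares is 29.1875:

```latex
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
    = 1 - \frac{1.5}{29.1875} \approx 0.9486
```

R-squared is 1 for perfect predictions, 0 for a model that always predicts \bar{y}, and can be negative for a model worse than that.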

To avoid confusion, I suggest using a different variable name, such as reg or rfr.

R-squared can be computed by calling the score method that RandomForestRegressor provides, for example:

rfr.score(X_test,Y_test)
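A small sketch to confirm that rfr.score gives the same number as computing R-squared explicitly from the predictions. It assumes a modern scikit-learn and uses synthetic data from make_regression in place of the real dataset; the variable names follow the answer above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real dataset (assumption).
X, Y = make_regression(n_samples=300, n_features=20, noise=1.0, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# A distinct name ("rfr") makes it obvious this is a regressor, not a classifier.
rfr = RandomForestRegressor(n_estimators=10, random_state=42)
rfr.fit(X_train, Y_train)

score_r2 = rfr.score(X_test, Y_test)               # R^2 computed by the estimator
manual_r2 = r2_score(Y_test, rfr.predict(X_test))  # same metric, computed explicitly
print(score_r2, manual_r2)
```

Both values should match, since a regressor's score method is R-squared of its own predictions.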