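SKLearn multi-class classifier

I wrote the code below to import data vectors from a file and test the performance of an SVM classifier (using sklearn and Python).
However, the classifier performs worse than every other classifier I have tried (for example, NNet reaches 98% accuracy on the test data, while the SVM reaches at most 92%). In my experience, an SVM should produce better results on this kind of data.
What might I be doing wrong?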

    import numpy as np
    from sklearn import svm
    from sklearn.metrics import accuracy_score


    def buildData(featureCols, testRatio):
        # Load the comma-separated file; every column is numeric.
        data = np.loadtxt("car-eval-data-1.csv", delimiter=",")

        X = data[:, :featureCols]  # select columns 0:featureCols-1
        y = data[:, featureCols]   # select column featureCols

        n_points = y.size
        print("Imported " + str(n_points) + " lines.")

        ### split into train/test sets (rows keep their file order, no shuffling)
        split = int((1 - testRatio) * n_points)
        X_train = X[0:split, :]
        X_test = X[split:, :]
        y_train = y[0:split]
        y_test = y[split:]

        return X_train, y_train, X_test, y_test


    def buildClassifier(features_train, labels_train):
        #clf = svm.SVC(kernel='linear', C=1.0, gamma=0.1)
        #clf = svm.SVC(kernel='poly', degree=3, C=1.0, gamma=0.1)
        clf = svm.SVC(kernel='rbf', C=1.0, gamma=0.1)
        clf.fit(features_train, labels_train)
        return clf


    def checkAccuracy(clf, features, labels):
        pred = clf.predict(features)
        # accuracy_score takes the true labels first, then the predictions
        return accuracy_score(labels, pred)


    features_train, labels_train, features_test, labels_test = buildData(6, 0.3)
    clf = buildClassifier(features_train, labels_train)
    trainAccuracy = checkAccuracy(clf, features_train, labels_train)
    testAccuracy = checkAccuracy(clf, features_test, labels_test)
    print("Training Items: " + str(labels_train.size) + ", Test Items: " + str(labels_test.size))
    print("Training Accuracy: " + str(trainAccuracy))
    print("Test Accuracy: " + str(testAccuracy))

    # Print every test sample next to its true label and its prediction.
    for i in range(labels_test.size):
        # predict() expects a 2-D array, so reshape the single sample to (1, n_features)
        pred = clf.predict(features_test[i].reshape(1, -1))
        print("F(" + str(i) + ") : " + str(features_test[i]) + " label= " + str(labels_test[i]) + " pred= " + str(pred))

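If it does not do multi-class classification by default, how can I make it do multi-class classification?
P.S. My data is in the following format (the last column is the class):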
2,2,2,2,2,1,0
2,2,2,2,1,2,0
0,2,2,5,2,2,3
2,2,2,4,2,2,1
2,2,2,4,2,0,0
2,2,2,4,2,1,1
2,2,2,4,1,2,1
0,2,2,5,2,2,3
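
For reference, a minimal sketch of how rows in this format separate into features and labels. It inlines the eight sample rows above through `io.StringIO` instead of reading `car-eval-data-1.csv`, so the file is not needed; the variable names are only illustrative.

```python
import io
import numpy as np

# The eight sample rows from the question, inlined instead of read from disk.
SAMPLE = """2,2,2,2,2,1,0
2,2,2,2,1,2,0
0,2,2,5,2,2,3
2,2,2,4,2,2,1
2,2,2,4,2,0,0
2,2,2,4,2,1,1
2,2,2,4,1,2,1
0,2,2,5,2,2,3
"""

data = np.loadtxt(io.StringIO(SAMPLE), delimiter=",")
X = data[:, :6]              # first six columns are the features
y = data[:, 6].astype(int)   # last column is the class label

# Count how many rows fall into each class.
classes, counts = np.unique(y, return_counts=True)
print("X shape:", X.shape)
print("classes:", classes, "counts:", counts)
```

Even this small excerpt contains more than two distinct labels, so the problem is multi-class.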
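I believe sklearn by default creates an ensemble of one-vs-all classifiers for multi-class classification with SVM. You could also try using [GridSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html) to optimize the SVM hyperparameters.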
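
A small sketch of the multi-class point. As far as I know, the scikit-learn documentation describes `SVC` as accepting multi-class labels directly and training one-vs-one classifiers internally, with `decision_function_shape='ovr'` only controlling the shape of the decision function; an explicit one-vs-rest ensemble is available through `OneVsRestClassifier`. The synthetic three-class blobs below are an assumption for illustration, not the car-evaluation data.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Synthetic data (illustrative only): three 2-D blobs, 30 points each.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.randn(30, 2) for c in centers])
y = np.repeat([0, 1, 2], 30)

# SVC is fitted on three classes with no extra wrapper.
clf = SVC(kernel="rbf", C=1.0, gamma=0.1, decision_function_shape="ovr")
clf.fit(X, y)
print(clf.classes_)
print(clf.decision_function(X[:1]).shape)  # one column per class with 'ovr'

# Explicit one-vs-rest ensemble: one binary SVC per class.
ovr = OneVsRestClassifier(SVC(kernel="rbf", C=1.0, gamma=0.1))
ovr.fit(X, y)
print(len(ovr.estimators_))
```

Either way, no special setup is needed for more than two classes; the choice mainly affects how the binary sub-problems are formed.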
Definitely use GridSearchCV to tune C and gamma. You can also scale the data with MinMaxScaler or StandardScaler.
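
The suggestion above could look roughly like the following sketch. It imports `GridSearchCV` from `sklearn.model_selection`, which is where current scikit-learn releases expose it (older releases had it under `sklearn.grid_search`), and wraps `StandardScaler` and `SVC` in a `Pipeline` so the scaler is fitted only on the training folds. The synthetic data from `make_classification`, the grid values, and the fold count are assumptions for illustration, not tuned for the car-evaluation file.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 6 features, 4 classes.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=4, n_clusters_per_class=1,
                           random_state=0)

# Shuffled, stratified 70/30 split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Scale first, then fit the RBF SVM.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Hypothetical grid; the "svc__" prefix routes each value to the SVC step.
param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.01, 0.1, 1]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))
```

Note that `train_test_split` shuffles the rows before splitting, unlike the plain slicing in the question's `buildData`, which keeps the file order.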
Thanks, I will test it tomorrow. – wmac