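I am using sklearn.linear_model.Perceptron on a synthetic dataset that I created. The data consist of 2 classes, each a multivariate Gaussian with a common non-diagonal covariance matrix. The class centroids are close enough that there should be significant overlap. Why does sklearn's Perceptron predict with accuracy, precision, etc. all equal to 1?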
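import numpy as np
import sklearn.linear_model
import sklearn.metrics
from sklearn.model_selection import train_test_split

# Two 20-dimensional Gaussian classes sharing one non-diagonal covariance matrix
mean1 = np.ones((20,))
mean2 = 2 * np.ones((20,))
A = 0.1 * np.random.randn(20, 20)
cov = np.dot(A, A.T)
class1 = np.random.multivariate_normal(mean1, cov, 2000)
class2 = np.random.multivariate_normal(mean2, cov, 2000)
# Append the class label (1 or 2) as column 20
class1 = np.concatenate((class1, np.ones((len(class1), 1))), axis=1)
class2 = np.concatenate((class2, 2 * np.ones((len(class2), 1))), axis=1)
# Split each class 70/30, then merge and shuffle the rows
class1_train, class1_test = train_test_split(class1, test_size=0.3)
class2_train, class2_test = train_test_split(class2, test_size=0.3)
train = np.concatenate((class1_train, class2_train), axis=0)
test = np.concatenate((class1_test, class2_test), axis=0)
np.random.shuffle(train)
np.random.shuffle(test)
# Separate the features (columns 0-19) from the label (column 20)
y_train = train[:, 20]
x_train = train[:, 0:20]
y_test = test[:, 20]
x_test = test[:, 0:20]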
After saving this data, I simply use:
classifier = sklearn.linear_model.Perceptron()
classifier.fit(x_train, y_train)
predicted_test = classifier.predict(x_test)
accuracy = sklearn.metrics.accuracy_score(y_test, predicted_test)
precision = sklearn.metrics.precision_score(y_test, predicted_test)
recall = sklearn.metrics.recall_score(y_test, predicted_test)
f_measure = sklearn.metrics.f1_score(y_test, predicted_test)
print(accuracy, precision, recall, f_measure)
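The data are meant to overlap by design. Yet the linear classifier somehow predicts perfectly, with accuracy, precision, etc. all equal to 1.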
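One way to check whether the classes really overlap is to measure how far apart the centroids are relative to the shared covariance. The sketch below is not part of the original question. It rebuilds the same means and covariance (with a fixed seed, which is an assumption added for reproducibility) and computes the Mahalanobis distance between the centroids. For two Gaussians with equal priors and a shared covariance, the best achievable linear-classifier error is Φ(-Δ/2), where Δ is that distance, so a large Δ would suggest the classes are close to linearly separable after all.

```python
import math

import numpy as np

# Fixed seed: an assumption for reproducibility, not in the original question.
rng = np.random.default_rng(0)

dim = 20
mean1 = np.ones(dim)
mean2 = 2 * np.ones(dim)
A = 0.1 * rng.standard_normal((dim, dim))
cov = A @ A.T  # same construction as in the question

# Euclidean distance between centroids and the per-feature standard deviations.
diff = mean2 - mean1
euclidean = float(np.linalg.norm(diff))
per_feature_std = np.sqrt(np.diag(cov))

# Mahalanobis distance between centroids under the shared covariance.
mahalanobis = float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Optimal error for equal priors and shared covariance: Phi(-mahalanobis / 2).
bayes_error = 0.5 * math.erfc(mahalanobis / (2 * math.sqrt(2)))

print("Euclidean distance between centroids:", euclidean)
print("Mean per-feature std:", per_feature_std.mean())
print("Mahalanobis distance:", mahalanobis)
print("Best achievable error rate:", bayes_error)
```

If the printed error rate is close to zero, the perfect scores are what one would expect: the centroids differ by 1 in each of the 20 dimensions, while the 0.1 scaling of A keeps the spread in each dimension comparatively small.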
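Please turn this into an [MCVE]. There are a lot of undefined variables and functions. – cel
Thanks. I will rewrite the question following the instructions in the link. –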