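LDA accuracy is higher on the reduced dataset than on the original data

I'm trying to reduce a dataset with LDA. I expected accuracy to drop on the reduced dataset. However, depending on the random seed, the reduced version sometimes gives me higher accuracy.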
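# Import paths assume a recent scikit-learn release
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(1000, 50, n_informative=10, n_classes=20)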
X1, X2, y1, y2 = train_test_split(X, y)
lda = LDA()
lda.fit(X1, y1)
predicted = lda.predict(X2)
full_accuracy = accuracy_score(y2, predicted)
reduction = LDA(n_components=5)
X1red = reduction.fit_transform(X1, y1)
X2red = reduction.transform(X2)
lda.fit(X1red, y1)
predicted = lda.predict(X2red)
reduced_accuracy = accuracy_score(y2, predicted)
print(full_accuracy, reduced_accuracy, reduced_accuracy / full_accuracy)
# prints 0.132 0.16 1.21212121212
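Since the result changes with the random seed, one way to check how often the reduced version wins is to repeat the comparison over several fixed seeds. This is only a sketch: the `compare_once` helper, the seed range 0 to 9, and the `sklearn.discriminant_analysis` / `sklearn.model_selection` import paths are my own assumptions and not part of the snippet above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def compare_once(seed, n_components=5):
    """Hypothetical helper: return (full_accuracy, reduced_accuracy) for one seed."""
    X, y = make_classification(
        n_samples=1000, n_features=50, n_informative=10,
        n_classes=20, random_state=seed,
    )
    X1, X2, y1, y2 = train_test_split(X, y, random_state=seed)

    # LDA classifier on all 50 features
    full = LDA().fit(X1, y1)
    full_accuracy = accuracy_score(y2, full.predict(X2))

    # Project onto n_components discriminant axes, then classify again
    reduction = LDA(n_components=n_components)
    X1red = reduction.fit_transform(X1, y1)
    X2red = reduction.transform(X2)
    reduced = LDA().fit(X1red, y1)
    reduced_accuracy = accuracy_score(y2, reduced.predict(X2red))

    return full_accuracy, reduced_accuracy


# One row per seed: [full_accuracy, reduced_accuracy]
results = np.array([compare_once(seed) for seed in range(10)])
wins = int((results[:, 1] > results[:, 0]).sum())

print("mean full accuracy:   ", results[:, 0].mean())
print("mean reduced accuracy:", results[:, 1].mean())
print("seeds where reduced > full:", wins, "of", len(results))
```

Fixing `random_state` makes each run reproducible, so the spread across seeds can be compared with the size of the gap seen in a single run.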
Any idea why I get higher accuracy after dimensionality reduction?