Is the nearest centroid classifier really inefficient?

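I am reading Ethem Alpaydin's "Introduction to Machine Learning". I came across the nearest centroid classifier and tried to implement it. I think I have implemented the classifier correctly, but I am only getting 68% accuracy. So, is the nearest centroid classifier itself inefficient, or is there a mistake in my implementation (shown below)?

The dataset contains 1372 data points, each with 4 features, and there are 2 output classes. My MATLAB implementation: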

```
DATA = load("-ascii", "data.txt");

#DATA is 1372x5 matrix with 762 data points of class 0 and 610 data points of class 1
#there are 4 features of each data point
X = DATA(:,1:4); #matrix to store all features

X0 = DATA(1:762,1:4); #matrix to store the features of class 0
X1 = DATA(763:1372,1:4); #matrix to store the features of class 1
X0 = X0(1:610,:); #to make sure both datasets have same size for prior probability to be equal
Y = DATA(:,5); # to store outputs

mean0 = sum(X0)/610; #mean of features of class 0
mean1 = sum(X1)/610; #mean of features of class 1

count = 0;
for i = 1:1372
    pre = 0;
    cost1 = X(i,:)*(mean0'); #dot product of the data point with the class 0 mean
    cost2 = X(i,:)*(mean1'); #dot product of the data point with the class 1 mean

    if (cost1<cost2)
        pre = 1;
    end
    if pre == Y(i)
        count = count+1; #counts the number of correctly predicted values
    end
end

disp("accuracy"); #prints the accuracy as a percentage
disp((count/1372)*100);
```

Source

2017-04-23 user7909152

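There are at least a few things going on here:

You are using the dot product to measure similarity in the input space, which is almost never valid. The only justification for the dot product would be an assumption that all data points have the same norm, or that the norm does not matter (almost never true). Try Euclidean distance instead: even though it is very naive, it should be better.
Is it an inefficient classifier? That depends on how you define efficiency. It is extremely simple and fast, but in terms of predictive power it is very poor. In fact, it is worse than Naive Bayes, which is already considered a "toy model".
There is also something wrong with the code: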
```
X0 = DATA(1:762,1:4); #matrix to store the features of class 0 
X1 = DATA(763:1372,1:4); #matrix to store the features of class 1 
X0 = X0(1:610,:); #to make sure both datasets have same size for prior probability to be equal 
```
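Once you subsample X0 you have 1220 training samples, yet later, during "testing", you test on both the training samples and the "missing elements of X0", which does not really make sense from a probabilistic perspective. First, you should never measure accuracy on the training set (it overestimates the true accuracy). Second, subsampling your training data does not equalize the priors, not in a method like this one; you are simply degrading the quality of your centroid estimates, nothing else. Techniques of this kind (sub- or oversampling) equalize priors for models that actually model priors. Your method does not (it is basically a generative model with an assumed prior of 1/2), so nothing good can come of it.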
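As a rough sketch of the points above, the snippet below classifies by Euclidean distance, fits the centroids on all training points of each class (no subsampling), and scores only on held-out points. It uses a small synthetic dataset as a stand-in for `data.txt`; the toy data, the every-fourth-point split, and every variable name other than `mean0` and `mean1` are assumptions made for illustration, not the asker's setup. It also helps to note that ||x - m||^2 = ||x||^2 - 2 x·m + ||m||^2, so picking the nearest centroid is the same as maximizing x·m - ||m||^2/2; the bare dot product from the question drops the ||m||^2/2 term and only agrees with Euclidean distance when both centroids have the same norm.

```octave
% Toy stand-in for data.txt: two well-separated 2-D classes (hypothetical data).
[a, b] = meshgrid(1:0.5:3, 1:0.5:3);
X0 = [a(:), b(:)];                 % 25 points of class 0, spread around (2, 2)
X1 = X0 + 4;                       % 25 points of class 1, spread around (6, 6)
X = [X0; X1];
Y = [zeros(25, 1); ones(25, 1)];

% Hold out every 4th point for testing; the rest is the training set.
n = size(X, 1);
is_test = mod((1:n)', 4) == 0;
Xtr = X(~is_test, :);
Ytr = Y(~is_test);
Xte = X(is_test, :);
Yte = Y(is_test);

% Centroids estimated from all training points of each class.
mean0 = mean(Xtr(Ytr == 0, :), 1);
mean1 = mean(Xtr(Ytr == 1, :), 1);

% Squared Euclidean distance from each held-out point to each centroid.
m = size(Xte, 1);
d0 = sum((Xte - repmat(mean0, m, 1)) .^ 2, 2);
d1 = sum((Xte - repmat(mean1, m, 1)) .^ 2, 2);
pred_euclid = double(d1 < d0);     % class 1 when its centroid is closer

% The dot-product rule from the question, kept only for comparison.
pred_dot = double(Xte * mean0' < Xte * mean1');

acc_euclid = 100 * mean(double(pred_euclid == Yte));
acc_dot = 100 * mean(double(pred_dot == Yte));
fprintf('Euclidean rule:   %.1f%% held-out accuracy\n', acc_euclid);
fprintf('Dot-product rule: %.1f%% held-out accuracy\n', acc_dot);
```

On this toy data the dot-product rule assigns every held-out point to class 1, because the class 1 centroid has the larger norm. The numbers say nothing about the accuracy to expect on the real dataset.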

Source

2017-04-23 12:57:13 lejlot
