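My problem and question are below: why does this clean data produce strange SVM classification results?
I have used Accord.NET's support vector machines successfully by following the examples on their documentation pages, such as this one. However, when I train a KernelSupportVectorMachine with OneclassSupportVectorLearning, training produces large error values and incorrect classifications.
The mock example below shows what I mean. It generates a dense set of training points, then trains an SVM to classify points as inliers (inside the cluster) or outliers. The training cluster is just a 0.6 x 0.6 square centered on the origin, with training points spaced 0.1 apart: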
using System;
using Accord.MachineLearning.VectorMachines;
using Accord.MachineLearning.VectorMachines.Learning;
using Accord.Statistics.Kernels;

class Program
{
    static void Main(string[] args)
    {
        // Model and training parameters
        double kernelSigma = 0.1;
        double teacherNu = 0.5;
        double teacherTolerance = 0.01;

        // Generate input point cloud, a 0.6 x 0.6 square centered at 0,0
        // (a 7 x 7 grid, so 49 points spaced 0.1 apart).
        double[][] trainingInputs = new double[49][];
        int inputIdx = 0;
        for (double x = -0.3; x <= 0.31; x += 0.1) {
            for (double y = -0.3; y <= 0.31; y += 0.1) {
                trainingInputs[inputIdx] = new double[] { x, y };
                inputIdx++;
            }
        }

        // Generate inlier and outlier test points.
        double[][] outliers =
        {
            new double[] { 1E6, 1E6 },   // Very far outlier
            new double[] { 0, 1E6 },     // Very far outlier
            new double[] { 100, -100 },  // Far outlier
            new double[] { 0, -100 },    // Far outlier
            new double[] { -10, -10 },   // Still far outlier
            new double[] { 0, -10 },     // Still far outlier
        };
        double[][] inliers =
        {
            new double[] { 0, 0 },       // Middle of cluster
            new double[] { .15, .15 },   // Halfway to corner of cluster
            new double[] { -0.1, 0 },    // Comfortably inside cluster
            new double[] { 0.25, 0 }     // Near inside edge of cluster
        };

        // Construct the kernel, model, and trainer, then train.
        Console.WriteLine("Training model with parameters:");
        Console.WriteLine($" kernelSigma = {kernelSigma.ToString("#.##")}");
        Console.WriteLine($" teacherNu={teacherNu.ToString("#.##")}");
        Console.WriteLine($" teacherTolerance={teacherTolerance}");
        Console.WriteLine();

        var kernel = new Gaussian(kernelSigma);
        // Each point has two coordinates (x, y), so the machine takes 2 inputs.
        var svm = new KernelSupportVectorMachine(kernel, inputs: 2);
        var teacher = new OneclassSupportVectorLearning(svm, trainingInputs)
        {
            Nu = teacherNu,
            Tolerance = teacherTolerance
        };
        double error = teacher.Run();
        Console.WriteLine($"Training complete - error is {error.ToString("#.##")}");
        Console.WriteLine();

        // Test trained classifier.
        Console.WriteLine("Testing outliers:");
        foreach (double[] outlier in outliers) {
            WriteResultDetail(svm, outlier);
        }
        Console.WriteLine();
        Console.WriteLine("Testing inliers:");
        foreach (double[] inlier in inliers) {
            WriteResultDetail(svm, inlier);
        }
    }

    private static void WriteResultDetail(KernelSupportVectorMachine svm, double[] coordinate)
    {
        string prettyCoord = $"{{ {string.Join(", ", coordinate)} }}".PadRight(20);
        Console.Write($"Classifying: {prettyCoord} Result: ");

        // Classify coordinate; a positive score means inlier, otherwise outlier.
        double result = svm.Compute(coordinate);
        if (Math.Sign(result) == 1) {
            Console.Write("Inlier");
        }
        else {
            Console.Write("Outlier");
        }
        Console.WriteLine($" ({result.ToString("#.##")})");
    }
}
Here is the output for a reasonable set of parameters:
Training model with parameters:
kernelSigma = .1
teacherNu=.5
teacherTolerance=0.01
Training complete - error is 222.4
Testing outliers:
Classifying: { 1000000, 1000000 } Result: Inlier (2.28)
Classifying: { 0, 1000000 } Result: Inlier (2.28)
Classifying: { 100, -100 } Result: Inlier (2.28)
Classifying: { 0, -100 } Result: Inlier (2.28)
Classifying: { -10, -10 } Result: Inlier (2.28)
Classifying: { 0, -10 } Result: Inlier (2.28)
Testing inliers:
Classifying: { 0, 0 } Result: Inlier (4.58)
Classifying: { 0.15, 0.15 } Result: Inlier (4.51)
Classifying: { -0.1, 0 } Result: Inlier (4.55)
Classifying: { 0.25, 0 } Result: Inlier (4.64)
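The number in parentheses is the score the SVM gives for that coordinate. With Accord.NET's SVMs (as with SVMs in general), a negative score means one class and a positive score means the other. Here, everything has a positive score. The inliers are classified correctly, but the outliers (even very distant ones) are also classified as inliers.
Note that every other time I have trained a model with Accord.NET, the training error has been quite close to zero, but here it is over 200.
Here is the output for another parameter set: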
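To see why every far outlier gets exactly the same score, it may help to evaluate the textbook one-class SVM decision function f(x) = Σ α_i k(x_i, x) − ρ by hand. The sketch below is a hand-rolled illustration, not Accord.NET code: it assumes a Gaussian kernel k(a, b) = exp(−‖a − b‖² / (2σ²)), and the support vectors (the whole 7 x 7 training grid), the weights (α_i = 1/49) and the offset ρ = 0.05 are made-up values, not a trained model. Far from every support vector each kernel term underflows to 0, so f(x) collapses to the constant −ρ. In the textbook dual, ρ equals Σ α_j k(x_j, x_i) for a margin support vector x_i, which is positive for a Gaussian kernel, so far points are expected to score negative.

```csharp
using System;

// Hypothetical, self-contained sketch; it does not call Accord.NET.
// Evaluates f(x) = sum_i alpha_i * k(sv_i, x) - rho with an assumed
// Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)).
public static class OneClassDecisionSketch
{
    public static double Gaussian(double[] a, double[] b, double sigma)
    {
        double squaredDistance = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            squaredDistance += d * d;
        }
        return Math.Exp(-squaredDistance / (2 * sigma * sigma));
    }

    public static double Decision(double[][] supportVectors, double[] alphas,
                                  double rho, double sigma, double[] x)
    {
        double sum = 0;
        for (int i = 0; i < supportVectors.Length; i++)
            sum += alphas[i] * Gaussian(supportVectors[i], x, sigma);
        // Far from every support vector, sum underflows to 0 and f(x) == -rho.
        return sum - rho;
    }

    // Made-up "model": every point of the 7 x 7 training grid is a support vector.
    public static double[][] Grid()
    {
        var points = new double[49][];
        int k = 0;
        for (int i = -3; i <= 3; i++)        // integer counters avoid floating-point drift
            for (int j = -3; j <= 3; j++)
                points[k++] = new[] { i * 0.1, j * 0.1 };
        return points;
    }

    public static void Main()
    {
        double[][] sv = Grid();
        double[] alphas = new double[sv.Length];
        for (int i = 0; i < alphas.Length; i++) alphas[i] = 1.0 / sv.Length;
        const double rho = 0.05, sigma = 0.1;   // assumed values, not trained

        double[][] queries =
        {
            new[] { 0.0, 0.0 }, new[] { 0.25, 0.0 },  // inside the cluster
            new[] { 0.0, -10.0 }, new[] { 1e6, 1e6 }  // far outliers
        };
        foreach (double[] q in queries)
            Console.WriteLine($"f({q[0]}, {q[1]}) = {Decision(sv, alphas, rho, sigma, q):F4}");
    }
}
```

With these assumed values the two cluster points come out positive, and both far outliers evaluate to exactly −ρ regardless of how far away they are, which matches the pattern of identical outlier scores in the output above.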
Training model with parameters:
kernelSigma = .3
teacherNu=.8
teacherTolerance=0.01
Training complete - error is 1945.67
Testing outliers:
Classifying: { 1000000, 1000000 } Result: Inlier (20.96)
Classifying: { 0, 1000000 } Result: Inlier (20.96)
Classifying: { 100, -100 } Result: Inlier (20.96)
Classifying: { 0, -100 } Result: Inlier (20.96)
Classifying: { -10, -10 } Result: Inlier (20.96)
Classifying: { 0, -10 } Result: Inlier (20.96)
Testing inliers:
Classifying: { 0, 0 } Result: Inlier (44.52)
Classifying: { 0.15, 0.15 } Result: Inlier (41.62)
Classifying: { -0.1, 0 } Result: Inlier (43.85)
Classifying: { 0.25, 0 } Result: Inlier (40.53)
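Again, very high training error and all positive scores.
The models are clearly getting something out of training: the scores differ between inliers and outliers. But why doesn't this simple scenario produce results with both positive and negative signs, as it should for two different classes?
P.S. Here is a similar program that tests many combinations of training and model parameters, and here is its output. Again, every combination results in positive classification scores, high error values, and misclassified outliers.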
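The two runs can also be compared arithmetically. Under the assumption (not confirmed in this thread) that each far-outlier score is just the model's constant term, subtracting it from the inlier scores isolates the kernel-sum part of the decision function. The sketch below only reuses the printed, two-decimal numbers; it does not call Accord.NET, and the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper working only on scores copied from the two runs above.
// Assumption: a far outlier's score is only the constant term, so
// (inlier score - far-outlier score) is the kernel-sum part.
public static class ReportedScores
{
    public static double[] KernelPart(double[] inlierScores, double farOutlierScore)
    {
        var parts = new double[inlierScores.Length];
        for (int i = 0; i < parts.Length; i++)
            parts[i] = inlierScores[i] - farOutlierScore;
        return parts;
    }

    public static void Main()
    {
        double[] run1 = { 4.58, 4.51, 4.55, 4.64 };     // sigma = .1, nu = .5; outliers all 2.28
        double[] run2 = { 44.52, 41.62, 43.85, 40.53 }; // sigma = .3, nu = .8; outliers all 20.96
        Console.WriteLine(string.Join(", ", Array.ConvertAll(KernelPart(run1, 2.28), p => p.ToString("F2"))));
        Console.WriteLine(string.Join(", ", Array.ConvertAll(KernelPart(run2, 20.96), p => p.ToString("F2"))));
    }
}
```

In both runs the constant term itself is positive (2.28 and 20.96), whereas in the textbook formulation it is −ρ and therefore negative; a positive constant alone would push every point, near or far, to a positive score. This is one reading of the numbers, not a confirmed diagnosis.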
Please open an issue in the project's issue tracker. – Cesar
This issue has been fixed in v3.7.0. – Cesar