Why does this clean data produce strange SVM classification results?

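My problem and question are described below.

I have used Accord.NET's support vector machines successfully by following the examples on their documentation pages, such as this one. However, when I train a KernelSupportVectorMachine with OneclassSupportVectorLearning, the training process produces large error values and incorrect classifications.

The mock example below shows what I mean. It generates a dense cluster of training points, then trains an SVM to classify points as inliers or outliers of that cluster. The training cluster is just a 0.6 x 0.6 square centered at the origin, with training points spaced at intervals of 0.1: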

using System;
using Accord.MachineLearning.VectorMachines;
using Accord.MachineLearning.VectorMachines.Learning;
using Accord.Statistics.Kernels;

class Program
{
    static void Main(string[] args)
    {
        // Model and training parameters
        double kernelSigma = 0.1;
        double teacherNu = 0.5;
        double teacherTolerance = 0.01;

        // Generate input point cloud, a 0.6 x 0.6 square centered at 0,0.
        // The 0.31 upper bound absorbs floating-point drift so that each
        // loop runs exactly 7 times (7 x 7 = 49 points).
        double[][] trainingInputs = new double[49][];
        int inputIdx = 0;
        for (double x = -0.3; x <= 0.31; x += 0.1)
        {
            for (double y = -0.3; y <= 0.31; y += 0.1)
            {
                trainingInputs[inputIdx] = new double[] { x, y };
                inputIdx++;
            }
        }

        // Generate inlier and outlier test points.
        double[][] outliers =
        {
            new double[] { 1E6, 1E6 },   // Very far outlier
            new double[] { 0, 1E6 },     // Very far outlier
            new double[] { 100, -100 },  // Far outlier
            new double[] { 0, -100 },    // Far outlier
            new double[] { -10, -10 },   // Still far outlier
            new double[] { 0, -10 },     // Still far outlier
        };
        double[][] inliers =
        {
            new double[] { 0, 0 },       // Middle of cluster
            new double[] { .15, .15 },   // Halfway to corner of cluster
            new double[] { -0.1, 0 },    // Comfortably inside cluster
            new double[] { 0.25, 0 }     // Near inside edge of cluster
        };

        // Construct the kernel, model, and trainer, then train.
        Console.WriteLine("Training model with parameters:");
        Console.WriteLine($" kernelSigma = {kernelSigma.ToString("#.##")}");
        Console.WriteLine($" teacherNu={teacherNu.ToString("#.##")}");
        Console.WriteLine($" teacherTolerance={teacherTolerance}");
        Console.WriteLine();

        var kernel = new Gaussian(kernelSigma);
        // inputs: 1 is as posted, although each training point has two coordinates.
        var svm = new KernelSupportVectorMachine(kernel, inputs: 1);
        var teacher = new OneclassSupportVectorLearning(svm, trainingInputs)
        {
            Nu = teacherNu,
            Tolerance = teacherTolerance
        };
        double error = teacher.Run();

        Console.WriteLine($"Training complete - error is {error.ToString("#.##")}");
        Console.WriteLine();

        // Test trained classifier.
        Console.WriteLine("Testing outliers:");
        foreach (double[] outlier in outliers)
        {
            WriteResultDetail(svm, outlier);
        }
        Console.WriteLine();
        Console.WriteLine("Testing inliers:");
        foreach (double[] inlier in inliers)
        {
            WriteResultDetail(svm, inlier);
        }
    }

    private static void WriteResultDetail(KernelSupportVectorMachine svm, double[] coordinate)
    {
        // Pad the printed coordinate to 20 characters so the results line up.
        string prettyCoord = $"{{ {string.Join(", ", coordinate)} }}".PadRight(20);
        Console.Write($"Classifying: {prettyCoord} Result: ");

        // Classify coordinate, print results: a positive score counts as an inlier.
        double result = svm.Compute(coordinate);
        if (Math.Sign(result) == 1)
        {
            Console.Write("Inlier");
        }
        else
        {
            Console.Write("Outlier");
        }
        Console.Write($" ({result.ToString("#.##")})\n");
    }
}

Here is the output for a reasonable set of parameters:

Training model with parameters:
    kernelSigma = .1
    teacherNu=.5
    teacherTolerance=0.01

Training complete - error is 222.4

Testing outliers:
Classifying: { 1000000, 1000000 } Result: Inlier (2.28)
Classifying: { 0, 1000000 }       Result: Inlier (2.28)
Classifying: { 100, -100 }        Result: Inlier (2.28)
Classifying: { 0, -100 }          Result: Inlier (2.28)
Classifying: { -10, -10 }         Result: Inlier (2.28)
Classifying: { 0, -10 }           Result: Inlier (2.28)

Testing inliers:
Classifying: { 0, 0 }             Result: Inlier (4.58)
Classifying: { 0.15, 0.15 }       Result: Inlier (4.51)
Classifying: { -0.1, 0 }          Result: Inlier (4.55)
Classifying: { 0.25, 0 }          Result: Inlier (4.64)

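The number in parentheses is the score the SVM gives for that coordinate. With Accord.NET's SVMs, negative scores normally indicate one class and positive scores the other. Here, however, everything has a positive score. The inliers are classified correctly, but the outliers (even the very far ones) are also classified as inliers.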
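To make the expected sign behaviour concrete, here is a sketch of the textbook one-class SVM score with a Gaussian kernel. The symbols \alpha_i, \rho and x_i are generic notation rather than Accord.NET identifiers, and the 2\sigma^2 denominator is an assumption about how Gaussian(kernelSigma) is parameterized.

```latex
% Textbook one-class SVM score (generic notation, not Accord.NET names).
% alpha_i >= 0 are the learned weights, rho is the learned offset.
f(x) = \sum_{i=1}^{n} \alpha_i \, k(x_i, x) - \rho,
\qquad
k(x_i, x) = \exp\!\left( -\frac{\lVert x_i - x \rVert^{2}}{2\sigma^{2}} \right)

% For a test point far from every training point x_i:
\lVert x_i - x \rVert \gg \sigma
\;\Longrightarrow\;
k(x_i, x) \to 0 \ \text{for all } i
\;\Longrightarrow\;
f(x) \to -\rho
```

Because the weights are non-negative and the kernel is strictly positive, \rho is positive whenever the margin support vectors score zero. Under this formulation a distant point such as { 0, -10 } would be expected to score about -\rho, so an identical positive score for every distant point does not fit the expectation.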

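Note that every other time I have trained a model with Accord.NET, the training error has been quite close to zero, but here it is over 200.

Here is the output for another set of parameters: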

Training model with parameters:
    kernelSigma = .3
    teacherNu=.8
    teacherTolerance=0.01

Training complete - error is 1945.67

Testing outliers:
Classifying: { 1000000, 1000000 } Result: Inlier (20.96)
Classifying: { 0, 1000000 }       Result: Inlier (20.96)
Classifying: { 100, -100 }        Result: Inlier (20.96)
Classifying: { 0, -100 }          Result: Inlier (20.96)
Classifying: { -10, -10 }         Result: Inlier (20.96)
Classifying: { 0, -10 }           Result: Inlier (20.96)

Testing inliers:
Classifying: { 0, 0 }             Result: Inlier (44.52)
Classifying: { 0.15, 0.15 }       Result: Inlier (41.62)
Classifying: { -0.1, 0 }          Result: Inlier (43.85)
Classifying: { 0.25, 0 }          Result: Inlier (40.53)

Again, a very high training error and all positive scores.

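The models are definitely getting something out of training: the scores for inliers and outliers differ. But why does this simple scenario not give scores with different signs, as it should?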
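One way to see why every outlier receives exactly the same score is to evaluate the kernel by hand. The sketch below does not use Accord.NET at all: KernelCheck, Gaussian, Grid and KernelSum are hypothetical helper names, and the kernel formula exp(-d^2 / (2 * sigma^2)) is an assumption about the parameterization used in the question.

```csharp
using System;
using System.Linq;

static class KernelCheck
{
    // Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)).
    // Assumption: this matches the kernel parameterization in the question.
    public static double Gaussian(double[] a, double[] b, double sigma)
    {
        double squaredDistance = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            squaredDistance += d * d;
        }
        return Math.Exp(-squaredDistance / (2 * sigma * sigma));
    }

    // The 7 x 7 training grid over [-0.3, 0.3]^2 at 0.1 spacing,
    // built from integer indices so no rounding error accumulates.
    public static double[][] Grid()
    {
        return (from i in Enumerable.Range(-3, 7)
                from j in Enumerable.Range(-3, 7)
                select new[] { i * 0.1, j * 0.1 }).ToArray();
    }

    // Unweighted sum of kernel values between x and every grid point.
    public static double KernelSum(double[] x, double sigma)
    {
        return Grid().Sum(p => Gaussian(p, x, sigma));
    }

    static void Main()
    {
        const double sigma = 0.1;
        // { 0, -10 } is the outlier closest to the training grid.
        Console.WriteLine(KernelSum(new[] { 0.0, -10.0 }, sigma));
        // { 0, 0 } is the centre of the cluster.
        Console.WriteLine(KernelSum(new[] { 0.0, 0.0 }, sigma));
    }
}
```

With sigma = 0.1 the first value underflows to exactly 0 in double precision, so any weighted kernel sum at the outliers is 0 as well and their score can only be a kernel-independent constant. In both runs the inlier scores are also roughly twice the outlier score (4.51 to 4.64 against 2.28, and 40.53 to 44.52 against 20.96), which is what one would see if a constant offset were added to the kernel sum rather than subtracted. That is only an inference from the printed numbers; the thread does not say what the eventual fix changed.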

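PS. Here is a similar program that tests many combinations of training and model parameters, and here is its output. Again, everything results in positive classification scores, high error values, and misclassified outliers.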

Source

2016-07-28 kdbanman

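Please open an issue in the project's issue tracker. – Cesar

This issue has been resolved in v3.7.0. – Cesar

The issue raised in the question has been resolved in Accord.NET release 3.7.0. A unit test with an example similar to yours has also been added in commit be81aab.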

Source

2017-08-22 19:56:21 Cesar

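Thanks @Cesar! Glad to see the project is still being maintained. – kdbanman

Hi @kdbanman, thanks a lot for reporting the issue on Stack Overflow, but I only saw it a few days ago, so yes, it took some time to fix. If you are still using the framework in your projects, please submit issue reports right away, directly in the project's issue tracker at https://github.com/accord-net/framework/issues. I hope the framework is still useful to you! – Cesar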
