Getting the distance from a data point to its centroid with Accord.net

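I'm using the Accord.net library to do some clustering work. Ultimately, I'm trying to find the optimal number of clusters with the elbow method, which requires some relatively simple calculations. However, I'm having a hard time getting the values I need to determine the best number K to use in my KMeans modeling.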

I have some sample data/code:

open Accord 
open Accord.Math 
open Accord.MachineLearning 
open Accord.Statistics 
open Accord.Statistics.Analysis 

let x = [| 
    [|4.0; 1.0; 1.0; 2.0|]; 
    [|2.0; 4.0; 1.0; 2.0|]; 
    [|2.0; 3.0; 1.0; 1.0|]; 
    [|3.0; 6.0; 2.0; 1.0|]; 
    [|4.0; 4.0; 1.0; 1.0|]; 
    [|5.0; 10.0; 1.0; 2.0|]; 
    [|7.0; 8.0; 1.0; 2.0|]; 
    [|6.0; 5.0; 1.0; 1.0|]; 
    [|7.0; 7.0; 2.0; 1.0|]; 
    [|5.0; 8.0; 1.0; 1.0|]; 
    [|4.0; 1.0; 1.0; 2.0|]; 
    [|3.0; 5.0; 0.0; 3.0|]; 
    [|1.0; 2.0; 0.0; 0.0|]; 
    [|4.0; 7.0; 1.0; 2.0|]; 
    [|5.0; 3.0; 2.0; 0.0|]; 
    [|4.0; 11.0; 0.0; 3.0|]; 
    [|8.0; 7.0; 2.0; 1.0|]; 
    [|5.0; 6.0; 0.0; 2.0|]; 
    [|8.0; 6.0; 3.0; 0.0|]; 
    [|4.0; 9.0; 0.0; 2.0|] 
    |]

I can generate the clusters easily with

let kmeans = new KMeans 5 

let kmeansMod = kmeans.Learn x 
let clusters = kmeansMod.Decide x

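But how can I calculate the distance from any given data point x to the cluster it was assigned to? I don't see anything in the KMeansClusterCollection class documentation suggesting that a method for this has already been implemented.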

It seems like this distance should be relatively simple to calculate, but I'm at a loss. Is it something like this?

let dataAndClusters = Array.zip clusters x 

let getCentroid (m: KMeansClusterCollection) (i: int) = 
    m.Centroids.[i] 

dataAndClusters 
|> Array.map (fun (c, d) -> (c, (getCentroid kmeansMod c) 
           |> Array.map2 (-) d 
           |> Array.sum))

val it : (int * float) [] = 
    [|(1, 0.8); (0, -1.5); (1, -0.2); (0, 1.5); (0, -0.5); (4, 0.0); (2, 1.4); 
    (2, -3.6); (2, 0.4); (3, 0.75); (1, 0.8); (0, 0.5); (1, -4.2); (3, -0.25); 
    (1, 2.8); (4, 0.0); (2, 1.4); (3, -1.25); (2, 0.4); (3, 0.75)|]

Am I calculating this distance correctly? I suspect not.

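As I mentioned, I'm trying to determine the right number K to use in KMeans clustering. I thought I would use the simple algorithm laid out in the second paragraph of this Stats.StackExchange.com answer. Note that I'm not opposed to using the "gap statistic" mentioned at the bottom of the top answer.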
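For the elbow method, the quantity usually plotted against K is the within-cluster sum of squares (WCSS): the sum, over all points, of the squared Euclidean distance from each point to its assigned centroid. Below is a minimal F# sketch on plain arrays, not an Accord.net API; the helper names squaredDistance and wcss are hypothetical, and the toy data are made up for illustration.

```fsharp
/// Squared Euclidean distance between a point and a centroid.
let squaredDistance (point: float[]) (centroid: float[]) : float =
    Array.map2 (-) point centroid
    |> Array.sumBy (fun diff -> diff * diff)

/// Within-cluster sum of squares: for every point, add the squared distance
/// to the centroid of the cluster it was assigned to.
let wcss (centroids: float[][]) (labels: int[]) (points: float[][]) : float =
    Array.map2 (fun label point -> squaredDistance point centroids.[label]) labels points
    |> Array.sum

// Hypothetical 1-D toy data: two tight groups around 0 and 10.
let points = [| [| -1.0 |]; [| 1.0 |]; [| 9.0 |]; [| 11.0 |] |]

// K = 1: a single centroid at the overall mean (5.0).
// (-6)^2 + (-4)^2 + 4^2 + 6^2 = 104
let wcss1 = wcss [| [| 5.0 |] |] [| 0; 0; 0; 0 |] points

// K = 2: one centroid per group.
// 1 + 1 + 1 + 1 = 4
let wcss2 = wcss [| [| 0.0 |]; [| 10.0 |] |] [| 0; 0; 1; 1 |] points

printfn "WCSS for K=1: %g, K=2: %g" wcss1 wcss2
```

In the setting of the question, one would presumably fit a model for each candidate K, pass its Centroids and the labels from Decide x to a function like this, and look for the K after which WCSS stops dropping sharply.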


2016-12-13 Steven

You should be able to use the Scores() method instead of Decide() to compute the distance to the nearest cluster. – Cesar

It turns out I wasn't calculating the distance correctly, but I was close.

After some more digging, I found this similar question, but for the R language, and broke down the process outlined in the accepted answer in my own R session.

The steps seem to be pretty straightforward:

1. From each data value, subtract the corresponding centroid value
2. Square each difference
3. Sum the squared differences for a given data/centroid pair
4. Take the square root of the sum.
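The order of these operations matters: Euclidean distance squares each per-coordinate difference before summing, so positive and negative differences cannot cancel. The minimal F# sketch below uses plain arrays (no Accord types); the helper names euclidean and sumThenSquare are hypothetical, chosen only to contrast the two orders on a point/centroid pair whose differences cancel.

```fsharp
/// Euclidean distance: subtract, square each difference, sum, then take the square root.
let euclidean (point: float[]) (centroid: float[]) : float =
    Array.map2 (-) point centroid
    |> Array.sumBy (fun diff -> diff * diff)
    |> sqrt

/// The "sum first, then square" order: this equals |sum of differences|,
/// so positive and negative differences can cancel out.
let sumThenSquare (point: float[]) (centroid: float[]) : float =
    (Array.map2 (-) point centroid |> Array.sum) ** 2.0
    |> sqrt

let p = [| 1.0; 3.0 |]
let c = [| 2.0; 2.0 |]

// Differences are -1 and +1: the true distance is sqrt 2, but summing first gives 0.
printfn "euclidean     = %f" (euclidean p c)
printfn "sumThenSquare = %f" (sumThenSquare p c)
```

This is also why the question's sum-of-differences values came out negative: a signed sum is not a distance.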

For my data above, for example, it breaks down like this:

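let distances =
    dataAndClusters
    |> Array.map (fun (c, d) ->
        let centroid = getCentroid kmeansMod c
        // Subtract, square each difference, sum the squares, then take the square root
        let distance =
            Array.map2 (-) d centroid
            |> Array.sumBy (fun diff -> diff ** 2.0)
            |> sqrt
        (c, distance))

Note the two key lines:

|> Array.sumBy (fun diff -> diff ** 2.0) squares each difference and adds up the squares (the values are already floats, so no conversion is needed before raising them to a power with x ** y)

and

|> sqrt takes the square root of that sum.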
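Because the centroid lookup only needs the cluster index, the same per-point computation can be tried on plain arrays, which makes it easy to check by hand. This is a minimal sketch under stated assumptions: the centroids, labels and points below are made-up toy values, and distancesToCentroids is a hypothetical helper. In the post, the centroids would come from kmeansMod.Centroids and the labels from kmeansMod.Decide x.

```fsharp
/// Euclidean distance between a point and a centroid.
let euclidean (point: float[]) (centroid: float[]) : float =
    Array.map2 (-) point centroid
    |> Array.sumBy (fun diff -> diff * diff)
    |> sqrt

/// For each (clusterIndex, point) pair, return the cluster index together with
/// the distance from the point to that cluster's centroid.
let distancesToCentroids (centroids: float[][]) (pairs: (int * float[])[]) : (int * float)[] =
    pairs
    |> Array.map (fun (cluster, point) -> (cluster, euclidean point centroids.[cluster]))

// Hypothetical toy data: two centroids and three labelled points.
let centroids = [| [| 0.0; 0.0 |]; [| 10.0; 10.0 |] |]
let labels = [| 0; 0; 1 |]
let points = [| [| 3.0; 4.0 |]; [| 0.0; 1.0 |]; [| 10.0; 12.0 |] |]

// Expected distances: 5 (3-4-5 triangle), 1, and 2.
let result = distancesToCentroids centroids (Array.zip labels points)
printfn "%A" result
```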

There may be a built-in method for this, but I haven't found it yet. For now, this works for me.


2016-12-13 17:05:18 Steven
