2017-03-23

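Manually computing a Canberra distance matrix

Similar to "canberra distance - inconsistent results", I wrote my own distance calculation, but I want to run it on more data and then build a distance matrix from the results.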

My initial function is:

canb.dist <- function(x, j) sum((abs(x-j))/(abs(x)+abs(j)))
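A quick sanity check of this formula on two short vectors may help before scaling up (a sketch; `x` and `y` are made-up illustration values). For strictly positive data the result should match `dist(..., method = "canberra")`. Note that R's documentation defines the Canberra term with `|x_i + y_i|` in the denominator and omits terms where numerator and denominator are both zero, so the two can differ for data with negative values or zeros.

```r
# OP's Canberra distance: sum of |x_i - j_i| / (|x_i| + |j_i|)
canb.dist <- function(x, j) sum((abs(x - j)) / (abs(x) + abs(j)))

# Hypothetical example vectors (all positive)
x <- c(1, 2, 3)
y <- c(2, 2, 5)

# Terms: 1/3 + 0/4 + 2/8 = 7/12
mine <- canb.dist(x, y)

# Built-in version on a 2-row matrix; as.vector() drops the dist attributes
builtin <- as.vector(dist(rbind(x, y), method = "canberra"))
c(mine = mine, builtin = builtin)
```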

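Now I want to apply this function to every pair of rows in my data frame and then build a distance matrix from those results. Let's say my data is: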

data <- data.frame(replicate(500, sample(1:100, 50, replace = TRUE)))

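The next part is where I'm struggling: how to apply this to each pair of rows and then assemble the matrix, essentially mimicking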

dist(data,method="canberra") 

My attempt at building the matrix:

for (y in 1:50) 
{ 
    for (z in 2:50) 
    { 
    canb.dist(data[y,1:500],data[z,1:500]) 
    } 
} 

But obviously it doesn't work. Is there a way to run through every pair of rows and replicate the distance matrix manually?
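For reference, here is a minimal sketch of how the double-loop attempt could be repaired, assuming the same `canb.dist` and data as above. The loop as written computes each value but never stores it, and its inner index always starts at 2. The sketch preallocates an n × n matrix `d` (a name introduced here), visits each pair once with `z > y`, writes the value into both triangles, and converts with `as.dist()`. Rows are taken from a numeric matrix rather than the data frame so that each row is a plain vector.

```r
# OP's Canberra distance
canb.dist <- function(x, j) sum(abs(x - j) / (abs(x) + abs(j)))

set.seed(1)
data <- data.frame(replicate(500, sample(1:100, 50, replace = TRUE)))
mat <- as.matrix(data)   # mat[i, ] is a numeric vector, not a 1-row data frame
n <- nrow(mat)

# Preallocate the full n x n result; the diagonal stays 0
d <- matrix(0, n, n)
for (y in 1:(n - 1)) {
  for (z in (y + 1):n) {             # each unordered pair exactly once
    val <- canb.dist(mat[y, ], mat[z, ])
    d[y, z] <- val                   # upper triangle
    d[z, y] <- val                   # lower triangle (distance is symmetric)
  }
}
loopdist <- as.dist(d)

# Compare with the built-in implementation; the largest difference should be ~0
refdist <- dist(data, method = "canberra")
max(abs(as.matrix(refdist) - as.matrix(loopdist)))
```

This is O(n²) calls to `canb.dist`, which is fine for 50 rows; for much larger data the built-in `dist()` (implemented in C) will be considerably faster.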

Answer


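You can use combn to generate all pairs of rows and compute the Canberra distance for each pair. Then assemble the results into a matrix with a sparse Matrix and convert that to the dist class.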
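To see the shape of what `combn` returns with a function argument, here is a small sketch on a hypothetical 3-row matrix `m`. Each column of the result is one pair: rows 1 and 2 hold the row indices `i < j`, and row 3 holds the distance. The answer's `triangular` object has the same layout, just with `choose(50, 2)` = 1225 columns.

```r
canb.dist <- function(x, j) sum(abs(x - j) / (abs(x) + abs(j)))

# Hypothetical toy data: 3 rows, 3 columns
m <- matrix(c(1, 2, 3,
              2, 2, 5,
              4, 1, 1), nrow = 3, byrow = TRUE)

# FUN returns (i, j, distance) for each pair, so combn simplifies to a 3 x choose(3, 2) matrix
pairs <- combn(seq_len(nrow(m)), 2,
               function(p) c(p[1], p[2], canb.dist(m[p[1], ], m[p[2], ])))
pairs
```

Here the pairs come out in the order (1, 2), (1, 3), (2, 3), with distances 7/12, 43/30 and 4/3.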

#OP's data 
set.seed(1) 
canb.dist <- function(x, j) sum((abs(x-j))/(abs(x)+abs(j))) 
data <- data.frame(replicate(500, sample(1:100, 50, replace = TRUE))) 
refdist <- dist(data, method="canberra") 

#convert to matrix 
mat <- as.matrix(data) 

#sequence of row indices 
rowidx <- seq_len(nrow(mat)) 

#calculate OP's Canberra dist for each pair of rows 
triangular <- combn(rowidx, 2, function(x) c(x[1], x[2], canb.dist(mat[x[1],], mat[x[2],]))) 

#construct the matrix given the indices and values using Matrix library, 
#convert into a matrix before converting into a dist class 
#the values refer to the diagonal, lower triangular and upper triangular 
library(Matrix) 
ansdist <- as.dist(as.matrix(sparseMatrix(
    i=c(rowidx, triangular[1,], triangular[2,]), 
    j=c(rowidx, triangular[2,], triangular[1,]), 
    x=c(rep(0, length(rowidx)), triangular[3,], triangular[3,]) 
))) 

#idea from http://stackoverflow.com/questions/17375056/r-sparse-matrix-conversion/17375747#17375747 
#sanity check: the range of differences from dist() should be (numerically) zero 
range(as.matrix(refdist) - as.matrix(ansdist)) 
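If you would rather not depend on the Matrix package, a base-R variant of the same idea is possible (a sketch, not part of the original answer; `d` and `ansdist2` are names introduced here). A numeric matrix can be indexed by a two-column matrix of (row, column) pairs, so the combn output can be written straight into a dense zero matrix. Since `as.dist()` reads only the lower triangle, each value goes to position (j, i) with j > i.

```r
canb.dist <- function(x, j) sum(abs(x - j) / (abs(x) + abs(j)))

set.seed(1)
data <- data.frame(replicate(500, sample(1:100, 50, replace = TRUE)))
mat <- as.matrix(data)
n <- nrow(mat)

# Rows 1-2: pair indices (i < j); row 3: Canberra distance for that pair
triangular <- combn(seq_len(n), 2,
                    function(p) c(p[1], p[2], canb.dist(mat[p[1], ], mat[p[2], ])))

# Fill the lower triangle via matrix indexing, then convert to a dist object
d <- matrix(0, n, n)
d[cbind(triangular[2, ], triangular[1, ])] <- triangular[3, ]
ansdist2 <- as.dist(d)

refdist <- dist(data, method = "canberra")
range(as.matrix(refdist) - as.matrix(ansdist2))
```

As a side note, combn enumerates pairs in the same order in which a dist object stores its lower triangle ((2,1), (3,1), ..., (n,1), (3,2), ...), but filling by explicit indices as above avoids relying on that ordering.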

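This works perfectly, converting the indices and values into a matrix. I didn't expect it to be as complicated as it actually is, but thanks a lot! – coderX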