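My goal is to get the k nearest neighbors of each data point. I want to avoid using a for loop with lookup and instead process every point of rdd_distance at once, but I can't figure out how to do this. How can I avoid the loop in a KNN search?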
// parsedData is an RDD[Object]
// Each Object has an id and a vector as attributes
// sqdist1 returns a Double
var rdd_distance = parsedData.cartesian(parsedData)
  .flatMap { case (x, y) =>
    if (x.get_id != y.get_id)
      Some((x.get_id, (y.get_id, sqdist1(x.get_vector, y.get_vector))))
    else None
  }
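// Loop over every id and fetch its neighbors one id at a time
for (ind1 <- 1 to size) {
  val ind2 = ind1.toString
  // all (neighbor id, distance) pairs stored under this id
  val tab1 = rdd_distance.lookup(ind2)
  val rdd_knn0 = sc.parallelize(tab1)
  // keep the k pairs with the smallest distance
  val tab_knn = rdd_knn0.takeOrdered(k)(Ordering[Double].on(x => x._2))
}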
Is this possible without using a for loop and lookup?
Take a look at this: https://stackoverflow.com/questions/5751114/nearest-neighbors-in-high-dimensional-data – abalcerek
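One possible way to drop the loop is to let Spark reduce each key's neighbors in a single pass with aggregateByKey, keeping only the k closest pairs per id. The following is a minimal sketch, not a tested solution for the code above. It assumes the ids are Strings (the loop calls lookup with ind1.toString) and that rdd_distance has the shape RDD[(String, (String, Double))]. The names KnnSketch, Neighbor, insertBounded, mergeBounded and knn are hypothetical and introduced only for illustration.

```scala
import org.apache.spark.rdd.RDD

object KnnSketch {
  // Assumed shape of one entry: (neighbor id, squared distance)
  type Neighbor = (String, Double)

  // Add one neighbor to the running list, then keep the k smallest distances
  def insertBounded(k: Int, acc: List[Neighbor], n: Neighbor): List[Neighbor] =
    (n :: acc).sortBy(_._2).take(k)

  // Merge two partial lists (from different partitions), keep the k smallest
  def mergeBounded(k: Int, a: List[Neighbor], b: List[Neighbor]): List[Neighbor] =
    (a ++ b).sortBy(_._2).take(k)

  // For every id, the k nearest neighbors in ascending order of distance,
  // computed in one job instead of one lookup per id
  def knn(distances: RDD[(String, Neighbor)], k: Int): RDD[(String, List[Neighbor])] =
    distances.aggregateByKey(List.empty[Neighbor])(
      (acc, n) => insertBounded(k, acc, n),
      (a, b) => mergeBounded(k, a, b)
    )
}
```

Under these assumptions, KnnSketch.knn(rdd_distance, k) would replace the whole for loop. Each per-key list never holds more than k + 1 entries while it is being built, so memory per key stays small. The cartesian product is still quadratic in the number of points, so for large or high-dimensional data the approximate methods discussed in the linked question may be worth a look.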