我是算法和優化的新手。
我試圖實現獲得k-均值,但到目前爲止尚未解決,結果不佳。
這被用作CVRP模擬的一部分(容量車輛路徑問題)。
我很好奇,如果我解釋引用的算法錯誤。有限k-均值聚類?
編號:"Improved K-Means Algorithm for Capacitated Clustering Problem" (Geetha, Poonthalir, Vanathi)
模擬CVRP擁有15個客戶,其中1個站。
每個客戶都有歐幾里德座標(x,y)和需求。
有3輛,每個具有90
所以容量,容量限制k-均值試圖15級的客戶聚類爲3噸的車輛,其中每個集羣中的總需求必須不超過車輛的能力。
UPDATE:
在所涉及的算法,我沒聽清楚什麼時候用完「下一個最近重心」的什麼必須做的代碼的任何信息。
也就是說,當所有的「最接近的質心」已被檢查,在步驟14.b下面,而customers[1]
仍未分配。
這會導致索引1未分配的客戶。
注:customer[1]
是需求量最大的客戶(30)。
問:滿足這個條件時,代碼應該做些什麼?
這裏是我對引用算法的解釋,請更正我的代碼,謝謝。
- 鑑於
n
請求者(客戶),n
=customerCount
,和貯庫 - Ñ需求,
N個座標(X,Y)的簇的
計算數目,
k
=(所有要求的總和)/vehicleCapacity
選擇初始質心,
5.a.按照demand
,按降序排列的客戶=d_customers
,
5.b.從d_customers
作爲初始質心=centroids[0 .. k-1]
選擇k
第一客戶,創建二進制矩陣
bin_matrix
,尺寸=(customerCount) x (k)
,
6.A中填充bin_matrix
全部爲零start WHILE loop,condition = WHILE
not converged
。
7.a.converged = False
開始FOR循環,條件= FOR
each customers
,
8.A.顧客的索引= I計算從
customers[i]
歐幾里德距離對所有centroids
=>edist
9.A.按升序排列edist
,
9.b.選擇第一centroid
與最近距離=closest_centroid
開始WHILE循環,條件=
while customers[i]
未分配給任何集羣。組中的所有其他未分配的客戶=
G
,
11.A.考慮closest_centroid
作爲質心G
。計算優先級
Pi
每個customers
的G
,
12.a.優先Pi = (distance from customers[i] to closest_cent)/demand[i]
12.b.選擇具有最高優先級的客戶Pi
。
12.c.具有最高優先級的客戶具有索引=hpc
12.d.問:如果找不到最優先的客戶,我們必須做什麼?如果可能的話分配
customers[hpc]
到centroids[closest_centroid]
。
13.a.需求customers[hpc]
=d1
,
13.b.所有質心成員要求的總和=dtot
,
13.c.IF (d1 + dtot) <= vehicleCapacity, THEN
..
13.d.分配customers[hpc]
至centroids[closest_centroid]
13.e.更新bin_matrix
,行索引=hpc
,列索引=closest_centroid
,設置爲1
。IF
customers[i]
是(仍然)not assigned
任何集羣,THEN ..
14.A.選擇next nearest centroid
,與edist
的次近距離。
14.b.問:如果沒有下一個最接近的質心,那麼我們必須做什麼?通過比較先前的矩陣和更新後的矩陣bin_matrix來計算收斂。
15.a.如果bin_matrix
沒有變化,則設置爲converged = True
。否則,從更新的集羣計算
new centroids
。
16.a.根據每個羣集的成員計算新的centroids' coordinates
。
16.b.sum_x
=集羣的所有x-coordinate
的總和members
,
16.c.num_c
=集羣中所有customers (members)
的編號,
16.d.簇的新質心x-coordinate
=sum_x/num_c
。
16.e.使用相同的公式計算集羣的新質心y-coordinate
=sum_y/num_c
。迭代主WHILE循環。
我的代碼始終與在步驟14.b未分配的客戶端。
也就是說,當customers[i]
仍未分配給任何質心,並且它已用完「下一個最接近的質心」。
由此產生的集羣很差。輸出曲線圖:
-In圖片,星形心,正方形是貯庫。
在圖片中,客戶標記爲「1」,需求= 30的客戶始終以沒有分配的羣集結束。
輸出的程序,
k_cluster 3
idx [ 1 -1 1 0 2 0 1 1 2 2 2 0 0 2 0]
centroids [(22.6, 29.2), (34.25, 60.25), (39.4, 33.4)]
members [[3, 14, 12, 5, 11], [0, 2, 6, 7], [9, 8, 4, 13, 10]]
demands [86, 65, 77]
第一和第三羣集,計算差。
idx
與索引「1
」未分配(-1
)
問:這有什麼錯我的解釋和我的執行?
任何更正,建議,幫助,將非常感謝,先進的謝謝。
這裏是我的全碼:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# pastebin.com/UwqUrHhh
# output graph: i.imgur.com/u3v2OFt.png
import math
import random
from operator import itemgetter
from copy import deepcopy
import numpy
import pylab
# depot and customers, [index, x, y, demand]
depot = [0, 30.0, 40.0, 0]
customers = [[1, 37.0, 52.0, 7], \
[2, 49.0, 49.0, 30], [3, 52.0, 64.0, 16], \
[4, 20.0, 26.0, 9], [5, 40.0, 30.0, 21], \
[6, 21.0, 47.0, 15], [7, 17.0, 63.0, 19], \
[8, 31.0, 62.0, 23], [9, 52.0, 33.0, 11], \
[10, 51.0, 21.0, 5], [11, 42.0, 41.0, 19], \
[12, 31.0, 32.0, 29], [13, 5.0, 25.0, 23], \
[14, 12.0, 42.0, 21], [15, 36.0, 16.0, 10]]
customerCount = 15
vehicleCount = 3
vehicleCapacity = 90
assigned = [-1] * customerCount
# number of clusters
k_cluster = 0
# binary matrix
bin_matrix = []
# coordinate of centroids
centroids = []
# total demand for each cluster, must be <= capacity
tot_demand = []
# members of each cluster
members = []
# coordinate of members of each cluster
xy_members = []
def distance(p1, p2):
return math.sqrt((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2)
# capacitated k-means clustering
# http://www.dcc.ufla.br/infocomp/artigos/v8.4/art07.pdf
def cap_k_means():
global k_cluster, bin_matrix, centroids, tot_demand
global members, xy_members, prev_members
# calculate number of clusters
tot_demand = sum([c[3] for c in customers])
k_cluster = int(math.ceil(float(tot_demand)/vehicleCapacity))
print 'k_cluster', k_cluster
# initial centroids = first sorted-customers based on demand
d_customers = sorted(customers, key=itemgetter(3), reverse=True)
centroids, tot_demand, members, xy_members = [], [], [], []
for i in range(k_cluster):
centroids.append(d_customers[i][1:3]) # [x,y]
# initial total demand and members for each cluster
tot_demand.append(0)
members.append([])
xy_members.append([])
# binary matrix, dimension = customerCount-1 x k_cluster
bin_matrix = [[0] * k_cluster for i in range(len(customers))]
converged = False
while not converged: # until no changes in formed-clusters
prev_matrix = deepcopy(bin_matrix)
for i in range(len(customers)):
edist = [] # list of distance to clusters
if assigned[i] == -1: # if not assigned yet
# Calculate the Euclidean distance to each of k-clusters
for k in range(k_cluster):
p1 = (customers[i][1], customers[i][2]) # x,y
p2 = (centroids[k][0], centroids[k][1])
edist.append((distance(p1, p2), k))
# sort, based on closest distance
edist = sorted(edist, key=itemgetter(0))
closest_centroid = 0 # first index of edist
# loop while customer[i] is not assigned
while assigned[i] == -1:
# calculate all unsigned customers (G)'s priority
max_prior = (0, -1) # value, index
for n in range(len(customers)):
pc = customers[n]
if assigned[n] == -1: # if unassigned
# get index of current centroid
c = edist[closest_centroid][1]
cen = centroids[c] # x,y
# distance_cost/demand
p = distance((pc[1], pc[2]), cen)/pc[3]
# find highest priority
if p > max_prior[0]:
max_prior = (p, n) # priority,customer-index
# if highest-priority is not found, what should we do ???
if max_prior[1] == -1:
break
# try to assign current cluster to highest-priority customer
hpc = max_prior[1] # index of highest-priority customer
c = edist[closest_centroid][1] # index of current cluster
# constraint, total demand in a cluster <= capacity
if tot_demand[c] + customers[hpc][3] <= vehicleCapacity:
# assign new member of cluster
members[c].append(hpc) # add index of customer
xy = (customers[hpc][1], customers[hpc][2]) # x,y
xy_members[c].append(xy)
tot_demand[c] += customers[hpc][3]
assigned[hpc] = c # update cluster to assigned-customer
# update binary matrix
bin_matrix[hpc][c] = 1
# if customer is not assigned then,
if assigned[i] == -1:
if closest_centroid < len(edist)-1:
# choose the next nearest centroid
closest_centroid += 1
# if run out of closest centroid, what must we do ???
else:
break # exit without centroid ???
# end while
# end for
# Calculate the new centroid from the formed clusters
for j in range(k_cluster):
xj = sum([cn[0] for cn in xy_members[j]])
yj = sum([cn[1] for cn in xy_members[j]])
xj = float(xj)/len(xy_members[j])
yj = float(yj)/len(xy_members[j])
centroids[j] = (xj, yj)
# calculate converged
converged = numpy.array_equal(numpy.array(prev_matrix), numpy.array(bin_matrix))
# end while
def clustering():
cap_k_means()
# debug plot
idx = numpy.array([c for c in assigned])
xy = numpy.array([(c[1], c[2]) for c in customers])
COLORS = ["Blue", "DarkSeaGreen", "DarkTurquoise",
"IndianRed", "MediumVioletRed", "Orange", "Purple"]
for i in range(min(idx), max(idx)+1):
clr = random.choice(COLORS)
pylab.plot(xy[idx==i, 0], xy[idx==i, 1], color=clr, \
linestyle='dashed', \
marker='o', markerfacecolor=clr, markersize=8)
pylab.plot(centroids[:][0], centroids[:][1], '*k', markersize=12)
pylab.plot(depot[1], depot[2], 'sk', markersize=12)
for i in range(len(idx)):
pylab.annotate(str(i), xy[i])
pylab.savefig('clust1.png')
pylab.show()
return idx
def main():
idx = clustering()
print 'idx', idx
print 'centroids', centroids
print 'members', members
print 'demands', tot_demand
if __name__ == '__main__':
main()
我抵制downvote的衝動 - 這是太多的信息。你會解決排序索引1未分配? – doctorlove
@doctorlove非常感謝您的回覆。我在上面更新了我的問題,就在「更新」部分之後。 (對不起,可能誤解你的評論,我不是英語母語的人,所以我毫不懷疑「解決排序問題」的含義,即使在Google翻譯這句話後,非常抱歉,謝謝。) – silo
s /排序/「處理」 – doctorlove