有限k-均值聚類？

我是算法和優化的新手。
我試圖實現獲得k-均值，但到目前爲止尚未解決，結果不佳。
這被用作CVRP模擬的一部分（容量車輛路徑問題）。
我很好奇，如果我解釋引用的算法錯誤。有限k-均值聚類？

編號："Improved K-Means Algorithm for Capacitated Clustering Problem" (Geetha, Poonthalir, Vanathi)

模擬CVRP擁有15個客戶，其中1個站。
每個客戶都有歐幾里德座標（x，y）和需求。
有3輛，每個具有90

所以容量，容量限制k-均值試圖15級的客戶聚類爲3噸的車輛，其中每個集羣中的總需求必須不超過車輛的能力。

UPDATE：

在所涉及的算法，我沒聽清楚什麼時候用完「下一個最近重心」的什麼必須做的代碼的任何信息。
也就是說，當所有的「最接近的質心」已被檢查，在步驟14.b下面，而customers[1]仍未分配。

這會導致索引1未分配的客戶。
注：customer[1]是需求量最大的客戶（30）。
問：滿足這個條件時，代碼應該做些什麼？

這裏是我對引用算法的解釋，請更正我的代碼，謝謝。

鑑於n請求者（客戶），n = customerCount，和貯庫
Ñ需求，
N個座標（X，Y）的簇的
計算數目，k =（所有要求的總和）/ vehicleCapacity
選擇初始質心，
5.a.按照demand，按降序排列的客戶= d_customers,
5.b.從d_customers作爲初始質心= centroids[0 .. k-1]選擇k第一客戶，
創建二進制矩陣bin_matrix，尺寸= (customerCount) x (k)，
6.A中填充bin_matrix全部爲零
start WHILE loop，condition = WHILE not converged。
7.a.converged = False
開始FOR循環，條件= FOR each customers，
8.A.顧客的索引= I
計算從customers[i]歐幾里德距離對所有centroids =>edist
9.A.按升序排列edist，
9.b.選擇第一centroid與最近距離= closest_centroid
開始WHILE循環，條件= while customers[i]未分配給任何集羣。
組中的所有其他未分配的客戶= G，
11.A.考慮closest_centroid作爲質心G。
計算優先級Pi每個customers的G，
12.a.優先Pi = (distance from customers[i] to closest_cent)/demand[i]
12.b.選擇具有最高優先級的客戶Pi。
12.c.具有最高優先級的客戶具有索引= hpc
12.d.問：如果找不到最優先的客戶，我們必須做什麼？
如果可能的話分配customers[hpc]到centroids[closest_centroid]。
13.a.需求customers[hpc] = d1,
13.b.所有質心成員要求的總和= dtot,
13.c. IF (d1 + dtot) <= vehicleCapacity, THEN ..
13.d.分配customers[hpc]至centroids[closest_centroid]
13.e.更新bin_matrix，行索引= hpc，列索引= closest_centroid，設置爲1。
IF customers[i]是（仍然）not assigned任何集羣，THEN ..
14.A.選擇next nearest centroid，與edist的次近距離。
14.b.問：如果沒有下一個最接近的質心，那麼我們必須做什麼？
通過比較先前的矩陣和更新後的矩陣bin_matrix來計算收斂。
15.a.如果bin_matrix沒有變化，則設置爲converged = True。
否則，從更新的集羣計算new centroids。
16.a.根據每個羣集的成員計算新的centroids' coordinates。
16.b. sum_x =集羣的所有x-coordinate的總和members,
16.c.num_c =集羣中所有customers (members)的編號，
16.d.簇的新質心x-coordinate = sum_x/num_c。
16.e.使用相同的公式計算集羣的新質心y-coordinate = sum_y/num_c。
迭代主WHILE循環。

我的代碼始終與在步驟14.b未分配的客戶端。
也就是說，當customers[i]仍未分配給任何質心，並且它已用完「下一個最接近的質心」。

由此產生的集羣很差。輸出曲線圖：

-In圖片，星形心，正方形是貯庫。
在圖片中，客戶標記爲「1」，需求= 30的客戶始終以沒有分配的羣集結束。

輸出的程序，

k_cluster 3 
idx [ 1 -1 1 0 2 0 1 1 2 2 2 0 0 2 0] 
centroids [(22.6, 29.2), (34.25, 60.25), (39.4, 33.4)] 
members [[3, 14, 12, 5, 11], [0, 2, 6, 7], [9, 8, 4, 13, 10]] 
demands [86, 65, 77]

第一和第三羣集，計算差。
idx與索引「1」未分配（-1）

問：這有什麼錯我的解釋和我的執行？
任何更正，建議，幫助，將非常感謝，先進的謝謝。

這裏是我的全碼：

#!/usr/bin/python 
# -*- coding: utf-8 -*- 
# pastebin.com/UwqUrHhh 
# output graph: i.imgur.com/u3v2OFt.png 

import math 
import random 
from operator import itemgetter 
from copy import deepcopy 
import numpy 
import pylab 

# depot and customers, [index, x, y, demand] 
depot = [0, 30.0, 40.0, 0] 
customers = [[1, 37.0, 52.0, 7], \ 
      [2, 49.0, 49.0, 30], [3, 52.0, 64.0, 16], \ 
      [4, 20.0, 26.0, 9], [5, 40.0, 30.0, 21], \ 
      [6, 21.0, 47.0, 15], [7, 17.0, 63.0, 19], \ 
      [8, 31.0, 62.0, 23], [9, 52.0, 33.0, 11], \ 
      [10, 51.0, 21.0, 5], [11, 42.0, 41.0, 19], \ 
      [12, 31.0, 32.0, 29], [13, 5.0, 25.0, 23], \ 
      [14, 12.0, 42.0, 21], [15, 36.0, 16.0, 10]] 
customerCount = 15 
vehicleCount = 3 
vehicleCapacity = 90 
assigned = [-1] * customerCount 

# number of clusters 
k_cluster = 0 
# binary matrix 
bin_matrix = [] 
# coordinate of centroids 
centroids = [] 
# total demand for each cluster, must be <= capacity 
tot_demand = [] 
# members of each cluster 
members = [] 
# coordinate of members of each cluster 
xy_members = [] 

def distance(p1, p2): 
    return math.sqrt((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2) 

# capacitated k-means clustering 
# http://www.dcc.ufla.br/infocomp/artigos/v8.4/art07.pdf 
def cap_k_means(): 
    global k_cluster, bin_matrix, centroids, tot_demand 
    global members, xy_members, prev_members 

    # calculate number of clusters 
    tot_demand = sum([c[3] for c in customers]) 
    k_cluster = int(math.ceil(float(tot_demand)/vehicleCapacity)) 
    print 'k_cluster', k_cluster 

    # initial centroids = first sorted-customers based on demand 
    d_customers = sorted(customers, key=itemgetter(3), reverse=True) 
    centroids, tot_demand, members, xy_members = [], [], [], [] 
    for i in range(k_cluster): 
     centroids.append(d_customers[i][1:3]) # [x,y] 

     # initial total demand and members for each cluster 
     tot_demand.append(0) 
     members.append([]) 
     xy_members.append([]) 

    # binary matrix, dimension = customerCount-1 x k_cluster 
    bin_matrix = [[0] * k_cluster for i in range(len(customers))] 

    converged = False 
    while not converged: # until no changes in formed-clusters 
     prev_matrix = deepcopy(bin_matrix) 

     for i in range(len(customers)): 
      edist = [] # list of distance to clusters 

      if assigned[i] == -1: # if not assigned yet 
       # Calculate the Euclidean distance to each of k-clusters 
       for k in range(k_cluster): 
        p1 = (customers[i][1], customers[i][2]) # x,y 
        p2 = (centroids[k][0], centroids[k][1]) 
        edist.append((distance(p1, p2), k)) 

       # sort, based on closest distance 
       edist = sorted(edist, key=itemgetter(0)) 

      closest_centroid = 0 # first index of edist 
      # loop while customer[i] is not assigned 
      while assigned[i] == -1: 
       # calculate all unsigned customers (G)'s priority 
       max_prior = (0, -1) # value, index 
       for n in range(len(customers)): 
        pc = customers[n] 

        if assigned[n] == -1: # if unassigned 
         # get index of current centroid 
         c = edist[closest_centroid][1] 
         cen = centroids[c]  # x,y 

         # distance_cost/demand 
         p = distance((pc[1], pc[2]), cen)/pc[3] 

         # find highest priority 
         if p > max_prior[0]: 
          max_prior = (p, n) # priority,customer-index 

       # if highest-priority is not found, what should we do ??? 
       if max_prior[1] == -1: 
        break 

       # try to assign current cluster to highest-priority customer 
       hpc = max_prior[1] # index of highest-priority customer 
       c = edist[closest_centroid][1] # index of current cluster 

       # constraint, total demand in a cluster <= capacity 
       if tot_demand[c] + customers[hpc][3] <= vehicleCapacity: 
        # assign new member of cluster 
        members[c].append(hpc) # add index of customer 

        xy = (customers[hpc][1], customers[hpc][2]) # x,y 
        xy_members[c].append(xy) 

        tot_demand[c] += customers[hpc][3] 
        assigned[hpc] = c # update cluster to assigned-customer 

        # update binary matrix 
        bin_matrix[hpc][c] = 1 

       # if customer is not assigned then, 
       if assigned[i] == -1: 
        if closest_centroid < len(edist)-1: 
         # choose the next nearest centroid 
         closest_centroid += 1 

        # if run out of closest centroid, what must we do ??? 
        else: 
         break # exit without centroid ??? 

      # end while 
     # end for 

     # Calculate the new centroid from the formed clusters 
     for j in range(k_cluster): 
      xj = sum([cn[0] for cn in xy_members[j]]) 
      yj = sum([cn[1] for cn in xy_members[j]]) 
      xj = float(xj)/len(xy_members[j]) 
      yj = float(yj)/len(xy_members[j]) 
      centroids[j] = (xj, yj) 

     # calculate converged 
     converged = numpy.array_equal(numpy.array(prev_matrix), numpy.array(bin_matrix)) 
    # end while 

def clustering(): 
    cap_k_means() 

    # debug plot 
    idx = numpy.array([c for c in assigned]) 
    xy = numpy.array([(c[1], c[2]) for c in customers]) 

    COLORS = ["Blue", "DarkSeaGreen", "DarkTurquoise", 
      "IndianRed", "MediumVioletRed", "Orange", "Purple"] 

    for i in range(min(idx), max(idx)+1): 
     clr = random.choice(COLORS) 
     pylab.plot(xy[idx==i, 0], xy[idx==i, 1], color=clr, \ 
      linestyle='dashed', \ 
      marker='o', markerfacecolor=clr, markersize=8) 
    pylab.plot(centroids[:][0], centroids[:][1], '*k', markersize=12) 
    pylab.plot(depot[1], depot[2], 'sk', markersize=12) 

    for i in range(len(idx)): 
     pylab.annotate(str(i), xy[i]) 

    pylab.savefig('clust1.png') 
    pylab.show() 

    return idx 

def main(): 
    idx = clustering() 
    print 'idx', idx 
    print 'centroids', centroids 
    print 'members', members 
    print 'demands', tot_demand 

if __name__ == '__main__': 
    main()

來源

2013-08-01 silo

我抵制downvote的衝動 - 這是太多的信息。你會解決排序索引1未分配？ – doctorlove

@doctorlove非常感謝您的回覆。我在上面更新了我的問題，就在「更新」部分之後。（對不起，可能誤解你的評論，我不是英語母語的人，所以我毫不懷疑「解決排序問題」的含義，即使在Google翻譯這句話後，非常抱歉，謝謝。） – silo

s /排序/「處理」 – doctorlove

當總需求接近總產能時，這個問題開始出現在bin packing的方面。正如你發現的那樣，這種特定算法的貪婪方法並不總是成功的。我不知道作者是否承認這一點，但如果他們不這樣做，審稿人本應該抓住它。

如果您想繼續使用此算法，我會嘗試使用integer programming將請求者分配給質心。

來源

2013-08-01 14:01:30

我明白了。非常感謝你。然後我將切換到混合整數編程。 – silo

沒有通過所有的細節去，你引用的文件在節結束算法說

if ri is not assigned then 
    choose the next nearest centroid 
end if

必須有下一個最接近的質心 - 如果兩個是等距的，我認爲你選擇哪一個並不重要。

來源

2013-08-01 12:39:53 doctorlove

非常感謝。我將下一個最接近的質心存儲在列表中。最多有3個最接近的質心。在我的情況中，所有3個選擇的是次最接近的質心，而客戶[1]仍未分配。它基於具有最高優先級的「自己和其他」未分配客戶進行「交叉」分配，因此客戶[1]從來沒有機會分配，因爲它具有較低的優先級，甚至直到沒有更多的「下一個最近的質心「（已經達到最接近質心的第三個指數）。謝謝。 – silo

哎喲 - 希望知道算法的這個變體的其他人會介入並幫助 – doctorlove

非常感謝您，因爲您已經檢查，回答並評論了我的問題，我非常感謝。 – silo

有限k-均值聚類？

回答

相關問題