2013-08-01 116 views
1

我是算法和優化的新手。
我試圖實現獲得k-均值,但到目前爲止尚未解決,結果不佳。
這被用作CVRP模擬的一部分(容量車輛路徑問題)。
我很好奇,如果我解釋引用的算法錯誤。有限k-均值聚類?

編號:"Improved K-Means Algorithm for Capacitated Clustering Problem" (Geetha, Poonthalir, Vanathi)

模擬CVRP擁有15個客戶,其中1個站。
每個客戶都有歐幾里德座標(x,y)和需求。
有3輛,每個具有90

所以容量,容量限制k-均值試圖15級的客戶聚類爲3噸的車輛,其中每個集羣中的總需求必須不超過車輛的能力。

UPDATE:

在所涉及的算法,我沒聽清楚什麼時候用完「下一個最近重心」的什麼必須做的代碼的任何信息。
也就是說,當所有的「最接近的質心」已被檢查,在步驟14.b下面,而customers[1]仍未分配。

這會導致索引1未分配的客戶。
注:customer[1]是需求量最大的客戶(30)。
問:滿足這個條件時,代碼應該做些什麼?


這裏是我對引用算法的解釋,請更正我的代碼,謝謝。

  1. 鑑於n請求者(客戶),n = customerCount,和貯庫
  2. Ñ需求,
  3. N個座標(X,Y)的簇的

  4. 計算數目,k =(所有要求的總和)/ vehicleCapacity

  5. 選擇初始質心,
    5.a.按照demand,按降序排列的客戶= d_customers,
    5.b.從d_customers作爲初始質心= centroids[0 .. k-1]選擇k第一客戶,

  6. 創建二進制矩陣bin_matrix,尺寸= (customerCount) x (k)
    6.A中填充bin_matrix全部爲零

  7. start WHILE loop,condition = WHILE not converged
    7.a.converged = False

  8. 開始FOR循環,條件= FOR each customers
    8.A.顧客的索引= I

  9. 計算從customers[i]歐幾里德距離對所有centroids =>edist
    9.A.按升序排列edist
    9.b.選擇第一centroid與最近距離= closest_centroid

  10. 開始WHILE循環,條件= while customers[i]未分配給任何集羣。

  11. 組中的所有其他未分配的客戶= G
    11.A.考慮closest_centroid作爲質心G

  12. 計算優先級Pi每個customersG
    12.a.優先Pi = (distance from customers[i] to closest_cent)/demand[i]
    12.b.選擇具有最高優先級的客戶Pi
    12.c.具有最高優先級的客戶具有索引= hpc
    12.d.問:如果找不到最優先的客戶,我們必須做什麼?

  13. 如果可能的話分配customers[hpc]centroids[closest_centroid]
    13.a.需求customers[hpc] = d1,
    13.b.所有質心成員要求的總和= dtot,
    13.c. IF (d1 + dtot) <= vehicleCapacity, THEN ..
    13.d.分配customers[hpc]centroids[closest_centroid]
    13.e.更新bin_matrix,行索引= hpc,列索引= closest_centroid,設置爲1

  14. IF customers[i]是(仍然)not assigned任何集羣,THEN ..
    14.A.選擇next nearest centroid,與edist的次近距離。
    14.b.問:如果沒有下一個最接近的質心,那麼我們必須做什麼?

  15. 通過比較先前的矩陣和更新後的矩陣bin_matrix來計算收斂。
    15.a.如果bin_matrix沒有變化,則設置爲converged = True

  16. 否則,從更新的集羣計算new centroids
    16.a.根據每個羣集的成員計算新的centroids' coordinates
    16.b. sum_x =集羣的所有x-coordinate的總和members,
    16.c.num_c =集羣中所有customers (members)的編號,
    16.d.簇的新質心x-coordinate = sum_x/num_c
    16.e.使用相同的公式計算集羣的新質心y-coordinate = sum_y/num_c

  17. 迭代主WHILE循環。

我的代碼始終與在步驟14.b未分配的客戶端。
也就是說,當customers[i]仍未分配給任何質心,並且它已用完「下一個最接近的質心」。

由此產生的集羣很差。輸出曲線圖:

Image

-In圖片,星形心,正方形是貯庫。
在圖片中,客戶標記爲「1」,需求= 30的客戶始終以沒有分配的羣集結束。

輸出的程序,

k_cluster 3 
idx [ 1 -1 1 0 2 0 1 1 2 2 2 0 0 2 0] 
centroids [(22.6, 29.2), (34.25, 60.25), (39.4, 33.4)] 
members [[3, 14, 12, 5, 11], [0, 2, 6, 7], [9, 8, 4, 13, 10]] 
demands [86, 65, 77] 

第一和第三羣集,計算差。
idx與索引「1」未分配(-1

問:這有什麼錯我的解釋和我的執行?
任何更正,建議,幫助,將非常感謝,先進的謝謝。

這裏是我的全碼:

#!/usr/bin/python 
# -*- coding: utf-8 -*- 
# pastebin.com/UwqUrHhh 
# output graph: i.imgur.com/u3v2OFt.png 

import math 
import random 
from operator import itemgetter 
from copy import deepcopy 
import numpy 
import pylab 

# depot and customers, [index, x, y, demand] 
depot = [0, 30.0, 40.0, 0] 
customers = [[1, 37.0, 52.0, 7], \ 
      [2, 49.0, 49.0, 30], [3, 52.0, 64.0, 16], \ 
      [4, 20.0, 26.0, 9], [5, 40.0, 30.0, 21], \ 
      [6, 21.0, 47.0, 15], [7, 17.0, 63.0, 19], \ 
      [8, 31.0, 62.0, 23], [9, 52.0, 33.0, 11], \ 
      [10, 51.0, 21.0, 5], [11, 42.0, 41.0, 19], \ 
      [12, 31.0, 32.0, 29], [13, 5.0, 25.0, 23], \ 
      [14, 12.0, 42.0, 21], [15, 36.0, 16.0, 10]] 
customerCount = 15 
vehicleCount = 3 
vehicleCapacity = 90 
assigned = [-1] * customerCount 

# number of clusters 
k_cluster = 0 
# binary matrix 
bin_matrix = [] 
# coordinate of centroids 
centroids = [] 
# total demand for each cluster, must be <= capacity 
tot_demand = [] 
# members of each cluster 
members = [] 
# coordinate of members of each cluster 
xy_members = [] 

def distance(p1, p2): 
    return math.sqrt((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2) 

# capacitated k-means clustering 
# http://www.dcc.ufla.br/infocomp/artigos/v8.4/art07.pdf 
def cap_k_means(): 
    global k_cluster, bin_matrix, centroids, tot_demand 
    global members, xy_members, prev_members 

    # calculate number of clusters 
    tot_demand = sum([c[3] for c in customers]) 
    k_cluster = int(math.ceil(float(tot_demand)/vehicleCapacity)) 
    print 'k_cluster', k_cluster 

    # initial centroids = first sorted-customers based on demand 
    d_customers = sorted(customers, key=itemgetter(3), reverse=True) 
    centroids, tot_demand, members, xy_members = [], [], [], [] 
    for i in range(k_cluster): 
     centroids.append(d_customers[i][1:3]) # [x,y] 

     # initial total demand and members for each cluster 
     tot_demand.append(0) 
     members.append([]) 
     xy_members.append([]) 

    # binary matrix, dimension = customerCount-1 x k_cluster 
    bin_matrix = [[0] * k_cluster for i in range(len(customers))] 

    converged = False 
    while not converged: # until no changes in formed-clusters 
     prev_matrix = deepcopy(bin_matrix) 

     for i in range(len(customers)): 
      edist = [] # list of distance to clusters 

      if assigned[i] == -1: # if not assigned yet 
       # Calculate the Euclidean distance to each of k-clusters 
       for k in range(k_cluster): 
        p1 = (customers[i][1], customers[i][2]) # x,y 
        p2 = (centroids[k][0], centroids[k][1]) 
        edist.append((distance(p1, p2), k)) 

       # sort, based on closest distance 
       edist = sorted(edist, key=itemgetter(0)) 

      closest_centroid = 0 # first index of edist 
      # loop while customer[i] is not assigned 
      while assigned[i] == -1: 
       # calculate all unsigned customers (G)'s priority 
       max_prior = (0, -1) # value, index 
       for n in range(len(customers)): 
        pc = customers[n] 

        if assigned[n] == -1: # if unassigned 
         # get index of current centroid 
         c = edist[closest_centroid][1] 
         cen = centroids[c]  # x,y 

         # distance_cost/demand 
         p = distance((pc[1], pc[2]), cen)/pc[3] 

         # find highest priority 
         if p > max_prior[0]: 
          max_prior = (p, n) # priority,customer-index 

       # if highest-priority is not found, what should we do ??? 
       if max_prior[1] == -1: 
        break 

       # try to assign current cluster to highest-priority customer 
       hpc = max_prior[1] # index of highest-priority customer 
       c = edist[closest_centroid][1] # index of current cluster 

       # constraint, total demand in a cluster <= capacity 
       if tot_demand[c] + customers[hpc][3] <= vehicleCapacity: 
        # assign new member of cluster 
        members[c].append(hpc) # add index of customer 

        xy = (customers[hpc][1], customers[hpc][2]) # x,y 
        xy_members[c].append(xy) 

        tot_demand[c] += customers[hpc][3] 
        assigned[hpc] = c # update cluster to assigned-customer 

        # update binary matrix 
        bin_matrix[hpc][c] = 1 

       # if customer is not assigned then, 
       if assigned[i] == -1: 
        if closest_centroid < len(edist)-1: 
         # choose the next nearest centroid 
         closest_centroid += 1 

        # if run out of closest centroid, what must we do ??? 
        else: 
         break # exit without centroid ??? 

      # end while 
     # end for 

     # Calculate the new centroid from the formed clusters 
     for j in range(k_cluster): 
      xj = sum([cn[0] for cn in xy_members[j]]) 
      yj = sum([cn[1] for cn in xy_members[j]]) 
      xj = float(xj)/len(xy_members[j]) 
      yj = float(yj)/len(xy_members[j]) 
      centroids[j] = (xj, yj) 

     # calculate converged 
     converged = numpy.array_equal(numpy.array(prev_matrix), numpy.array(bin_matrix)) 
    # end while 

def clustering(): 
    cap_k_means() 

    # debug plot 
    idx = numpy.array([c for c in assigned]) 
    xy = numpy.array([(c[1], c[2]) for c in customers]) 

    COLORS = ["Blue", "DarkSeaGreen", "DarkTurquoise", 
      "IndianRed", "MediumVioletRed", "Orange", "Purple"] 

    for i in range(min(idx), max(idx)+1): 
     clr = random.choice(COLORS) 
     pylab.plot(xy[idx==i, 0], xy[idx==i, 1], color=clr, \ 
      linestyle='dashed', \ 
      marker='o', markerfacecolor=clr, markersize=8) 
    pylab.plot(centroids[:][0], centroids[:][1], '*k', markersize=12) 
    pylab.plot(depot[1], depot[2], 'sk', markersize=12) 

    for i in range(len(idx)): 
     pylab.annotate(str(i), xy[i]) 

    pylab.savefig('clust1.png') 
    pylab.show() 

    return idx 

def main(): 
    idx = clustering() 
    print 'idx', idx 
    print 'centroids', centroids 
    print 'members', members 
    print 'demands', tot_demand 

if __name__ == '__main__': 
    main() 
+0

我抵制downvote的衝動 - 這是太多的信息。你會解決排序索引1未分配? – doctorlove

+0

@doctorlove非常感謝您的回覆。我在上面更新了我的問題,就在「更新」部分之後。 (對不起,可能誤解你的評論,我不是英語母語的人,所以我毫不懷疑「解決排序問題」的含義,即使在Google翻譯這句話後,非常抱歉,謝謝。) – silo

+0

s /排序/「處理」 – doctorlove

回答

1

當總需求接近總產能時,這個問題開始出現在bin packing的方面。正如你發現的那樣,這種特定算法的貪婪方法並不總是成功的。我不知道作者是否承認這一點,但如果他們不這樣做,審稿人本應該抓住它。

如果您想繼續使用此算法,我會嘗試使用integer programming將請求者分配給質心。

+0

我明白了。非常感謝你。然後我將切換到混合整數編程。 – silo

0

沒有通過所有的細節去,你引用的文件在節結束算法說

if ri is not assigned then 
    choose the next nearest centroid 
end if 

5.

必須有下一個最接近的質心 - 如果兩個是等距的,我認爲你選擇哪一個並不重要。

+0

非常感謝。我將下一個最接近的質心存儲在列表中。最多有3個最接近的質心。在我的情況中,所有3個選擇的是次最接近的質心,而客戶[1]仍未分配。它基於具有最高優先級的「自己和其他」未分配客戶進行「交叉」分配,因此客戶[1]從來沒有機會分配,因爲它具有較低的優先級,甚至直到沒有更多的「下一個最近的質心「(已經達到最接近質心的第三個指數)。謝謝。 – silo

+0

哎喲 - 希望知道算法的這個變體的其他人會介入並幫助 – doctorlove

+0

非常感謝您,因爲您已經檢查,回答並評論了我的問題,我非常感謝。 – silo