2013-10-22 75 views

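Updating and appending in a Python loop

I am new to Python and am trying to write code that performs K-Means clustering using a predefined package called Pycluster. At first I clustered with a fixed number of clusters (n = 10), and the code worked fine. I then tried to extend the code so that, instead of producing only 10 clusters, a loop increases the requested number of clusters from 2 to 10 (or more). This is where the problems started; as I said, I am completely new to Python.

The code I have developed is shown below. I realise the errors start between lines 33 and 49 of the code. I would really appreciate any help in getting the code to run.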

# -*- coding: utf-8 -*- 
""" 
Created on Mon Oct 21 13:53:40 2013 

@author: Engin 
""" 


from Pycluster import * 
import numpy as np 


#Open the text file containing the stored smart meter data 
d=np.loadtxt("120-RES-195-Normalized.txt", delimiter="\t", skiprows=1, usecols=range(1,49)) 


handle=open("120-RES-195-Normalized.txt") 
record = read(handle) #Store the smart meter data in an array called record. 

cluster_results = np.ones((120, 11)) 
cluster_centroids=np.array([]) 
within_cluster_sum_of_squares=np.ones((1,11)) 
between_cluster_sum_of_squares=np.ones((1,11)) 
distance=[] 

for n in range (1,11): 
    cluster_results[:,n-1], within_cluster_sum_of_squares[:,n-1], optimal_solution_repetition = record.kcluster(nclusters=n, npass=10, method='a', dist='e')  #Performs the K-Means clustering using the defined parameters 
    centroids, cmask = record.clustercentroids(cluster_results[:,n-1], method='a', transpose=0) #Calculates the cluster centroids 
    cluster_centroids=np.append(cluster_centroids,centroids) 

#The following routine stores the cluster numbers and the indices of the elements belonging to each 
#cluster so that the Between Clusters Sum of Squares would be easily calculated. The results will also 
#be easily visualised. 
    from collections import defaultdict 
    cluster_numbers_members = defaultdict(list) 
    for i,item in enumerate(cluster_results[:,n-1]): 
     cluster_numbers_members[item].append(i) 
    cluster_numbers_members = {k:v for k,v in cluster_numbers_members.items() if len(v)>=1} 
    cluster_members=cluster_numbers_members.values() 
    cluster_numbers=cluster_numbers_members.keys() 

    distance[:,n-1]=0 
    between_cluster_sum_of_squares[:,n-1]=0 
    for i in range(0,n): 
     for k in range(0,n): 
      distance[:,n-1] = record.clusterdistance(index1=cluster_members[i], index2=cluster_members[k], method='a', dist='e', transpose=0) 
      between_cluster_sum_of_squares[:,n-1]=between_cluster_sum_of_squares[:,n-1]+distance[:,n-1] 

    WCBCR = within_cluster_sum_of_squares/between_cluster_sum_of_squares 
    print cluster_results[:,n-1] 
    print within_cluster_sum_of_squares[:,n-1] 

print cluster_centroids 

#Arranging cluster centroids in (1X48) vector form 
cluster_tuple=zip(*[iter(cluster_centroids)]*48) 
cluster_array=np.array(list(cluster_tuple)) 
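
A side note on the regrouping step at the end of the code: `np.append` without an `axis` argument flattens its inputs, which is why the centroids have to be cut back into rows of 48 afterwards. A minimal sketch of the same regrouping with `reshape`, assuming every centroid has 48 values; the centroid arrays here are made-up stand-ins, not Pycluster output:

```python
import numpy as np

n_features = 48  # each centroid has 48 values, as in the code above

# Made-up stand-ins for the centroid arrays of a 1-cluster and a 2-cluster run.
centroids_k1 = np.zeros((1, n_features))
centroids_k2 = np.ones((2, n_features))

flat = np.array([])
for centroids in (centroids_k1, centroids_k2):
    # np.append without an axis argument flattens both inputs
    flat = np.append(flat, centroids)

# Regroup the flat vector into one row per centroid
cluster_array = flat.reshape(-1, n_features)

print(flat.shape)           # (144,)
print(cluster_array.shape)  # (3, 48)
```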

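_"This is where the problems started; as I said, I am completely new to Python."_ Please provide more detail. What kind of problems? Do you get an error message? – Kevin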

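Hi @Kevin, I updated the code because I had some errors in the variable names. In an earlier version of the code I used some other variable names, but I had to rename them to make the code clearer and more consistent. When I try to run the current (updated) code, I keep getting the following error message: distance[:,n-1]=0 TypeError: list indices must be integers, not tuple. Thanks in advance for your help. – user2470127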
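
The error quoted above can be reproduced in isolation. The sketch below assumes only what the question shows: `distance` is created as a plain Python list (`distance=[]`), while the other result containers are NumPy arrays. The names `list_error` and `distance_arr` are introduced here purely for illustration.

```python
import numpy as np

n = 1

distance = []  # a plain Python list, as in the question's code
try:
    # The index (slice, int) is a tuple, which lists do not accept
    distance[:, n - 1] = 0
    list_error = None
except TypeError as exc:
    list_error = exc
print(type(list_error).__name__)  # TypeError

# A 2-D NumPy array accepts the same index: all rows, column n-1
distance_arr = np.ones((1, 11))
distance_arr[:, n - 1] = 0
print(distance_arr[0, :3])
```

Preallocating `distance` the same way as `between_cluster_sum_of_squares` would be one way to make the indexing valid; whether that shape suits the rest of the calculation is a separate question.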

Answers

Replace

[:,n-1] 

with

[:n-1] or [:(n-1)] # same thing, use whatever you find easier to read 
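
For reference, on a 2-D NumPy array the two index forms select different things, which is worth checking before swapping one for the other. A small sketch, assuming an array with the same shape as `cluster_results` in the question (the contents are arbitrary):

```python
import numpy as np

a = np.arange(120 * 11).reshape(120, 11)  # same shape as cluster_results
n = 3

column = a[:, n - 1]  # every row, column index n-1
rows = a[:n - 1]      # the first n-1 rows, all columns

print(column.shape)  # (120,)
print(rows.shape)    # (2, 11)
```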

Hi @ExperimentsWithCode, I tried that, but I keep getting the following error: ValueError: could not broadcast input array from shape (120) into shape (0,11) – user2470127

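@user2470127 Does it give you a line number? I don't see where you are getting 'shape' from. Also, if you could provide some sample data in the correct input format, I could test some changes. – ExperimentsWithCode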

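@user2470127 OK, I believe these shapes are matrices. I believe the error you are getting may be because the second matrix has no actual content. A matrix of shape (0,11) is '[]', and you get an error when you try to do calculations between the matrices. I could not do any math between two matrices of shapes (120) and (0,11) without getting the following error: 'ValueError: operands could not be broadcast together with shapes (120) (0,11)'. If I make the second matrix (1,11) instead of (0,11), I am able to perform operations between them. – ExperimentsWithCode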
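
One plausible source of the (0,11) shape, assuming the slice form `[:n-1]` was applied to `cluster_results` and the loop starts at n = 1: the slice `[:0]` selects no rows at all, so 120 labels cannot be assigned into it. The sketch below reproduces that situation; `labels` is a made-up stand-in for the output of `kcluster`.

```python
import numpy as np

cluster_results = np.ones((120, 11))
labels = np.zeros(120)  # stand-in for the 120 cluster labels of one run
n = 1                   # first loop iteration

print(cluster_results[:n - 1].shape)  # (0, 11): an empty selection

try:
    # 120 values cannot be broadcast into a selection with zero rows
    cluster_results[:n - 1] = labels
    broadcast_error = None
except ValueError as exc:
    broadcast_error = exc
print(type(broadcast_error).__name__)  # ValueError

# Column assignment works: the target column also has shape (120,)
cluster_results[:, n - 1] = labels
```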