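3-dimensional clustering with sklearn K-means
I want to cluster my data using latitude/longitude as the X/Y axes and DaysUntilDueDate as the Z axis. I also want to keep the index column ('PM') so that I can later use this cluster analysis to build a schedule. The tutorial I found here is great, but I don't know whether it takes the Z axis into account, and my poking around has produced nothing but errors. I think the crucial part of the code is the argument to the iloc bit of this line: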
kmeans_model = KMeans(n_clusters=k, random_state=1).fit(A.iloc[:, :])
I tried changing that part to iloc[1:4] (to work on columns 1-3 only), but that caused the following error:
ValueError: n_samples=3 should be >= n_clusters=4
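A note on the error above: it is consistent with iloc[1:4] selecting rows rather than columns, which leaves only 3 samples, so the fit fails once k reaches 4. The sketch below illustrates the difference on a small made-up DataFrame (the values and the 'A'-'E' index labels are hypothetical stand-ins for point_data_test.csv). Because 'PM' is the index, the three feature columns sit at positions 0-2, not 1-3.

```python
import pandas as pd

# Hypothetical stand-in for point_data_test.csv; all values are made up.
df = pd.DataFrame(
    {
        "PM": ["A", "B", "C", "D", "E"],
        "Longitude": [-122.4, -122.3, -121.9, -122.1, -122.0],
        "Latitude": [37.8, 37.7, 37.3, 37.5, 37.4],
        "DaysUntilDueDate": [3, 10, 5, 21, 14],
    }
).set_index("PM")

# A single slice inside iloc[...] selects ROWS: rows 1, 2 and 3.
rows = df.iloc[1:4]

# To select COLUMNS by position, put the slice after the comma.
# 'PM' is the index, so the three features are at positions 0, 1, 2.
cols = df.iloc[:, 0:3]

print(rows.shape)  # (3, 3): 3 samples, 3 features
print(cols.shape)  # (5, 3): all 5 samples, 3 features
```

With 'PM' as the index, df.iloc[:, :] in the original line already passes all three feature columns to KMeans.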
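So my question is: how can I set up my code to run the cluster analysis in 3 dimensions while retaining the index ('PM') column?
Here is my Python file, thanks for your help: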
from sklearn.cluster import KMeans
import csv
import pandas as pd
# Import csv file with data in following columns:
# [PM (index)] [Longitude] [Latitude] [DaysUntilDueDate]
df = pd.read_csv('point_data_test.csv',index_col=['PM'])
numProjects = len(df)
K = numProjects // 3 # Around three projects can be worked per day
print("Number of projects: ", numProjects)
print("K-clusters: ", K)
for k in range(1, K):
    # Create a kmeans model on our data, using k clusters.
    # random_state helps ensure that the algorithm returns the
    # same results each time.
    kmeans_model = KMeans(n_clusters=k, random_state=1).fit(df.iloc[:, :])
    # These are our fitted labels for clusters --
    # the first cluster has label 0, and the second has label 1.
    labels = kmeans_model.labels_
    # Sum of squared distances of samples to their closest cluster center
    SSE = kmeans_model.inertia_
    print("k:", k, " SSE:", SSE)
# Add labels to df
df['Labels'] = labels
#print(df)
df.to_csv('test_KMeans_out.csv')
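One possible way to make the three dimensions explicit while keeping 'PM' is to select the feature columns by name and write the labels back onto the same DataFrame. This is only a minimal sketch under assumptions: the data values are made up, n_clusters=2 is arbitrary, and n_init=10 is set explicitly just to keep behavior the same across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for point_data_test.csv; all values are made up.
df = pd.DataFrame(
    {
        "PM": ["A", "B", "C", "D", "E", "F"],
        "Longitude": [-122.4, -122.3, -121.9, -122.1, -122.0, -122.35],
        "Latitude": [37.8, 37.7, 37.3, 37.5, 37.4, 37.75],
        "DaysUntilDueDate": [3, 10, 5, 21, 14, 7],
    }
).set_index("PM")

# Selecting by name keeps exactly three feature columns, even if a
# 'Labels' column is added to df later.
features = ["Longitude", "Latitude", "DaysUntilDueDate"]

# n_clusters=2 is an arbitrary choice for this illustration.
kmeans_model = KMeans(n_clusters=2, n_init=10, random_state=1).fit(df[features])

# Each cluster center has one coordinate per feature, i.e. three.
print(kmeans_model.cluster_centers_.shape)  # (2, 3)

# labels_ is in the same row order as df, so 'PM' stays attached.
df["Labels"] = kmeans_model.labels_
print(df)
```

The shape of cluster_centers_ is a quick check that all three dimensions were used. K-means relies on Euclidean distance, so degrees and days are mixed on their raw scales here; whether to rescale them is a separate decision that this sketch does not address.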