3D sklearn K-means clustering

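I want to cluster my data using latitude/longitude as the X/Y axes and DaysUntilDueDate as the Z axis. I also want to keep the index column ('PM') so that I can later build a schedule from this cluster analysis. The tutorial I found here is great, but I don't know whether it takes the Z axis into account, and my poking around has not produced any errors to go on. I think the important part of the code is the argument to the iloc bit of this line: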

kmeans_model = KMeans(n_clusters=k, random_state=1).fit(A.iloc[:, :])

I tried changing that part to iloc[1:4] (to work only on columns 1-3), but that caused the following error:

ValueError: n_samples=3 should be >= n_clusters=4

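So my question is: how do I set up my code to run the cluster analysis in 3 dimensions while keeping the index ('PM') column?

Here is my Python file. Thanks for your help: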

from sklearn.cluster import KMeans 
import csv 
import pandas as pd 

# Import csv file with data in following columns: 
# [PM (index)] [Longitude] [Latitude] [DaysUntilDueDate] 

df = pd.read_csv('point_data_test.csv',index_col=['PM']) 

numProjects = len(df) 
K = numProjects // 3 # Around three projects can be worked per day 


print("Number of projects: ", numProjects) 
print("K-clusters: ", K) 

for k in range(1, K): 
    # Create a kmeans model on our data, using k clusters. 
    # Random_state helps ensure that the algorithm returns the 
    # same results each time. 
    kmeans_model = KMeans(n_clusters=k, random_state=1).fit(df.iloc[:, :]) 

    # These are our fitted labels for clusters -- 
    # the first cluster has label 0, and the second has label 1. 
    labels = kmeans_model.labels_ 

    # Sum of distances of samples to their closest cluster center 
    SSE = kmeans_model.inertia_ 

    print("k:", k, " SSE:", SSE)

# Add labels to df 
df['Labels'] = labels 
#print(df) 

df.to_csv('test_KMeans_out.csv')

Asked 2017-06-27 by P Gresh

It looks like the problem is with the iloc[1:4] syntax.

From your question, it appears you changed:

kmeans_model = KMeans(n_clusters=k, random_state=1).fit(df.iloc[:, :])

to:

kmeans_model = KMeans(n_clusters=k, random_state=1).fit(df.iloc[1:4])

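It seems to me that either you have a typo or you don't understand how iloc works, so I will explain.

You should start by reading Indexing and Selecting Data in the pandas documentation.

In short, though, .iloc is an integer-based indexing method that selects data by position.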

Let's say you have a dataframe:
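For concreteness, a small stand-in frame with the column layout described in the question could be built as follows. The PM labels and every value are invented for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data: only the column names come from the question.
df = pd.DataFrame(
    {
        "PM": ["PM-1", "PM-2", "PM-3", "PM-4", "PM-5", "PM-6"],
        "Longitude": [-122.4, -122.3, -121.9, -121.8, -122.0, -122.1],
        "Latitude": [37.8, 37.7, 37.3, 37.4, 37.5, 37.6],
        "DaysUntilDueDate": [3, 10, 5, 21, 14, 7],
    }
).set_index("PM")  # mirrors index_col=['PM'] in the question's read_csv call

print(df)
```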

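The example you provided, iloc[:, :], uses iloc to select all rows and all columns, which yields the entire dataframe. If you are not familiar with Python's slice notation, take a look at Explain slice notation or at An Informal Introduction to Python in the documentation.

The example you say caused your error, iloc[1:4], selects only the rows at positions 1 through 3, with all of their columns.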
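Both selections can be checked on a small frame. This is a minimal sketch on invented data (the PM labels and values are assumptions, not the asker's file); it only compares the shapes the two slices return.

```python
import pandas as pd

# Hypothetical six-row frame with the question's column names.
df = pd.DataFrame(
    {
        "Longitude": [-122.4, -122.3, -121.9, -121.8, -122.0, -122.1],
        "Latitude": [37.8, 37.7, 37.3, 37.4, 37.5, 37.6],
        "DaysUntilDueDate": [3, 10, 5, 21, 14, 7],
    },
    index=pd.Index(["PM-1", "PM-2", "PM-3", "PM-4", "PM-5", "PM-6"], name="PM"),
)

everything = df.iloc[:, :]  # all rows, all columns
rows_only = df.iloc[1:4]    # rows at positions 1, 2, 3; every column

print(everything.shape)
print(rows_only.shape)
print(list(rows_only.index))
```

A single slice passed to iloc always applies to rows; columns are only restricted when a second slice follows the comma.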

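Now, if you compare what you were trying to do with the error you received, you will see that you selected fewer samples from your data than the number of clusters you are looking for. You passed in 3 samples (rows 1, 2, and 3) but asked KMeans to find 4 clusters, which is impossible.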
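The failure can be reproduced in isolation. The sketch below uses invented data and passes n_init explicitly only to keep newer scikit-learn releases from warning about its default; the exact wording of the message may differ between versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical six-row frame with the question's column names.
df = pd.DataFrame(
    {
        "Longitude": [-122.4, -122.3, -121.9, -121.8, -122.0, -122.1],
        "Latitude": [37.8, 37.7, 37.3, 37.4, 37.5, 37.6],
        "DaysUntilDueDate": [3, 10, 5, 21, 14, 7],
    }
)

three_rows = df.iloc[1:4]  # only 3 samples survive this row slice

try:
    KMeans(n_clusters=4, random_state=1, n_init=10).fit(three_rows)
    message = None
except ValueError as exc:
    # KMeans refuses to fit when there are fewer samples than clusters.
    message = str(exc)

print(message)
```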

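What you actually intended to do (as far as I can tell) is select all rows, plus columns 1-3, which correspond to your lat, lng, and z values. To do that, just add a colon as the first argument to iloc, like so: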

df.iloc[:, 1:4]

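Now you have selected all of the samples and the columns at positions 1, 2, and 3. Assuming you have enough samples, KMeans should work as you intended.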
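One caveat when applying this: iloc positions count only data columns, not the index. If PM is loaded as the index (as index_col=['PM'] does in the question), the three features sit at positions 0 to 2, so iloc[:, 1:4] would pick up only Latitude and DaysUntilDueDate; the 1:4 slice fits the case where PM is an ordinary first column. Selecting by column name sidesteps the counting. A minimal sketch on invented data, with n_clusters=2 chosen arbitrarily:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical data; PM is the index, as in the question's read_csv call.
df = pd.DataFrame(
    {
        "Longitude": [-122.4, -122.3, -121.9, -121.8, -122.0, -122.1],
        "Latitude": [37.8, 37.7, 37.3, 37.4, 37.5, 37.6],
        "DaysUntilDueDate": [3, 10, 5, 21, 14, 7],
    },
    index=pd.Index(["PM-1", "PM-2", "PM-3", "PM-4", "PM-5", "PM-6"], name="PM"),
)

# With PM as the index, positions 1:4 cover only the last two columns.
picked = list(df.iloc[:, 1:4].columns)
print(picked)

# Selecting by name keeps all three dimensions regardless of position.
feature_cols = ["Longitude", "Latitude", "DaysUntilDueDate"]
features = df[feature_cols]

model = KMeans(n_clusters=2, random_state=1, n_init=10).fit(features)

# The labels line up row by row, so the PM index is preserved.
df["Labels"] = model.labels_
print(df)
```

Because the labels are written back onto the original frame, each PM keeps its cluster assignment when the frame is saved with to_csv.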

Answered 2017-06-27 16:30:55 by Grr
