Here is an example using KMeans.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
# try to simulate your data
# =====================================================
X, y = make_blobs(n_samples=1000, n_features=10, centers=3)
columns = ['feature' + str(x) for x in np.arange(1, 11, 1)]
d = {key: values for key, values in zip(columns, X.T)}
d['label'] = y
data = pd.DataFrame(d)
Out[72]:
feature1 feature10 feature2 ... feature8 feature9 label
0 1.2324 -2.6588 -7.2679 ... 5.4166 8.9043 2
1 0.3569 -1.6880 -5.7671 ... -2.2465 -1.7048 0
2 1.0177 -1.7145 -5.8591 ... -0.5755 -0.6969 0
3 1.5735 -0.0597 -4.9009 ... 0.3235 -0.2400 0
4 -0.1042 -1.6703 -4.0541 ... 0.4456 -1.0406 0
.. ... ... ... ... ... ... ...
995 -0.0983 -1.4569 -3.5179 ... -0.3164 -0.6685 0
996 1.3151 -3.3253 -7.0984 ... 3.7563 8.4052 2
997 -0.9177 0.7446 -4.8527 ... -2.3793 -0.4038 0
998 2.0385 -3.9001 -7.7472 ... 5.2290 9.2281 2
999 3.9357 -7.2564 5.7881 ... 1.2288 -2.2305 1
[1000 rows x 11 columns]
# fit your data with KMeans
# =====================================================
kmeans = KMeans(n_clusters=3)
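# use every column except the last one ('label') as features;
# .iloc replaces .ix, which was removed in pandas 1.0
kmeans.fit_predict(data.iloc[:, :-1].values)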
Out[70]: array([1, 0, 0, ..., 0, 1, 2], dtype=int32)
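Note that the cluster numbers returned by KMeans are arbitrary and need not match the values in `label` (above, rows labelled 2 land in cluster 1). A minimal sketch of one way to check the correspondence, assuming a fixed `random_state` for reproducibility (not part of the original answer) and using `pd.crosstab` to count rows per (label, cluster) pair:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: random_state is fixed here only so the sketch is reproducible.
X, y = make_blobs(n_samples=1000, n_features=10, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X)

# Rows: true blob label; columns: cluster index assigned by KMeans.
table = pd.crosstab(pd.Series(y, name='label'),
                    pd.Series(clusters, name='cluster'))
print(table)
```

If the blobs are well separated, each row of the table should have most of its counts in a single column, which tells you which cluster index corresponds to which label.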
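Can't you just select the first column and pass that in? Like `kmeans(df[df.columns[0]], 3)` – EdChum
I want to run kmeans using all columns except the first (because the first column is filled with strings). – user1566200
Well, that would be `kmeans(df[df.columns[1:]], 3)` then – EdChum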
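The column selection suggested in the comments can be sketched with scikit-learn's `KMeans` as follows. The frame `df`, its column names, and the string column `name` are hypothetical stand-ins for the asker's data, which is not shown in the thread:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical data: the first column holds strings, the rest are numeric.
X, _ = make_blobs(n_samples=200, n_features=4, centers=3, random_state=0)
df = pd.DataFrame(X, columns=['f1', 'f2', 'f3', 'f4'])
df.insert(0, 'name', ['row' + str(i) for i in range(len(df))])

# Select every column except the first, by position.
# This is equivalent to df[df.columns[1:]].
features = df.iloc[:, 1:]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df['cluster'] = kmeans.fit_predict(features.values)
```

Selecting by position keeps the string column out of the fit without having to name the numeric columns one by one.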