You need to add a new color for label 4 to the LABEL_COLOR_MAP dictionary and use map to turn the cluster labels into colors:
LABEL_COLOR_MAP = {0: 'red',
                   1: 'blue',
                   2: 'green',
                   3: 'purple',
                   4: 'yellow'}

existing_df_2d.plot(
    kind='scatter',
    x='PC2', y='PC1',
    c=existing_df_2d.cluster.map(LABEL_COLOR_MAP),
    figsize=(16, 8))
Because the cluster column contains five distinct labels:
print(np.unique(existing_df_2d.cluster))
[0 1 2 3 4]
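To see why the missing entry matters, here is a minimal sketch on a made-up Series of labels (the names `labels`, `incomplete_map` and `complete_map` are only for illustration and are not in the original code). With a dictionary, `Series.map` returns NaN for any label that has no key, so a four-color map leaves label 4 without a color:

```python
import pandas as pd

# Hypothetical labels standing in for existing_df_2d.cluster
labels = pd.Series([0, 1, 2, 3, 4], name='cluster')

# A map with no entry for label 4 (assumed to mirror the original problem)
incomplete_map = {0: 'red', 1: 'blue', 2: 'green', 3: 'purple'}
# The same map with the fifth color added, as in the answer
complete_map = {**incomplete_map, 4: 'yellow'}

missing = labels.map(incomplete_map)   # label 4 becomes NaN
colors = labels.map(complete_map)      # every label gets a color

print(missing.isna().sum())  # how many labels have no color
print(colors.tolist())
```

Running the same `isna()` check on `existing_df_2d.cluster.map(LABEL_COLOR_MAP)` before plotting is a quick way to confirm that every cluster has a color.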
Full code:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

tb_existing_url_csv = 'https://docs.google.com/spreadsheets/d/1X5Jp7Q8pTs3KLJ5JBWKhncVACGsg5v4xu6badNs4C7I/pub?gid=0&output=csv'
existing_df = pd.read_csv(
    tb_existing_url_csv,
    index_col=0,
    thousands=',')
existing_df.index.names = ['country']
existing_df.columns.names = ['year']

# Project the data onto its first two principal components
pca = PCA(n_components=2)
pca.fit(existing_df)
# Output: PCA(copy=True, n_components=2, whiten=False)

existing_2d = pca.transform(existing_df)
existing_df_2d = pd.DataFrame(existing_2d)
existing_df_2d.index = existing_df.index
existing_df_2d.columns = ['PC1', 'PC2']
existing_df_2d.head()

# Cluster the original data into 5 groups and attach the labels
kmeans = KMeans(n_clusters=5)
clusters = kmeans.fit(existing_df)
existing_df_2d['cluster'] = pd.Series(clusters.labels_, index=existing_df_2d.index)
print(existing_df_2d.head())
                       PC1         PC2  cluster
country
Afghanistan    -732.215864  203.381494        2
Albania         613.296510    4.715978        3
Algeria         569.303713  -36.837051        3
American Samoa  717.082766    5.464696        3
Andorra         661.802241   11.037736        3
LABEL_COLOR_MAP = {0: 'red',
                   1: 'blue',
                   2: 'green',
                   3: 'purple',
                   4: 'yellow'}

existing_df_2d.plot(
    kind='scatter',
    x='PC2', y='PC1',
    c=existing_df_2d.cluster.map(LABEL_COLOR_MAP),
    figsize=(16, 8))
Test:
The top 10 rows by the PC2 column:
print(existing_df_2d.loc[existing_df_2d['PC2'].nlargest(10).index, :])
                          PC1         PC2  cluster
country
Kiribati         -2234.809790  864.494075        2
Djibouti         -3798.447446  578.975277        4
Bhutan           -1742.709249  569.448954        2
Solomon Islands   -809.277671  530.292939        1
Nepal             -986.570652  525.624757        1
Korea, Dem. Rep. -2146.623299  438.945977        2
Timor-Leste      -1618.364795  428.244340        2
Tuvalu           -1075.316806  366.666171        1
Mongolia          -686.839037  363.722971        1
India            -1146.809345  363.270389        1
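As a side note, the row selection used in the test can probably be written more directly with `DataFrame.nlargest`. A small sketch on made-up numbers (the `toy` frame and its values are hypothetical, not the real PCA scores) comparing the two forms:

```python
import pandas as pd

# Made-up values, only to illustrate the selection
toy = pd.DataFrame(
    {'PC1': [-1.0, 2.0, 0.5, -3.0],
     'PC2': [0.2, 0.9, -0.4, 0.7]},
    index=['A', 'B', 'C', 'D'])

# Form used in the answer: take the index of the largest PC2 values, then look it up
via_loc = toy.loc[toy['PC2'].nlargest(2).index, :]
# Shorter form: let the DataFrame pick the rows itself
direct = toy.nlargest(2, 'PC2')

print(via_loc.equals(direct))
```

Both forms return the rows sorted by PC2 in descending order, so `existing_df_2d.nlargest(10, 'PC2')` should give the same table as the test above.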
http://stackoverflow.com/a/14779462/3838691 – MaxNoe
Thanks, Max. But somehow I still can't solve the problem. I added a few lines of code to the original post. – user27976