2016-11-29

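Plotting cluster centroids in Python

I am a Python beginner. I am trying to plot the centers of my clusters, but I cannot get it to work. Here is my code: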

import pandas as pd 
import numpy as np 

df = pd.read_csv("InputClusterModel.txt") 
df.columns = ["Major","Quantity","rating","rating_2","RightWindoWeek","Ranking","CopiesQuant","Content","Trump","Movies","Carton","Serial","Before1014","categor","Purchase","Revenue"] 
df.head() 

from sklearn.cluster import KMeans 

cluster = KMeans(n_clusters=2) 

df['cluster'] = cluster.fit_predict(df[df.columns[:15]]) 

from sklearn.decomposition import PCA 
x_cols = df.columns[1:] 

pca = PCA() 
df['x'] = pca.fit_transform(df[x_cols])[:,0] 

df['y'] = pca.fit_transform(df[x_cols])[:,1] 

df = df.reset_index() 

clusters = df[['Purchase', 'cluster', 'x', 'y']] 

clusters.head() 

%matplotlib inline 
from ggplot import * 

ggplot(df, aes(x='x', y='y', color='cluster')) + \ 
    geom_point(size=75) + \ 
    ggtitle("Grouped by Cluster") 

df.cluster.value_counts() 
# The error occurs in the part below:

cluster_centers = pca.transform(cluster.cluster_centers_) 
cluster_centers = pd.DataFrame(cluster_centers, columns=['x', 'y']) 
cluster_centers['cluster'] = range(0, len(cluster_centers)) 

ggplot(cluster, aes(x='x', y='y', color='cluster')) + \ 
    geom_point(size=100) + \ 
    geom_point(cluster_centers, size=500) +\ 
    ggtitle("Customers Grouped by Cluster") 
print(pca.explained_variance_ratio_) 

This is the error I get:

ValueError                                Traceback (most recent call last)
<ipython-input-18-c2ac22e32b75> in <module>()
----> 1 cluster_centers = pca.transform(cluster.cluster_centers_) 
     2 cluster_centers = pd.DataFrame(cluster_centers, columns=['x', 'y']) 
     3 cluster_centers['cluster'] = range(0, len(cluster_centers)) 
     4 
     5 ggplot(cluster, aes(x='x', y='y', color='cluster')) +  geom_point(size=100) +  geom_point(cluster_centers, size=500) + ggtitle("Customers Grouped by Cluster")

/home/belotelov/anaconda2/lib/python2.7/site-packages/sklearn/decomposition/base.pyc in transform(self, X, y)
    130   X = check_array(X) 
    131   if self.mean_ is not None: 
--> 132    X = X - self.mean_ 
    133   X_transformed = fast_dot(X, self.components_.T) 
    134   if self.whiten: 

ValueError: operands could not be broadcast together with shapes (2,15) (16,)

My data looks like this (first rows):

0,122,7,8,6,8,105.704,1,0,1,0,0,0,0,37426,11831762
1,278,8,8,12,2,2246,1,1,1,0,0,0,0,29316,7371029
1,275,6,6,14,1,1268,1,1,1,0,0,0,0,30693,7368787
0,125,5,5,5,1,105.704,1,0,1,0,0,0,0,20661,7337545
1,193,8,8,11,2,1063,1,1,1,0,0,0,0,29141,7279077
1,1,6,6,11,0,1236,1,1,0,1,0,0,0,879,325151
1,116,8,8,14,0,1209,1,1,0,1,0,0,0,17751,5529657
0,39,7,7,11,1,1128,1,1,1,0,0,0,0,15044,5643468
1,65,6,6,11,0,1209,1,1,0,1,0,0,0,9902,2612669
0,170,6,7,2,0,105.704,1,1,1,0,0,0,0,19167,5195321

P.S. Python 2.7.12 :: Anaconda custom (64-bit) on Debian Jessie

Answer


I have not reviewed your code line by line. Here are some comments on the error:

ValueError: operands could not be broadcast together with shapes (2,15) (16,)

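As the error says, you are trying to broadcast two incompatible arrays in X = X - self.mean_. The broadcasting rule is that the trailing axis lengths of the two arrays must either match, or one of them must be 1. Here they are 15 and 16, so neither condition holds. The 15 comes from the cluster centers (KMeans was fitted on df.columns[:15]) and the 16 from the PCA mean (PCA was fitted on df.columns[1:] after the cluster column had been added), so the two models were most likely fitted on different sets of columns.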
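To make the broadcasting rule concrete, here is a minimal sketch that reproduces the shape mismatch with plain NumPy. The arrays are zero-filled stand-ins for cluster.cluster_centers_ and pca.mean_, and can_subtract is a hypothetical helper written only for this illustration.

```python
import numpy as np

# Zero-filled stand-ins with the shapes from the traceback (assumed, not real data).
centers = np.zeros((2, 15))   # plays the role of cluster.cluster_centers_
pca_mean = np.zeros(16)       # plays the role of pca.mean_


def can_subtract(a, b):
    """Hypothetical helper: report whether a - b broadcasts without error."""
    try:
        a - b
        return True
    except ValueError:
        return False


# Trailing axes 15 and 16: neither equal nor 1, so broadcasting fails.
mismatch_ok = can_subtract(centers, pca_mean)

# Trailing axes 15 and 15: equal, so broadcasting works.
match_ok = can_subtract(centers, np.zeros(15))

# Trailing axes 15 and 1: one of them is 1, so broadcasting works.
one_ok = can_subtract(centers, np.zeros(1))

print(mismatch_ok, match_ok, one_ok)
```

Only the first subtraction fails, which is the same situation as X - self.mean_ in the traceback.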

I suggest searching for the error message and taking a look at this.
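One possible way to avoid the mismatch is to fix the list of feature columns once and fit both KMeans and PCA on exactly those columns. The sketch below is not the asker's final code: it uses random synthetic data and made-up column names (f0 to f15) in place of InputClusterModel.txt, and it assumes a reasonably recent scikit-learn. It also sets n_components=2, because with a default PCA() the transformed centers would have more than two columns and would not fit columns=['x', 'y'].

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for the real file: 100 rows, 16 numeric columns (assumed).
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(100, 16), columns=["f%d" % i for i in range(16)])

# Choose the feature columns once, before adding any derived columns.
feature_cols = list(df.columns[:15])
X = df[feature_cols].values

# Cluster on the chosen features.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Fit PCA once, on the same features, keeping two components for plotting.
pca = PCA(n_components=2)
xy = pca.fit_transform(X)

df["cluster"] = labels
df["x"] = xy[:, 0]
df["y"] = xy[:, 1]

# The centers have 15 features, the same as pca.mean_, so transform succeeds.
cluster_centers = pd.DataFrame(
    pca.transform(kmeans.cluster_centers_), columns=["x", "y"]
)
cluster_centers["cluster"] = range(len(cluster_centers))

print(cluster_centers)
```

With the centers in this form, df and cluster_centers share the x, y and cluster columns and can be passed to the plotting call. Note that the first argument to ggplot in the failing cell is the KMeans object cluster rather than a DataFrame, which would probably need to be df as well.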