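I am trying to solve a clustering problem. I have a list of TF-IDF weighted vectors produced by the CountVectorizer() function. This is the data type and size of the TF-IDF vectors: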
<1000x5369 sparse matrix of type '<type 'numpy.float64'>'
with 42110 stored elements in Compressed Sparse Row format>
I have a "centroid" vector with the following dimensions:
<1x5369 sparse matrix of type '<type 'numpy.float64'>'
with 57 stored elements in Compressed Sparse Row format>
When I try to measure the cosine similarity between the centroid and the other vectors in my tfidf_vec_list with the following lines of code:
for centroid in centroids:
    sim_scores=[cosine_similarity(vector,centroid) for vector in tfidf_vec_list]
where the similarity function is:
def cosine_similarity(vector1,vector2):
    score=1-scipy.spatial.distance.cosine(vector1,vector2)
    return score
I get the error:
Traceback (most recent call last):
  File "<pyshell#25>", line 1, in <module>
    sim_scores=[cosine_similarity(vector,centroid) for vector in tfidf_vec_list]
  File "/home/ashwin/Desktop/Python-2.7.9/programs/test_2.py", line 28, in cosine_similarity
    score=1-scipy.spatial.distance.cosine(vector1,vector2)
  File "/usr/lib/python2.7/dist-packages/scipy/spatial/distance.py", line 287, in cosine
    dist = 1.0 - np.dot(u, v)/(norm(u) * norm(v))
  File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 302, in __mul__
    raise ValueError('dimension mismatch')
I have tried various things, including converting each vector in the matrix into an array and into a list, but I get the same error!
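One thing worth checking, offered as a hedged sketch rather than a confirmed diagnosis: calling `.toarray()` on a 1x5369 sparse row still gives a 2-D array of shape (1, 5369), while `scipy.spatial.distance.cosine` is documented for 1-D inputs. The helper name `cosine_similarity_dense` and the toy data below are assumptions made for illustration only; the step that matters is flattening each row with `.ravel()` before computing the distance.

```python
import numpy as np
from scipy import sparse
from scipy.spatial import distance


def cosine_similarity_dense(row1, row2):
    """Cosine similarity of two 1xN sparse rows, via flat dense arrays."""
    # .toarray() gives shape (1, N); .ravel() flattens it to shape (N,).
    u = row1.toarray().ravel()
    v = row2.toarray().ravel()
    return 1.0 - distance.cosine(u, v)


# Toy stand-ins for tfidf_vec_list and one centroid (illustrative data only).
tfidf_vec_list = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                             [0.0, 3.0, 0.0]]))
centroid = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0]]))

# Index rows explicitly; each tfidf_vec_list[i] is a 1xN sparse row.
sim_scores = [cosine_similarity_dense(tfidf_vec_list[i], centroid)
              for i in range(tfidf_vec_list.shape[0])]
```

Dense copies of single rows are cheap at this size (5369 floats each), so this variant trades a little memory for staying close to the original function.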
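It looks like the vector and the centroid have different dimensions, so check the lengths of these two vectors. – 2014-12-06 22:45:25
@Michael Plakhov No, they have the same dimensions: 1 * 5369. That is what I cannot understand. – 2014-12-06 23:49:49
What kind of elements are in such vectors? I mean, what is their typical size? – 2014-12-07 09:38:20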
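On the dimension question raised in the comments: the traceback ends in the sparse `__mul__`, and for `scipy.sparse` matrices `*` means matrix multiplication. Two rows of identical shape (1, N) therefore cannot be multiplied, because the inner dimensions N and 1 do not match. The sketch below reproduces that with toy data and shows a transposed product that stays sparse. The function name `sparse_cosine_similarities` and the sample values are assumptions for illustration, not code from the question.

```python
import numpy as np
from scipy import sparse


def sparse_cosine_similarities(matrix, centroid):
    """Cosine similarity of every row of `matrix` with a 1xN `centroid`."""
    matrix = sparse.csr_matrix(matrix)
    centroid = sparse.csr_matrix(centroid)
    # (n x N) . (N x 1): one dot product per row.
    dots = matrix.dot(centroid.T).toarray().ravel()
    # Euclidean norms, computed without densifying the inputs.
    row_norms = np.sqrt(np.asarray(matrix.multiply(matrix).sum(axis=1)).ravel())
    centroid_norm = np.sqrt(centroid.multiply(centroid).sum())
    denom = row_norms * centroid_norm
    sims = np.zeros(dots.shape, dtype=float)
    nonzero = denom > 0  # all-zero rows keep similarity 0.0
    sims[nonzero] = dots[nonzero] / denom[nonzero]
    return sims


# Illustrative data only: three rows and one centroid of width 2.
rows = sparse.csr_matrix(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
centroid = sparse.csr_matrix(np.array([[1.0, 0.0]]))

# Same shape (1, 2) on both sides, yet the matrix product is undefined.
try:
    rows[0] * centroid
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

sims = sparse_cosine_similarities(rows, centroid)
```

Computing all similarities for one centroid in a single product also removes the per-row Python loop, which may matter when there are many centroids.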