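ValueError: setting an array element with a sequence while training a KD-tree on TF-IDF
I'm trying to train a KD-tree on the TF-IDF matrix of a document corpus, but building the tree raises: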
ValueError: setting an array element with a sequence.
The code and the error are below. Can someone help me figure out the problem?
Code:
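import time
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neighbors import KDTree, NearestNeighbors

# X: list of raw document strings, loaded earlier in the script (not shown)
t0 = time.time()
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)  # scipy.sparse CSR matrix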
t1 = time.time()
total = t1-t0
print "TF-IDF built:", total
#######################------------------------############################
t0 = time.time()
#nbrs = NearestNeighbors(n_neighbors=20, algorithm='kd_tree', metric='euclidean')
#nbrs.fit(X_train_tfidf)#,Y)
nbrs = KDTree(np.array(X_train_tfidf), leaf_size=100)  # fails here (line 48 in the traceback)
t1 = time.time()
total = t1-t0
print "KNN Trained:", total
#######################------------------------############################
Here is the error:
TF-IDF built: 0.108999967575
Traceback (most recent call last):
  File ".\tfidf_knn.py", line 48, in <module>
    nbrs = KDTree(np.array(X_train_tfidf), leaf_size=100)
  File "sklearn/neighbors/binary_tree.pxi", line 1055, in sklearn.neighbors.kd_tree.BinaryTree.__init__ (sklearn\neighbors\kd_tree.c:8298)
  File "C:\Anaconda2\lib\site-packages\numpy\core\numeric.py", line 474, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
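A likely cause, judging from the traceback: np.array() does not densify a SciPy sparse matrix. It wraps the whole matrix as a single object element, which KDTree then cannot convert to a float array. Calling .toarray() on the sparse matrix does produce a dense 2-D array. A minimal sketch of that fix, assuming a small hypothetical toy corpus in place of the question's X (the corpus strings and the k=2 query are illustrative only):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neighbors import KDTree

# Hypothetical toy corpus standing in for the question's X.
X = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock prices fell sharply today",
    "the market rallied after the report",
]

counts = CountVectorizer().fit_transform(X)
X_train_tfidf = TfidfTransformer().fit_transform(counts)  # scipy.sparse CSR matrix

# .toarray() builds a dense (n_docs, n_terms) float array, which KDTree accepts.
X_dense = X_train_tfidf.toarray()

tree = KDTree(X_dense, leaf_size=100)

# Query with the first document: its nearest neighbour is itself at distance 0.
dist, ind = tree.query(X_dense[:1], k=2)
```

This only works while the dense n_docs × n_terms array fits in memory, which is the limitation raised in the comment below.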
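Thanks for your help! It works on small data, but on a huge array I get a MemoryError, because after calling toarray() the matrix is no longer sparse. Is there a way to pass a sparse matrix to KDTree? – user3667569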
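To see why toarray() runs out of memory, compare what a dense copy allocates with what the CSR matrix actually stores. A sketch assuming float64 values; the helper names dense_bytes, csr_bytes, and dense_gb, the toy corpus, and the 100,000-document × 50,000-term size are hypothetical:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer


def dense_bytes(m):
    """Bytes that m.toarray() would allocate: one value per (row, column) cell."""
    return m.shape[0] * m.shape[1] * m.dtype.itemsize


def csr_bytes(m):
    """Bytes a CSR matrix actually holds: nonzero values, column indices, row pointers."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


def dense_gb(n_docs, n_terms, itemsize=8):
    """Size in GB (10**9 bytes) of a dense n_docs x n_terms array (float64 by default)."""
    return n_docs * n_terms * itemsize / 1e9


# Hypothetical toy corpus; real TF-IDF matrices are usually far sparser than this.
docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock prices fell sharply today",
    "the market rallied after the report",
]
tfidf = TfidfTransformer().fit_transform(CountVectorizer().fit_transform(docs))

# Even on four short documents the dense copy is larger than the sparse storage.
dense, stored = dense_bytes(tfidf), csr_bytes(tfidf)

# Hypothetical large corpus: the dense array alone needs 40 GB.
print(dense_gb(100000, 50000))  # 40.0
```

The dense cost grows with n_docs × n_terms regardless of content, while the CSR cost grows only with the number of nonzero entries.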
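Hey. See my edit. You can't use kd_tree with sparse input, but you can switch the algorithm to brute force. The results should be essentially the same, since brute force is also an exact nearest-neighbour search; mainly the speed differs. You also need to convert the sparse matrix to a format that works better with sklearn models (csr_matrix). – kazAnova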
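A minimal sketch of the brute-force suggestion: NearestNeighbors with algorithm='brute' accepts the sparse TF-IDF matrix directly, so no dense copy is made. The toy corpus and n_neighbors=2 are assumptions for illustration. csr_matrix() is included only because the comment recommends it; TfidfTransformer.fit_transform already returns CSR.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy corpus standing in for the question's X.
docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock prices fell sharply today",
    "the market rallied after the report",
]
X_train_tfidf = TfidfTransformer().fit_transform(CountVectorizer().fit_transform(docs))

# Already CSR; csr_matrix() just guarantees the format the comment recommends.
X_csr = csr_matrix(X_train_tfidf)

# Brute force works directly on sparse input, so no dense array is ever built.
nbrs = NearestNeighbors(n_neighbors=2, algorithm='brute', metric='euclidean')
nbrs.fit(X_csr)

# Querying with the training matrix: column 0 is each document itself (distance ~0),
# column 1 is its nearest other document.
dist, ind = nbrs.kneighbors(X_csr)
```

Brute force computes all query-to-training distances, so its query time grows with corpus size, but memory stays proportional to the nonzero TF-IDF entries rather than to n_docs × n_terms.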