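Calculating cosine similarity with Python
I have written the following code to compute the cosine similarity between a large number of preprocessed documents (stop-word removal, stemming, and TF-IDF weighting):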
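from sklearn.metrics.pairwise import cosine_similarity

# X: the TF-IDF document-term matrix built during preprocessing (not shown)
print(X.shape)
similarity = []
i = 0  # row counter
for each in X:
    similarity.append(cosine_similarity(X[i:1], X))
    print(cosine_similarity(X[i:1], X))
    i = i+1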
However, when I run it, I get this:
(2235, 7791)
[[ 1. 0.01490594 0.11752643 ..., 0.00941571 0.03652551
0.]]
Traceback (most recent call last):
  File "...", line 83, in <module>
    similarity.append(cosine_similarity(X[i:1], X))
  File "/Users/.../anaconda/lib/python3.5/site-packages/sklearn/metrics/pairwise.py", line 881, in cosine_similarity
    X, Y = check_pairwise_arrays(X, Y)
  File "/Users/.../anaconda/lib/python3.5/site-packages/sklearn/metrics/pairwise.py", line 96, in check_pairwise_arrays
    X = check_array(X, accept_sparse='csr', dtype=dtype)
  File "/Users/.../anaconda/lib/python3.5/site-packages/sklearn/utils/validation.py", line 407, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0, 7791)) while a minimum of 1 is required.
[Finished in 56.466s]
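For reference, the quantity being computed for two TF-IDF rows a and b is cos(a, b) = a·b / (‖a‖ ‖b‖). A small sanity check on two made-up vectors (the numbers are illustrative only), comparing the hand computation with scikit-learn's cosine_similarity. If X came from TfidfVectorizer with its default norm='l2', each row already has unit length and the cosine similarity reduces to a plain dot product.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two illustrative (made-up) TF-IDF-style row vectors, shape (1, 4)
a = np.array([[0.0, 0.5, 0.5, 0.7]])
b = np.array([[0.3, 0.0, 0.6, 0.7]])

# Cosine similarity by hand: dot product divided by the product of the norms
manual = (a @ b.T) / (np.linalg.norm(a) * np.linalg.norm(b))

# scikit-learn returns a (1, 1) matrix for one row compared against one row
library = cosine_similarity(a, b)

print(manual, library)
```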
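You are using X[i:1] in the loop. When i reaches 1, you are accessing X[1:1], which is an empty slice with no rows (hence shape=(0, 7791) in the error). That is what causes the error. –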
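A minimal reproduction of the slicing problem and of the fix, using a hypothetical 3 × 4 sparse matrix in place of the real 2235 × 7791 TF-IDF matrix (the CSR format here is an assumption chosen to mirror the accept_sparse='csr' in the traceback). X[i:1] selects rows i up to but not including 1, so it is empty for every i ≥ 1; X[i:i+1] always selects exactly row i.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for the real TF-IDF matrix: 3 documents, 4 terms
X = csr_matrix(np.array([
    [0.0, 0.5, 0.5, 0.7],
    [0.3, 0.0, 0.6, 0.7],
    [0.9, 0.1, 0.0, 0.4],
]))

# X[i:1] is the half-open row range [i, 1): non-empty only when i == 0
print(X[0:1].shape)  # (1, 4)
print(X[1:1].shape)  # (0, 4): cosine_similarity rejects this with ValueError

# Fix: slice a single row with X[i:i+1], iterating over row indices directly
similarity = []
for i in range(X.shape[0]):
    similarity.append(cosine_similarity(X[i:i+1], X))

print(len(similarity), similarity[0].shape)  # 3 (1, 3)
```

Iterating over range(X.shape[0]) also removes the need to keep a separate counter alongside `for each in X`.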
@DileepKumarPatchigolla How should I do it, then? – user7347576
I'm not familiar with cosine_similarity. Could you show what X looks like so I can try it? –
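To give a concrete picture of X, here is a sketch assuming it was produced by scikit-learn's TfidfVectorizer (the question does not say how X was built, and the three-document corpus is made up). Passing the whole matrix to cosine_similarity once returns every pairwise similarity, so the per-row loop is not needed at all.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus; the real corpus has 2235 preprocessed documents
docs = [
    "cosine similarity between documents",
    "tf idf weights for documents",
    "stemming and stop word removal",
]

# X is a sparse (n_documents x n_terms) TF-IDF matrix, like the one in the question
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
print(X.shape)

# One call gives the full n x n matrix; row i equals cosine_similarity(X[i:i+1], X)
sim = cosine_similarity(X)
print(sim.shape)  # (3, 3)
```

For the question's 2235 documents, the result would be a dense 2235 × 2235 float64 array, roughly 40 MB.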