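Gensim Doc2Vec model only generates a limited number of vectors
I am using the gensim Doc2Vec model to generate my feature vectors. Here is the code I am using (I have explained what my problem is in the code comments):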
import multiprocessing
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

cores = multiprocessing.cpu_count()
# create a list of tagged documents
training_docs = []
# all_docs: a list of 53 strings, which are my documents; they are very long (not just a couple of sentences)
for index, doc in enumerate(all_docs):
    # 'doc' is a unicode string that I have already preprocessed
    training_docs.append(TaggedDocument(doc.split(), str(index+1)))
# at this point, I have 53 tagged documents in my 'training_docs' list
model = Doc2Vec(training_docs, size=400, window=8, min_count=1, workers=cores)
# when I print the number of vectors, I only have 10, while I should have 53: one for each of the 53 documents in my 'training_docs' list
print(len(model.docvecs))
# output: 10
I am just wondering whether I made a mistake, or whether there is some other parameter that I should set?
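UPDATE: I was playing with the tags parameter of TaggedDocument, and when I changed it to a mixture of text and numbers, like Doc1, Doc2, ..., I saw a different number of generated vectors, but still not the number of feature vectors I expected.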
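
One possible explanation for the count of 10, offered as a sketch rather than a confirmed diagnosis: the tags argument of TaggedDocument is meant to be a list of tags, and a bare string passed there is iterated character by character. The snippet below illustrates that effect in plain Python without gensim. `TaggedDoc` is a hypothetical stand-in with the same (words, tags) shape, `distinct_tags` is an illustrative helper, and the 53 placeholder documents are assumptions, not the original data.

```python
from collections import namedtuple

# Hypothetical stand-in with the same (words, tags) fields as gensim's
# TaggedDocument, so this sketch runs without gensim installed.
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])


def distinct_tags(docs):
    """Collect every distinct tag seen when iterating over each document's tags."""
    seen = set()
    for doc in docs:
        # Iterating over a plain string yields its individual characters.
        for tag in doc.tags:
            seen.add(tag)
    return seen


# Placeholder corpus of 53 short documents (assumed, not the original data).
all_docs = ["some example text %d" % i for i in range(53)]

# Tags passed as a bare string, as in the question.
string_tagged = [TaggedDoc(d.split(), str(i + 1)) for i, d in enumerate(all_docs)]
# Tags passed as a one-element list.
list_tagged = [TaggedDoc(d.split(), [str(i + 1)]) for i, d in enumerate(all_docs)]

n_string = len(distinct_tags(string_tagged))  # only the digits '0' to '9'
n_list = len(distinct_tags(list_tagged))      # one tag per document
print(n_string, n_list)
```

If this is indeed the cause, wrapping the tag in a list, as in `TaggedDocument(doc.split(), [str(index+1)])`, should give one vector per document. It would also be consistent with the update: tags such as Doc1, Doc2, ... would contribute the letters D, o and c on top of the digits, changing the count without reaching 53.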