2017-02-21 69 views

gensim word2vec: array dimension error when updating with online word embeddings

Updating word vectors on the fly with Word2Vec from gensim 0.13.4.1 does not work.

model.build_vocab(sentences, update=False) 

works fine; however,

model.build_vocab(sentences, update=True) 

does not.


I used this website to try to replicate what they did, so at one point my script does the following:

model = gensim.models.Word2Vec() 
sentences = gensim.models.word2vec.LineSentence("./text8/text8") 
model.build_vocab(sentences, keep_raw_vocab=False, trim_rule=None, progress_per=10000, update=False) 
model.train(sentences) 

This runs with update=False, but using update=True gives me the following traceback:

Traceback (most recent call last): 
    File "word2vecAttempt.py", line 34, in <module> 
    model.build_vocab(sentences, progress_per=10000, update=True) 
    File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 535, in build_vocab 
    self.finalize_vocab(update=update) # build tables & arrays 
    File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 708, in finalize_vocab 
    self.update_weights() 
    File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 1070, in update_weights 
    self.wv.syn0 = vstack([self.wv.syn0, newsyn0]) 
    File "/home/brownc/anaconda3/lib/python3.5/site-packages/numpy/core/shape_base.py", line 230, in vstack 
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0) 
ValueError: all the input array dimensions except for the concatenation axis must match exactly 

Answer


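I was able to reproduce your error. I think you are calling build_vocab with update=True on a model that has not been trained yet. You can only call it with update=True on a model that has already been trained.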

This works:

import gensim 

model = gensim.models.Word2Vec() 
sentences = gensim.models.word2vec.LineSentence("text8") 
model.build_vocab(sentences, update=False) 
model.train(sentences) 

model.build_vocab(sentences, update=True) 
model.train(sentences) 

However, this fails:

import gensim 

model = gensim.models.Word2Vec() 
sentences = gensim.models.word2vec.LineSentence("text8") 
model.build_vocab(sentences, update=True) 
model.train(sentences) 

ValueError: all the input array dimensions except for the concatenation axis must match exactly 
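
One plausible reading of that ValueError, sketched with numpy alone: the traceback shows update_weights() stacking the existing weights with rows for the new words. The sketch assumes (this is not confirmed here) that an untrained model holds only an empty placeholder for its weights, and it uses 100 columns to match Word2Vec's default vector size. The helper try_vstack is made up for this illustration.

```python
import numpy as np

# Assumption: an untrained model's weight array is an empty placeholder.
existing = np.array([])
# Rows for newly seen words; 100 columns matches Word2Vec's default size.
new_rows = np.zeros((3, 100), dtype=np.float32)


def try_vstack(a, b):
    """Return the shape of the stacked array, or None if numpy rejects it."""
    try:
        return np.vstack([a, b]).shape
    except ValueError:
        return None


# Empty placeholder vs. (3, 100): the column counts differ, so stacking fails.
untrained_result = try_vstack(existing, new_rows)

# A model that already has a (5, 100) weight matrix stacks without trouble.
trained_result = try_vstack(np.zeros((5, 100), dtype=np.float32), new_rows)

print(untrained_result, trained_result)
```

If that assumption holds, the first call mirrors the failing case and the second mirrors the working one, where weights already exist before the update.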

This is with the latest version of gensim, 0.13.4.1.
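
A minimal sketch of a guard that avoids the failing call, for what it is worth. The helper name build_or_update_vocab is hypothetical, and it assumes the model exposes its vocabulary as the dict model.wv.vocab, as gensim 0.13.4.1 appears to; other versions may store it elsewhere. It only checks that a vocabulary exists, which is weaker than "already trained", so you still call train() yourself after each call.

```python
def build_or_update_vocab(model, sentences):
    """Hypothetical helper: choose update= from whether a vocabulary exists.

    Assumption: `model.wv.vocab` is a dict of known words (gensim 0.13.4.1
    layout). Returns the value that was passed as `update`.
    """
    has_vocab = len(model.wv.vocab) > 0
    # First call builds the vocabulary; later calls extend it.
    model.build_vocab(sentences, update=has_vocab)
    return has_vocab
```

Usage would look like calling build_or_update_vocab(model, sentences) followed by model.train(sentences) for each new batch of text, so the first batch takes the update=False path and later batches take the update=True path.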


Thanks! I thought the whole point was to have a dynamic model that updates as you feed it data it was not trained on. – chase