2017-01-02 138 views

What is the output of Gensim word2vec?

I want to use gensim word2vec as the input to a neural network. I have 2 questions:

1) gensim.models.Word2Vec takes a size parameter. How is this parameter used? And it is the size of what?

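2) Once the model is trained, what is the output of gensim word2vec? As far as I can see, the values are not probabilities (they are not between 0 and 1). It looks to me as if, for each word vector, we get a (cosine) distance between this word and some other words (but which words exactly?)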

Thanks for your replies.

Answer


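Answer to 1 -> The size parameter is the dimensionality of the word vectors, i.e. each vector will have 100 dimensions if size=100. (In gensim 4.0 and later this parameter is named vector_size.)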

Answer to 2 -> You can save the word vectors with the function save_word2vec_format(fname="vectors.txt", fvocab=None, binary=False). This writes a file "vectors.txt" whose first line is <size of the vocabulary> <dimensions>, and whose remaining lines have the format <word> <vector of size dimension>

Sample of "vectors.txt":

297820 100 
the -0.18542234751 0.138813291635 0.0392148854213 0.0238721499736 -0.0443151295365 0.03226302388 -0.168626211895 -0.17397777844 -0.0547546409461 0.166621666046 0.0534506882806 0.0774947957067 -0.180520283779 -0.0938140452702 -0.0354599008902 -0.0533488133527 -0.0667684564816 -0.0210904306995 -0.103069115604 -0.138712344952 -0.035142440978 -0.125067138202 0.0514192233164 -0.142052171747 0.0795726729387 0.0310433094806 -0.00666224898992 0.047268806263 0.0339849190176 -0.181107631029 0.0477396587205 0.0483130822899 -0.090229393762 0.0224528628225 0.190814060668 -0.179506639849 0.00034066604609 0.0639057478 0.156444383949 -0.0366888977431 -0.170674385275 -0.053907152935 0.106572313582 0.0724497821903 -0.00848717936216 0.124053494271 -0.0420715605081 0.0460277422205 -0.0514693485657 0.132215091575 -0.0429308836475 -0.111784875385 -0.0543172053216 0.0849476776796 -0.015301892652 0.00992711997251 -0.00566113637219 0.00136359242972 -0.0382116842516 0.0681229985191 0.0685156463052 0.0759072640845 -0.0238136705161 0.168710450161 0.00879930186352 -0.179756801973 -0.210286559709 -0.161832152064 -0.0212640125813 -0.0115905356526 -0.0949562511822 0.126493155131 0.0215821686774 -0.164276918273 -0.0573806470616 -0.0147266125919 0.0566350339785 -0.0276969849679 0.0178970346094 0.0599163813161 0.0919867942845 0.172071394538 0.0714226787026 0.109037733251 0.00403647493576 0.044853743905 -0.0915639785243 -0.0242494817113 0.0705554654776 0.255584701079 0.001309754199 0.0872413719572 -0.0376289782286 0.158184379871 0.109245196088 -0.0727554069742 0.168820215174 0.0454895919746 0.0741726055733 -0.134467710995 
... 
...
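To make the file layout above concrete, here is a minimal sketch in plain Python that parses this text format into a dictionary of word vectors and computes a cosine similarity between two of them. The helper names load_word2vec_text and cosine_similarity are hypothetical, and the three-word sample is made up for illustration; it is not taken from the real "vectors.txt". With gensim itself, the same file would normally be read with KeyedVectors.load_word2vec_format instead of a hand-written parser.

```python
import io
import math


def load_word2vec_text(stream):
    """Parse the plain-text word2vec format into a {word: [floats]} dict.

    Hypothetical helper for illustration, not part of gensim.
    """
    # First line: <size of the vocabulary> <dimensions>
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        # Remaining lines: <word> followed by `dim` floats
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vocab_size, dim, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up file contents: 3 words, 4 dimensions (illustrative values only).
sample = io.StringIO(
    "3 4\n"
    "the 0.1 -0.2 0.3 0.4\n"
    "cat 0.2 -0.1 0.0 0.5\n"
    "dog 0.2 -0.1 0.1 0.5\n"
)
vocab_size, dim, vectors = load_word2vec_text(sample)
print(vocab_size, dim, len(vectors["the"]))
print(cosine_similarity(vectors["cat"], vectors["dog"]))
```

Each row is therefore just a dense vector of real-valued learned weights for one word, not a probability and not a distance to any particular word. Similarities such as the cosine are computed afterwards between pairs of vectors, and the vectors themselves are what would be fed to a neural network as input features.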