2016-03-31 33 views

I am completely new to Python and gensim. I am trying to use word2vec from gensim with Python 3.4 on Windows 7 (64-bit), and I get an error when running Word2Vec.

import csv
from gensim.models import Word2Vec

with open('Data.csv', 'r') as csvfile:
    Word2VecTextTrain = csv.reader(csvfile, delimiter=' ')
    model = Word2Vec(Word2VecTextTrain, size=100, window=3, min_count=5, workers=4)

"Data.csv" contains 30k lines of text. Each line is a complete or incomplete sentence of up to 20 words. Some of them may contain "/" or digits.

I get this error:

Traceback (most recent call last): 
    File "C:/Users/Home/PycharmProjects/Word2Vec Project/Word2Vec_2016_03_23", line 26, in <module> 
    model = Word2Vec(Word2VecTextTrain, size=100, window=5, min_count=5, workers=4) 
    File "C:\Users\Home\Miniconda3\lib\site-packages\gensim\models\word2vec.py", line 431, in __init__ 
    self.build_vocab(sentences, trim_rule=trim_rule) 
    File "C:\Users\Home\Miniconda3\lib\site-packages\gensim\models\word2vec.py", line 497, in build_vocab 
    self.finalize_vocab() # build tables & arrays 
    File "C:\Users\Home\Miniconda3\lib\site-packages\gensim\models\word2vec.py", line 625, in finalize_vocab 
    self.reset_weights() 
    File "C:\Users\Home\Miniconda3\lib\site-packages\gensim\models\word2vec.py", line 932, in reset_weights 
    self.syn0[i] = self.seeded_vector(self.index2word[i] + str(self.seed)) 
    File "C:\Users\Home\Miniconda3\lib\site-packages\gensim\models\word2vec.py", line 946, in seeded_vector 
    once = random.RandomState(uint32(self.hashfxn(seed_string))) 
OverflowError: Python int too large to convert to C long 

Process finished with exit code 1 

I don't know what causes this error. Any help is much appreciated.

Answers

I could not reproduce the error on my Ubuntu machine, but LineSentence may suit you better:

from gensim.models import Word2Vec 
from gensim.models.word2vec import LineSentence 

Word2VecTextTrain = LineSentence('Data.csv') 
model = Word2Vec(Word2VecTextTrain, size=100, window=3, min_count=5, workers=4) 
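
One reason a LineSentence-style input can suit Word2Vec better than a bare csv.reader: the model reads the corpus more than once (a vocabulary pass, then training passes), and a csv.reader tied to an open file is exhausted after a single pass. Below is a minimal stdlib-only sketch of a restartable corpus reader in the same spirit. The class name RestartableSentences is hypothetical, it is not gensim's implementation, and skipping blank lines is an assumption made for this example.

```python
class RestartableSentences:
    """Hypothetical helper: yields one whitespace-split token list per line."""

    def __init__(self, path, encoding="utf-8"):
        self.path = path
        self.encoding = encoding

    def __iter__(self):
        # Reopen the file on every pass so the corpus can be read repeatedly.
        with open(self.path, "r", encoding=self.encoding) as handle:
            for line in handle:
                tokens = line.split()
                if tokens:  # assumption: blank lines carry no sentence
                    yield tokens
```

An instance such as RestartableSentences('Data.csv') could then be passed wherever Word2VecTextTrain is used above, since every new for-loop over it starts again from the first line.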

Thanks for the suggestion. Unfortunately, it gives me the same error. – user3439050

Could you share the input file? – kampta
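
A possible cause, judging only from the traceback and not confirmed in this thread: the failing line is uint32(self.hashfxn(seed_string)). On 64-bit Python 3, the built-in hash can return values that need more than 32 bits, while a C long on Windows is 32 bits wide. That would explain why the error appears on Windows 7 but not on Ubuntu. The sketch below shows a hash function folded into 32 bits. The name hash32 is hypothetical, and passing it through the hashfxn argument is an assumption based on the self.hashfxn attribute visible in the traceback.

```python
def hash32(value):
    """Hypothetical helper: Python's built-in hash folded into 32 bits."""
    # Masking keeps the low 32 bits, so the result is always in [0, 2**32),
    # even when hash() returns a negative or a 64-bit value.
    return hash(value) & 0xFFFFFFFF


# The traceback builds its seed string as word + str(seed).
seed_string = "word" + str(1)
assert 0 <= hash32(seed_string) < 2**32

# Assumed usage, not verified against this gensim version:
# model = Word2Vec(Word2VecTextTrain, size=100, window=3, min_count=5,
#                  workers=4, hashfxn=hash32)
```

Note that Python 3 randomizes string hashes per process unless PYTHONHASHSEED is set, so initial vectors seeded this way are not reproducible across runs by default.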