2016-12-25

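I am trying to compute the similarity between two words with word2vec. It works when I do it manually, one pair at a time, but I have two large txt files, so I want to do it in a loop. I tried a few loop approaches without success, so I decided to ask the experts: how can I use Word2Vec in a loop with two inputs?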

My code:

import gensim 

model = gensim.models.Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True) 
with open('myfile1.txt', 'r') as f: 
    data1 = f.readlines() 

with open('myfile2.txt', 'r') as f: 
    data2 = f.readlines() 

data = zip(data1, data2) 

with open('myoutput.txt', 'a') as f:
    for x in data:
        output = model.similarity(x[1], x[0])  # reading each word from each file
        out = '{} : {} : {}\n'.format(x[0].strip(), x[1].strip(), output)
        f.write(out)

My input 1 (text):

street 
spain 
ice 
man 

My input 2 (text 2):

florist 
paris 
cold 
kid 

I want this output (output.txt):

street florist 0.19991447551502498 
spain paris 0.5380033328157873 
ice cold 0.40968857572410483 
man kid 0.42953233870042506 
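The expected output above pairs line i of the first file with line i of the second file, which is exactly what `zip` does. Below is a minimal sketch under the assumption of one word per line. `pair_similarities` is a hypothetical helper; it takes the similarity function as a parameter so it can be tried without loading the GoogleNews model, and skipping pairs that raise `KeyError` (words missing from the vocabulary) is an assumed policy, not something the question asks for.

```python
from typing import Callable, Iterable, List


def pair_similarities(lines1: Iterable[str], lines2: Iterable[str],
                      similarity: Callable[[str, str], float]) -> List[str]:
    """Pair line i of the first input with line i of the second input."""
    rows = []
    for a, b in zip(lines1, lines2):      # zip stops at the shorter input
        w1, w2 = a.strip(), b.strip()     # drop the trailing '\n' of each line
        if not w1 or not w2:              # skip blank lines
            continue
        try:
            score = similarity(w1, w2)
        except KeyError:                  # word not in the vocabulary
            continue                      # assumption: skip this pair
        rows.append('{} {} {}'.format(w1, w2, score))
    return rows


if __name__ == '__main__':
    # Hypothetical stand-in for model.similarity: fixed toy scores
    toy_scores = {('spain', 'paris'): 0.5, ('ice', 'cold'): 0.25}

    def toy_similarity(w1, w2):
        return toy_scores[(w1, w2)]       # raises KeyError for unknown pairs

    for row in pair_similarities(['spain\n', 'ice\n', 'xyz\n'],
                                 ['paris\n', 'cold\n', 'kid\n'],
                                 toy_similarity):
        print(row)
```

With a real model you would pass `kv.similarity`, where `kv` is loaded with `KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)` in newer gensim releases (the gensim 0.13.3 in the question exposes the same loader on `Word2Vec`). Open file objects can be passed directly, since iterating over a file yields its lines.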

Please fix the indentation and the errors.


I have checked your code and it works! What problem are you facing? Are you getting an error?


I get this error:

  File "testing1.py", line 14, in <module>
    output = model.similarity(x[1], x[0]) # reading each word form each files
  File "/anaconda2/lib/python2.7/site-packages/gensim-0.13.3-py2.7-linux-x86_64.egg/gensim/models/word2vec.py", line 1598, in similarity
    return dot(matutils.unitvec(self[w1]), matutils.unitvec(self[w2]))
  File "anaconda2/lib/python2.7/site-packages/gensim-0.13.3-py2.7-linux-x86_64.egg/gensim/models/word2vec.py", line 1578, in __getitem__
    return self.syn0[self.vocab[words].index]
KeyError: 'street\n'
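The key in that `KeyError` is `'street\n'`: `readlines()` keeps the newline at the end of every line, so the model is asked for a word that ends in `\n`. A minimal sketch of the cause and the fix, where the small `vocab` set is a hypothetical stand-in for the model's vocabulary and `read_words` is a hypothetical helper name:

```python
import io
from typing import List, TextIO


def read_words(f: TextIO) -> List[str]:
    """One word per non-empty line, with the trailing newline removed."""
    return [line.strip() for line in f if line.strip()]


if __name__ == '__main__':
    # Hypothetical toy vocabulary standing in for the model's word index
    vocab = {'street', 'florist'}

    raw = io.StringIO('street\nflorist\n').readlines()
    print(raw)              # ['street\n', 'florist\n']
    print(raw[0] in vocab)  # False: the lookup key still ends in '\n'

    words = read_words(io.StringIO('street\nflorist\n\n'))
    print(words)                            # ['street', 'florist']
    print(all(w in vocab for w in words))   # True
```

Stripping each line (as the `.strip()` calls in the question's `format(...)` already do for printing) before calling `model.similarity` removes this particular error; words that are genuinely absent from the vocabulary will still raise `KeyError`.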

Answer

import gensim

model = gensim.models.Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

file1 = []
file2 = []

# rstrip() removes the trailing '\n' that caused KeyError: 'street\n'
with open('myfile1.txt', 'r') as f:
    for line in f:
        file1.append(line.rstrip())

with open('myfile2.txt', 'r') as f1:
    for line1 in f1:
        file2.append(line1.rstrip())

# Nested loops: compare every word in file1 with every word in file2.
# The with-block closes the output file once, after all pairs are written.
with open('Output2.txt', 'w') as f:
    for i in file1:
        for g in file2:
            w = model.similarity(i, g)
            result = i + ',' + g + ',' + str(w)
            f.write(result)
            f.write('\n')

You have a problem with your loops: the two loops should go together, one nested inside the other.
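Note that nested loops write every combination (16 lines for the 4 + 4 words in the question), whereas the expected output in the question pairs line i with line i (4 lines). A minimal sketch contrasting the two, with `itertools.product` standing in for the nested loops and `zip` for the line-by-line pairing. The 2-d vectors and the `cosine` helper are hypothetical stand-ins for the GoogleNews model; gensim's `similarity` likewise computes a cosine similarity, as the `dot(matutils.unitvec(...), matutils.unitvec(...))` line in the traceback above shows.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy 2-d vectors; the real model has 300-d vectors per word
TOY_VECTORS: Dict[str, Tuple[float, float]] = {
    'ice': (1.0, 0.0),
    'man': (0.0, 1.0),
    'cold': (1.0, 1.0),
    'kid': (0.0, 2.0),
}


def cosine(w1: str, w2: str) -> float:
    """Cosine similarity of two toy vectors (stand-in for model.similarity)."""
    a, b = TOY_VECTORS[w1], TOY_VECTORS[w2]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def all_pairs(words1: Sequence[str], words2: Sequence[str]) -> List[Tuple[str, str, float]]:
    """Same as two nested for-loops: every word of file 1 with every word of file 2."""
    return [(a, b, cosine(a, b)) for a, b in product(words1, words2)]


def line_pairs(words1: Sequence[str], words2: Sequence[str]) -> List[Tuple[str, str, float]]:
    """zip: word i of file 1 with word i of file 2 only."""
    return [(a, b, cosine(a, b)) for a, b in zip(words1, words2)]


if __name__ == '__main__':
    file1, file2 = ['ice', 'man'], ['cold', 'kid']
    print(len(all_pairs(file1, file2)))   # 4 (2 x 2 combinations)
    print(len(line_pairs(file1, file2)))  # 2 (one per line)
    for a, b, s in line_pairs(file1, file2):
        print(a, b, round(s, 4))
```

Which one is right depends on the goal: `zip` reproduces the four-line output the question asks for, while the nested loops are useful when every word of one list should be scored against every word of the other.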