2016-01-26 29 views
28

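TypeError: ufunc 'add' did not contain a loop with signature matching types

I am building word-based representations of sentences. The words in each sentence are looked up in the file "vectors.txt" to get their embedding vectors. Once I have a vector for every word in the sentence, I average those vectors. Here is my code: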

import nltk 
import numpy as np 
from nltk import FreqDist 
from nltk.corpus import brown 


news = brown.words(categories='news') 
news_sents = brown.sents(categories='news') 

fdist = FreqDist(w.lower() for w in news) 
vocabulary = [word for word, _ in fdist.most_common(10)] 
num_sents = len(news_sents) 

def averageEmbeddings(sentenceTokens, embeddingLookupTable):
    listOfEmb = []
    for token in sentenceTokens:
        embedding = embeddingLookupTable[token]
        listOfEmb.append(embedding)

    return sum(np.asarray(listOfEmb))/float(len(listOfEmb))

embeddingVectors = {} 

with open("D:\\Embedding\\vectors.txt") as file: 
    for line in file:
        (key, *val) = line.split()
        embeddingVectors[key] = val

for i in range(num_sents):
    features = {}
    for word in vocabulary:
        features[word] = int(word in news_sents[i])
    print(features)
    print(list(features.values()))
    sentenceTokens = []
    for key, value in features.items():
        if value == 1:
            sentenceTokens.append(key)
    sentenceTokens.remove(".")
    print(sentenceTokens)
    print(averageEmbeddings(sentenceTokens, embeddingVectors))

print(features.keys()) 

I'm not sure why, but I get this error:

TypeError         Traceback (most recent call last) 
<ipython-input-4-643ccd012438> in <module>() 
39  sentenceTokens.remove(".") 
40  print(sentenceTokens) 
---> 41  print(averageEmbeddings(sentenceTokens, embeddingVectors)) 
42 
43 print(features.keys()) 

<ipython-input-4-643ccd012438> in averageEmbeddings(sentenceTokens, embeddingLookupTable) 
18   listOfEmb.append(embedding) 
19 
---> 20  return sum(np.asarray(listOfEmb))/float(len(listOfEmb)) 
21 
22 embeddingVectors = {} 

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U9') dtype('<U9') dtype('<U9') 

P.S. The embedding vectors look like this:

the 0.011384 0.010512 -0.008450 -0.007628 0.000360 -0.010121 0.004674 -0.000076 
of 0.002954 0.004546 0.005513 -0.004026 0.002296 -0.016979 -0.011469 -0.009159 
and 0.004691 -0.012989 -0.003122 0.004786 -0.002907 0.000526 -0.006146 -0.003058 
one 0.014722 -0.000810 0.003737 -0.001110 -0.011229 0.001577 -0.007403 -0.005355 
in -0.001046 -0.008302 0.010973 0.009608 0.009494 -0.008253 0.001744 0.003263 

After switching to np.sum, I get this error:

TypeError         Traceback (most recent call last) 
<ipython-input-13-8a7edbb9d946> in <module>() 
40  sentenceTokens.remove(".") 
41  print(sentenceTokens) 
---> 42  print(averageEmbeddings(sentenceTokens, embeddingVectors)) 
43 
44 print(features.keys()) 

<ipython-input-13-8a7edbb9d946> in averageEmbeddings(sentenceTokens, embeddingLookupTable) 
18   listOfEmb.append(embedding) 
19 
---> 20  return np.sum(np.asarray(listOfEmb))/float(len(listOfEmb)) 
21 
22 embeddingVectors = {} 

C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in sum(a, axis, dtype, out, keepdims) 
    1829  else: 
    1830   return _methods._sum(a, axis=axis, dtype=dtype, 
-> 1831        out=out, keepdims=keepdims) 
    1832 
    1833 

C:\Anaconda3\lib\site-packages\numpy\core\_methods.py in _sum(a, axis, dtype, out, keepdims) 
30 
31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False): 
---> 32  return umr_sum(a, axis, dtype, out, keepdims) 
33 
34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False): 

TypeError: cannot perform reduce with flexible type 
+0

Try using 'np.sum'. – JBernardo

+0

Now I get "TypeError: cannot perform reduce with flexible type", raised in _sum(a, axis, dtype, out, keepdims) from "C:\Anaconda3\lib\site-packages\numpy\core\_methods.py". – Masyaf

Answers

28

You have a numpy array of strings, not floats. That is what dtype('<U9') means: a little-endian Unicode string of up to 9 characters.
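
To see where that dtype comes from, here is a minimal sketch, assuming two rows shaped like the vectors.txt sample in the question (the variable names are illustrative only). Splitting a line yields strings, so np.asarray builds a string array unless it is asked to convert:

```python
import numpy as np

# Two rows as line.split() would produce them: every value is still a string.
rows = [
    "the 0.011384 0.010512 -0.008450".split()[1:],
    "of 0.002954 0.004546 0.005513".split()[1:],
]

as_strings = np.asarray(rows)              # Unicode string dtype
as_floats = np.asarray(rows, dtype=float)  # values parsed into float64

print(as_strings.dtype.kind)  # 'U' marks a Unicode string array
print(as_floats.dtype)
```

Adding two string arrays is what the 'add' ufunc refuses to do, which is why the conversion has to happen before any summing.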

Try:

return sum(np.asarray(listOfEmb, dtype=float))/float(len(listOfEmb)) 

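However, you don't need numpy here at all. Because each embedding is a list of strings, you can convert the values and average each dimension in plain Python:

return [sum(float(x) for x in dim)/len(listOfEmb) for dim in zip(*listOfEmb)] 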

Or, if you are really set on using numpy:

return np.asarray(listOfEmb, dtype=float).mean(axis=0) 
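
Putting the pieces together, a rough sketch of one way to avoid the problem at the source: convert the values to floats while reading the file, then average per dimension. The helper names load_embeddings and average_embeddings are hypothetical, and skipping tokens that have no vector is an assumption, not something the question specifies:

```python
import numpy as np

def load_embeddings(lines):
    """Parse 'word v1 v2 ...' lines into a dict of float arrays."""
    table = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        key, *values = line.split()
        table[key] = np.asarray(values, dtype=float)
    return table

def average_embeddings(tokens, table):
    """Average the vectors of the tokens found in the table, per dimension."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return None  # no known tokens, nothing to average
    return np.mean(vectors, axis=0)

# Sample rows in the same format as vectors.txt (truncated to 3 dimensions).
sample = [
    "the 0.011384 0.010512 -0.008450",
    "of 0.002954 0.004546 0.005513",
]
table = load_embeddings(sample)
avg = average_embeddings(["the", "of", "unknown"], table)
print(avg)
```

With a real file, the open file object can be passed to load_embeddings directly, since it iterates over lines.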
+1

Thanks, I think it works. At least it no longer gives any errors, though I'm not sure the calculation is right. – Masyaf

+0

You could also use map(float, listOfEmb). – jimh
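
A small sketch of how that suggestion might apply, assuming each entry of listOfEmb is itself a list of strings as in the question. In that case map(float, ...) has to run on each row, and in Python 3 it returns a lazy iterator, so it is wrapped in list() here:

```python
listOfEmb = [
    ["0.011384", "0.010512"],
    ["0.002954", "0.004546"],
]

# Convert each row separately; float() cannot be applied to a whole row at once.
rows = [list(map(float, row)) for row in listOfEmb]

# Per-dimension average without numpy.
average = [sum(column) / len(rows) for column in zip(*rows)]
print(average)
```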
