How do I create a dictionary of dictionaries using these functions?

I have a dictionary like this:

dict = {'in': [0.01, -0.07, 0.09, -0.02], 'and': [0.2, 0.3, 0.5, 0.6], 'to': [0.87, 0.98, 0.54, 0.4]}

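I have a cosine similarity function that takes two vectors, and I want to use it to compute the cosine similarity between every pair of words. First it should take the values for 'in' and 'and', then the values for 'in' and 'to', and so on.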

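I want the results stored in another dictionary, where 'in' is a key and its value is a dictionary holding each cosine similarity computed against that key. The output I am after looks like this: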

{'in': {'and': 0.4321, 'to': 0.218}, 'and': {'in': 0.1245, 'to': 0.9876}, 'to': {'in': 0.8764, 'and': 0.123}}

Here is the code that does all this:

import math

def cosine_similarity(vec1,vec2):
    # Accumulate the dot product and both squared norms in one pass.
    sum11, sum12, sum22 = 0, 0, 0
    for i in range(len(vec1)):
        x = vec1[i]; y = vec2[i]
        sum11 += x*x
        sum22 += y*y
        sum12 += x*y
    return sum12/math.sqrt(sum11*sum22)

def resultInDict(result,name,value,keyC):
    new_dict={}
    new_dict[keyC]=value
    if name in result:
        result[name] = new_dict
    else:
        result[name] = new_dict

def extract():
    result={}
    res={}
    # Each line of file.txt holds a word followed by its vector components.
    with open('file.txt') as text:
        for line in text:
            record = line.split()
            key = record[0]
            values = [float(value) for value in record[1:]]
            res[key] = values
    # Compare every word with every other word.
    for key,value in res.items():
        temp = 0
        for keyC,valueC in res.items():
            if keyC == key:
                continue
            temp = cosine_similarity(value,valueC)
            resultInDict(result,key,temp,keyC)
    print(result)

However, the result it gives looks like this:

{'and': {'in': 0.12241083209661485}, 'to': {'in': -0.0654517869126785}, 'from': {'in': -0.5324142931780856}, 'in': {'from': -0.5324142931780856}}

I want it to look like this:

{'in': {'and': 0.4321, 'to': 0.218}, 'and': {'in': 0.1245, 'to': 0.9876}, 'to': {'in': 0.8764, 'and': 0.123}}

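I think this happens because resultInDict defines a new dictionary, new_dict, to hold the key-value pair for the inner dictionary. Every time resultInDict is called, the line new_dict={} starts from an empty dictionary again, so only one key-value pair ever ends up stored for each key.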

How can I fix this?

Source

2014-11-04 Martha Pears

Not very elegant, but it works:

import math 

def cosine_similarity(vec1,vec2):
    sum11, sum12, sum22 = 0, 0, 0
    for i in range(len(vec1)):
        x = vec1[i]; y = vec2[i]
        sum11 += x*x
        sum22 += y*y
        sum12 += x*y
    return sum12/math.sqrt(sum11*sum22)

mydict = {"in" : [0.01, -0.07, 0.09, -0.02], "and" : [0.2, 0.3, 0.5, 0.6], "to" : [0.87, 0.98, 0.54, 0.4]}
mydict_keys = mydict.keys()

result = {}
for k1 in mydict_keys:
    # Collect the similarities of k1 against every other key.
    temp_dict = {}
    for k2 in mydict_keys:
        if k1 != k2:
            temp_dict[k2] = cosine_similarity(mydict[k1], mydict[k2])
    result[k1] = temp_dict
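
If you would rather keep the helper from the question, a small change is probably enough. This is only a sketch: it assumes the same call signature as the original resultInDict and uses dict.setdefault, so the inner dictionary is created once per outer key and then extended instead of being replaced on every call. The similarity values in the usage lines are made up for illustration.

```python
def resultInDict(result, name, value, keyC):
    # Create the inner dict the first time `name` is seen, then add to it.
    # The original version built a fresh one-item dict on every call and
    # assigned it over whatever was already stored under `name`.
    result.setdefault(name, {})[keyC] = value


# Usage with made-up similarity values (not real results).
result = {}
resultInDict(result, "in", 0.1, "and")
resultInDict(result, "in", 0.2, "to")
resultInDict(result, "and", 0.1, "in")
print(result)
```

With this change the loop in extract() can stay as it is.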

Also, if you have large data structures, consider using scipy (http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.cosine.html) or scikit-learn (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html) to compute cosine similarity more efficiently. The latter is not only fast but also memory-friendly, since you can feed it a scipy.sparse matrix.
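
A rough sketch of the scipy route, assuming scipy is installed. Note that scipy.spatial.distance.cosine returns the cosine distance, that is 1 minus the similarity, so the value is subtracted from 1 here. The mydict sample and the variable names are simply carried over from the code above.

```python
from scipy.spatial.distance import cosine

mydict = {
    "in": [0.01, -0.07, 0.09, -0.02],
    "and": [0.2, 0.3, 0.5, 0.6],
    "to": [0.87, 0.98, 0.54, 0.4],
}

# cosine() gives a distance (1 - similarity), so convert back to a similarity.
result = {
    k1: {k2: 1.0 - cosine(v1, mydict[k2]) for k2 in mydict if k2 != k1}
    for k1, v1 in mydict.items()
}
print(result)
```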

Source

2014-11-04 21:22:28 Mikk
