2013-11-21

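Some WordNet-based similarity measures, such as shortest path and WuP, return scores between 0 and 1, so the similarity between car and automobile is 1. Other measures, such as LCH, have a different maximum score: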

lch(car, automobile) = 3.6889 

I would like to know the maximum score of these measures. Is 3.6889 the maximum? Does that mean LCH scores range between 0 and 3.6889?

I have the same question about the following measures:

jcn(car, automobile) = 12876699.5 
res(car, automobile) = 9.3679 
lesk(car, automobile) = 9519 

Also take a look at [word2vec](http://radimrehurek.com/2013/09/deep-learning-with-word2vec-and-gensim/) - it might be interesting! – arturomp

Answer


It seems that 3.6375861597263857 is the maximum for lch_similarity (I can't reproduce 3.6889...). According to the documentation, lch_similarity has the following properties:

Leacock Chodorow Similarity: 
     Return a score denoting how similar two word senses are, based on the 
     shortest path that connects the senses (as above) and the maximum depth 
     of the taxonomy in which the senses occur. The relationship is given as 
     -log(p/2d) where p is the shortest path length and d is the taxonomy 
     depth. 
... 
:return: A score denoting the similarity of the two ``Synset`` objects, 
      normally greater than 0. None is returned if no connecting path 
      could be found. If a ``Synset`` is compared with itself, the 
      maximum score is returned, which varies depending on the taxonomy 
      depth. 
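
The formula in the docstring can be checked by hand. Here is a minimal sketch, assuming the path length p counts nodes, so that a synset compared with itself has p = 1. The helpers `lch_score` and `lch_max` are hypothetical names for illustration and are not part of NLTK:

```python
import math


def lch_score(path_length, taxonomy_depth):
    """Return -log(p / 2d), the relationship quoted in the docstring."""
    return -math.log(path_length / (2.0 * taxonomy_depth))


def lch_max(taxonomy_depth):
    """Largest score for a given depth: p = 1 reduces the formula to log(2d)."""
    return lch_score(1, taxonomy_depth)


# The maximum grows with the taxonomy depth, so it is not a universal constant.
for depth in (5, 10, 19):
    print(depth, round(lch_max(depth), 4))
```

Under that assumption the upper bound depends only on d, which would explain why the docstring says the maximum "varies depending on the taxonomy depth".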

Given that rock_hind.n.01 sits at the deepest level of the WordNet taxonomy (19) and change.n.06 at the shallowest (2), we can experiment with synsets of different depths:

>>> from nltk.corpus import wordnet as wn 
>>> rock = wn.synset('rock_hind.n.01') 
>>> change = wn.synset('change.n.06') 
>>> rock.lch_similarity(rock) 
3.6375861597263857 
>>> change.lch_similarity(change) 
3.6375861597263857 
>>> change.lch_similarity(rock) 
0.7472144018302211 
>>> rock.lch_similarity(change) 
0.7472144018302211 
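
These numbers are consistent with the -log(p/2d) formula. Below is a quick arithmetic check, assuming d = 19 and a node-counting path length of 18 between the two synsets. Both values are inferred from the output above rather than read from NLTK. The 3.6889 in the question would match d = 20, which is only a guess at how the other tool counts depth. The `normalized_lch` helper is hypothetical and shows one way to rescale scores for thresholding:

```python
import math

d = 19  # assumed taxonomy depth, inferred from the observed maximum

# A synset compared with itself: p = 1 (assumption: p counts nodes).
self_score = -math.log(1 / (2.0 * d))

# rock_hind.n.01 vs change.n.06: p = 18 is inferred, not measured.
cross_score = -math.log(18 / (2.0 * d))

# The question's value fits d = 20 (a guess about the other tool).
question_score = -math.log(1 / (2.0 * 20))

print(self_score, cross_score, question_score)


def normalized_lch(score, max_score):
    """Divide by the self-similarity score so that identical synsets map to 1.0."""
    return score / max_score


print(normalized_lch(cross_score, self_score))
```

Dividing by the self-similarity score is one possible way to get a scale comparable to path or WuP, but it is a convention chosen here, not something NLTK does.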

Similar experiments can be run for the other measures, where the range appears to be much larger:

>>> from nltk.corpus import wordnet_ic, genesis 
>>> brown_ic = wordnet_ic.ic('ic-brown.dat') 
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat') 
>>> genesis_ic = wn.ic(genesis, False, 0.0) 
>>> rock.res_similarity(rock, brown_ic) # res_similarity, brown 
1e+300 
>>> rock.res_similarity(change, brown_ic) 
-0.0 
>>> rock.res_similarity(rock, semcor_ic) # res_similarity, semcor 
1e+300 
>>> rock.res_similarity(change, semcor_ic) 
-0.0 
>>> rock.res_similarity(rock, genesis_ic) # res_similarity, genesis 
1e+300 
>>> rock.res_similarity(change, genesis_ic) 
-0.08306855877006339 
>>> change.res_similarity(rock, genesis_ic) 
-0.08306855877006339 
>>> rock.jcn_similarity(rock, brown_ic) # jcn, brown - results are identical with semcor and genesis 
1e+300 
>>> rock.jcn_similarity(change, brown_ic) 
1e-300 
>>> change.jcn_similarity(rock, brown_ic) 
1e-300 
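
The 1e+300 and 1e-300 values look like stand-ins for infinity and its reciprocal rather than measured scores. That fits how these measures are defined: Resnik returns the information content (IC) of the least common subsumer, and Jiang-Conrath returns 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)). Here is a minimal sketch of those formulas in plain Python, with hypothetical helper names and made-up counts (nothing here calls NLTK), suggesting why neither measure has a fixed upper bound:

```python
import math


def information_content(count, total):
    """IC = -log(count / total); treated as infinite when the count is zero."""
    if count == 0:
        return math.inf
    return -math.log(count / total)


def jiang_conrath(ic1, ic2, lcs_ic):
    """1 / (IC(s1) + IC(s2) - 2 * IC(lcs)); infinite when the distance is zero."""
    distance = ic1 + ic2 - 2.0 * lcs_ic
    if distance == 0:
        return math.inf
    return 1.0 / distance


total = 1_000_000  # made-up corpus size, for illustration only

# Resnik for a synset against itself is just its own IC, which keeps
# growing as the concept becomes rarer in the corpus.
for count in (100_000, 100, 1):
    print(count, round(information_content(count, total), 3))

# Jiang-Conrath for a synset against itself has zero distance.
ic = information_content(100, total)
print(jiang_conrath(ic, ic, ic))
```

If this reading is right, a threshold for res or jcn would have to be chosen empirically for a given IC corpus, with self-comparisons handled as a special case.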

Thanks for your answer. Does this mean there is no exact upper bound for lch, jcn, res, and lesk? If so, how can I set a specific threshold? –


Hi! I added experiments with jcn and res - not sure what you mean by lesk... do you mean lin? – arturomp


@JanMohd Did you ever find the upper bound for jcn? I have the same problem. –
