2012-10-12 127 views
2

Python: comparing two synonyms through a dictionary

I have a dictionary of synonyms:

synonym = {"this": ["this", "same"], 
      "all": ["all", "any", "*"], 
      "alluptolastyear": ["alluptolastyear", "uptolastyear"], 
      "dekadbefore": ["dekadbefore", "lastdekad", "formerdekad", "precedingdekad"], 
      "dekadafter": ["dekadafter", "nextdekad", "followingdekad"], 
      "yearbefore": ["yearbefore", "lastyear", "formeryear"], 
      "monthbefore": ["monthbefore", "lastmonth", "precedingmonth"]} 

Each array stores synonyms and is referenced by its key. I read two strings from an XML file and try to compare them.

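For example:

  • "this" and "same" are equal (synonyms)
  • "lastyear" and "formeryear" are equal (synonyms)
  • "all" and "nextdekad" are different
  • Of course, every key also appears in its own array, so each key is a synonym of the strings in its array.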

Can someone help me write a Pythonic comparison of these strings using the synonym dictionary?

+0

In your example, is 'lastyear' a synonym of 'formeryear'? – sloth

+0

Yes. All values inside an array are synonyms, and they are referenced by the dictionary key. –

+0

Can you provide code that could generate the synonym dictionary you have shown here? If a == "lastmonth" and b == "previousmonth" – user1993

Answers

6

Try this:

def are_sinonims(a, b): 
    return a in synonym.get(b,[]) or b in synonym.get(a,[]) or any(a in synonym[k] and b in synonym[k] for k in synonym) 

Also, we can rewrite the part `a in synonym[k] and b in synonym[k] for k in synonym` as `a in words and b in words for words in synonym.values()`, like this:

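def are_sinonims(a, b):
    # Parentheses replace the backslash continuations, which break
    # if any whitespace follows the backslash.
    return (a in synonym.get(b, [])
            or b in synonym.get(a, [])
            or any(a in words and b in words for words in synonym.values()))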
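As a quick check, the function can be run against the three examples from the question. This is only a sketch for Python 3: the dictionary below is a shortened copy of the one in the question, and the function body is the same logic as above.

```python
# Shortened copy of the question's dictionary, for illustration only.
synonym = {
    "this": ["this", "same"],
    "all": ["all", "any", "*"],
    "dekadafter": ["dekadafter", "nextdekad", "followingdekad"],
    "yearbefore": ["yearbefore", "lastyear", "formeryear"],
}

def are_sinonims(a, b):
    # True if one word is listed under the other as a key,
    # or if both words appear together in the same synonym list.
    return (a in synonym.get(b, [])
            or b in synonym.get(a, [])
            or any(a in words and b in words for words in synonym.values()))

print(are_sinonims("this", "same"))            # True
print(are_sinonims("lastyear", "formeryear"))  # True
print(are_sinonims("all", "nextdekad"))        # False
```

Since the question states that every key also appears in its own list, the `any(...)` clause alone should give the same results; the two `get` lookups only act as a shortcut.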
+0

@defuz, thanks for the correction. – Marcus

+0

How about using `a in words and b in words for words in synonym.values()`? – defuz

+0

@defuz, looks good, I will update my answer. – Marcus

3

Why not simply:

def are_sinonims(a, b): 
    return b in synonym.get(a, []) or a in synonym.get(b, []) 

Edited after the comment pointing out the fault.
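The case that the function as written above misses can be reproduced with a single entry of the dictionary. This is a minimal sketch for Python 3; the one-entry dictionary is a cut-down copy used only for illustration.

```python
# One entry from the question's dictionary is enough to show the gap.
synonym = {"yearbefore": ["yearbefore", "lastyear", "formeryear"]}

def are_sinonims(a, b):
    # Only looks up a and b as dictionary keys.
    return b in synonym.get(a, []) or a in synonym.get(b, [])

# Works when one of the words is a key:
print(are_sinonims("yearbefore", "lastyear"))  # True
# Fails when both words are values in the same list, because neither is a key:
print(are_sinonims("formeryear", "lastyear"))  # False
```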

+2

That will not work. – Marcus

+1

This solution does not work for "formeryear" and "lastyear". – defuz

+0

, but are we sure that is what we want, – Bogdan

1

First, build a new dictionary with every synonym as a key:

word_to_word = {}
for syns in synonym.values():
    for word in syns:
        word_to_word[word] = syns

A function to compare the strings:

def are_sinomic(a, b):
    words_a, words_b = a.split(), b.split()
    if len(words_a) != len(words_b):
        return False
    for word_a, word_b in zip(words_a, words_b):
        if word_a != word_b and word_b not in word_to_word.get(word_a, []):
            return False
    return True
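Put together, the two pieces above can be exercised like this. It is a sketch for Python 3 that uses a shortened copy of the question's dictionary; the sample phrases are made up for illustration.

```python
# Shortened copy of the question's dictionary, for illustration only.
synonym = {
    "all": ["all", "any", "*"],
    "dekadafter": ["dekadafter", "nextdekad", "followingdekad"],
    "yearbefore": ["yearbefore", "lastyear", "formeryear"],
}

# Reverse lookup: every word points at the list it belongs to.
word_to_word = {}
for syns in synonym.values():
    for word in syns:
        word_to_word[word] = syns

def are_sinomic(a, b):
    # Compare two whitespace-separated phrases word by word.
    words_a, words_b = a.split(), b.split()
    if len(words_a) != len(words_b):
        return False
    for word_a, word_b in zip(words_a, words_b):
        if word_a != word_b and word_b not in word_to_word.get(word_a, []):
            return False
    return True

print(are_sinomic("all lastyear", "any formeryear"))  # True
print(are_sinomic("all lastyear", "all nextdekad"))   # False
print(are_sinomic("all lastyear", "all"))             # False
```

Building `word_to_word` once makes each comparison a dictionary lookup plus a short list scan, instead of a pass over the whole dictionary.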
0

If you only care whether two words are synonyms, you can build a set of 2-tuple permutations from the dictionary's values:

synonym = {"this": ["this", "same"], 
      "all": ["all", "any", "*"], 
      "alluptolastyear": ["alluptolastyear", "uptolastyear"], 
      "dekadbefore": ["dekadbefore", "lastdekad", "formerdekad", "precedingdekad"], 
      "dekadafter": ["dekadafter", "nextdekad", "followingdekad"], 
      "yearbefore": ["yearbefore", "lastyear", "formeryear"], 
      "monthbefore": ["monthbefore", "lastmonth", "precedingmonth"]} 

from itertools import chain, permutations 
synonym_set = set(chain.from_iterable(permutations(val, 2) for val in synonym.values())) 

def are_synonyms(a, b): 
    return (a, b) in synonym_set 
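One detail worth checking: `permutations(val, 2)` never pairs an element with itself, so `(word, word)` is not in the set. The sketch below (Python 3, shortened dictionary) adds an explicit equality test; treating a word as its own synonym is an assumption here, not something the answer states.

```python
from itertools import chain, permutations

# Shortened copy of the question's dictionary, for illustration only.
synonym = {
    "this": ["this", "same"],
    "all": ["all", "any", "*"],
    "dekadafter": ["dekadafter", "nextdekad", "followingdekad"],
    "yearbefore": ["yearbefore", "lastyear", "formeryear"],
}

# Every ordered pair of distinct words taken from the same list.
synonym_set = set(chain.from_iterable(permutations(val, 2) for val in synonym.values()))

def are_synonyms(a, b):
    # Assumption: identical words count as synonyms of each other.
    return a == b or (a, b) in synonym_set

print(("this", "this") in synonym_set)         # False
print(are_synonyms("this", "this"))            # True
print(are_synonyms("lastyear", "formeryear"))  # True
print(are_synonyms("all", "nextdekad"))        # False
```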
4

You can convert each word to a "synonym hash" (something that is equal if two words are synonyms and different otherwise):

def sym_hash(word):
    for w, s in synonym.items():
        if word == w or word in s:
            return w
    return word

and then compare the words using their "hashes":

def phrases_equal(p1, p2): 
    return all(sym_hash(a) == sym_hash(b) for a, b in zip(p1, p2)) 

p1 = "all your base this dekadbefore are formeryear".split() 
p2 = "any your base same lastdekad are yearbefore".split() 

print(phrases_equal(p1, p2))  # True
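Note that `zip` stops at the shorter sequence, so `phrases_equal` as written reports a phrase and its own prefix as equal. If phrases of different lengths should count as different (an assumption about the intended behaviour), a length check can be added. A sketch for Python 3 with a shortened dictionary:

```python
# Shortened copy of the question's dictionary, for illustration only.
synonym = {
    "this": ["this", "same"],
    "all": ["all", "any", "*"],
}

def sym_hash(word):
    # Map a word to the key of its synonym group, or to itself if unknown.
    for w, s in synonym.items():
        if word == w or word in s:
            return w
    return word

def phrases_equal(p1, p2):
    # Assumption: phrases with different word counts are never equal.
    return len(p1) == len(p2) and all(
        sym_hash(a) == sym_hash(b) for a, b in zip(p1, p2)
    )

print(phrases_equal("all your base".split(), "any your base".split()))  # True
print(phrases_equal("all your base".split(), "any your".split()))       # False
```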

Actually, the appropriate data structure for a synonym database seems to be a list of sets rather than a dictionary:

synonym = [ 
    {"this", "same"}, 
    {"all", "any", "*"}, 
    {"alluptolastyear", "uptolastyear"}, 
    {"dekadbefore", "lastdekad", "formerdekad", "precedingdekad"}, 
    {"dekadafter", "nextdekad", "followingdekad"}, 
    {"yearbefore", "lastyear", "formeryear"}, 
    {"monthbefore", "lastmonth", "precedingmonth"} 
] 

In this case, sym_hash can be coded more efficiently as:

def sym_hash(word):
    return next((s for s in synonym if word in s), word)
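If lookups are frequent, the list of sets can also be indexed once so that each call is a single dictionary lookup. This is a sketch for Python 3, not part of the answer above: the name `group_of` is made up for illustration, the list is shortened, and it assumes each word belongs to at most one set.

```python
# Shortened copy of the list of sets, for illustration only.
synonym = [
    {"this", "same"},
    {"all", "any", "*"},
    {"yearbefore", "lastyear", "formeryear"},
]

# Hypothetical helper: map every word to the index of its synonym set.
group_of = {word: i for i, group in enumerate(synonym) for word in group}

def sym_hash(word):
    # Known words map to their group index; unknown words map to themselves.
    return group_of.get(word, word)

print(sym_hash("lastyear") == sym_hash("formeryear"))  # True
print(sym_hash("all") == sym_hash("lastyear"))         # False
print(sym_hash("base"))                                # base
```

Unlike a set, the returned value is hashable, so it can be used as a dictionary key or stored in another set.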