2016-11-05

Calculate PMI values using a given context window

Consider the following basis:

basis = "Each word of the text is converted as follows: move any consonant (or consonant cluster) that appears at the start of the word to the end, then append ay." 

and the following words:

words = "word, text, bank, tree" 

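How can I calculate the PMI value of each of these words against each word in the basis, using a context window of size 5 (that is, two positions before and two positions after the target word)?

I know how to calculate PMI, but I don't know how to take the context window into account.

I calculate the "normal" PMI values as follows: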

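from math import log

def PMI(ContingencyTable):
    # unpack the four cell counts and the total
    (a, b, c, d, N) = ContingencyTable
    # add-one smoothing to avoid log(0)
    a += 1
    b += 1
    c += 1
    d += 1
    N += 4

    R_1 = a + b  # first-row total
    C_1 = a + c  # first-column total

    # log2 of (a / N) / ((R_1 / N) * (C_1 / N))
    return log(float(a) / (float(R_1) * float(C_1)) * float(N), 2)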

Answer


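I did a little searching on PMI, and it looks like heavy-duty packages are out there, with "windowing" included.

The "mutual" in PMI seems to refer to the joint probability of two different words, so you need to pin down what that means for your problem statement.

I took on the smaller problem of just generating the short windowed lists from your problem statement, mostly as an exercise for myself: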
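For reference, the usual definition of pointwise mutual information for two words x and y is sketched below. The second and third forms restate it using the contingency table from the question, assuming a is the co-occurrence count, R_1 = a + b and C_1 = a + c are the totals for x and y, and N is the overall total. That reading of the cells is an assumption, since the question does not label them, but it agrees with the expression the question's code returns.

```latex
\[
\mathrm{PMI}(x, y)
  = \log_2 \frac{p(x, y)}{p(x)\,p(y)}
  = \log_2 \frac{a / N}{(R_1 / N)\,(C_1 / N)}
  = \log_2 \frac{a \, N}{R_1 \, C_1}
\]
```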

def wndw(wrd_l, m_l, pre, post):
    """
    returns a list of all lists of sequential words in input wrd_l
    that are within range -pre and +post of any word in wrd_l that matches
    a word in m_l

    wrd_l     = list of words
    m_l       = list of words to match on
    pre, post = ints giving range of indices to include in window size
    """
    wndw_l = list()
    for i, w in enumerate(wrd_l):
        if w in m_l:
            wndw_l.append([wrd_l[i + k] for k in range(-pre, post + 1)
                           if 0 <= (i + k) < len(wrd_l)])
    return wndw_l

basis = """Each word of the text is converted as follows: move any 
      consonant (or consonant cluster) that appears at the start 
      of the word to the end, then append ay.""" 

words = "word, text, bank, tree" 

print(*wndw(basis.split(), [x.strip() for x in words.split(',')], 2, 2),
      sep="\n")

Output:

['Each', 'word', 'of', 'the']
['of', 'the', 'text', 'is', 'converted']
['of', 'the', 'word', 'to', 'the']
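To go from windows to actual PMI values, one possible approach is to count (center word, neighbor word) pairs over every window and estimate the probabilities from those pair counts. The sketch below does that under stated assumptions: the pair-counting model is one common convention and not necessarily the contingency table the question has in mind, the names `tokenize`, `cooccurrence_counts` and `window_pmi` are made up for illustration, punctuation is stripped and text lowercased before counting, and unseen pairs give negative infinity instead of using the add-one smoothing from the question.

```python
import re
from collections import Counter
from math import log2


def tokenize(text):
    """Lowercase the text and keep only alphabetic runs (drops punctuation)."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_counts(tokens, half_width=2):
    """Count (center, neighbor) pairs for neighbors within half_width positions.

    half_width=2 corresponds to a context window of size 5.
    """
    pair_counts = Counter()
    for i, center in enumerate(tokens):
        lo = max(0, i - half_width)
        hi = min(len(tokens), i + half_width + 1)
        for j in range(lo, hi):
            if j != i:
                pair_counts[(center, tokens[j])] += 1
    return pair_counts


def window_pmi(pair_counts, target, context):
    """PMI of target and context, estimated from windowed pair counts.

    Assumption: p(target, context) = pairs(target, context) / total pairs,
    and the marginals are the corresponding row and column sums.
    """
    total = sum(pair_counts.values())
    joint = pair_counts[(target, context)]
    if joint == 0:
        return float("-inf")  # no smoothing in this sketch
    target_total = sum(n for (w, _), n in pair_counts.items() if w == target)
    context_total = sum(n for (_, c), n in pair_counts.items() if c == context)
    return log2(joint * total / (target_total * context_total))


basis = ("Each word of the text is converted as follows: move any consonant "
         "(or consonant cluster) that appears at the start of the word to "
         "the end, then append ay.")
words = "word, text, bank, tree"

tokens = tokenize(basis)
counts = cooccurrence_counts(tokens, half_width=2)

# PMI of every query word against every distinct word in the basis
for target in (w.strip() for w in words.split(",")):
    for context in sorted(set(tokens)):
        value = window_pmi(counts, target, context)
        if value != float("-inf"):
            print(target, context, round(value, 3))
```

Because "bank" and "tree" never occur in the basis, every pair involving them comes out as negative infinity here. If you want finite values for those, you could feed the pair counts into a smoothed function like the `PMI` in the question instead.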