I am following the NLP tutorial here (6'58''), the part about the stupid backoff smoothing algorithm. In both the video and the bigram-level stupid-backoff implementation, a discount value of 0.4 is used.
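As I understand it, stupid backoff scores a token recursively and multiplies in a fixed discount α every time it has to back off one n-gram level (in log space, it adds log(α) once per backoff step):

S(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1})    if count(w_{i-1} w_i) > 0
                 = α * S(w_i)                             otherwise

with α = 0.4 in the bigram code below, and the same recursion applied one level up for trigrams.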
The bigram-level backoff implementation:
def score(self, sentence):
    score = 0.0
    previous = sentence[0]
    for token in sentence[1:]:
        bicount = self.bigramCounts[(previous, token)]
        bi_unicount = self.unigramCounts[previous]
        unicount = self.unigramCounts[token]
        if bicount > 0:
            # bigram seen: add log P(token | previous)
            score += math.log(bicount)
            score -= math.log(bi_unicount)
        else:
            # back off to the add-one smoothed unigram
            score += math.log(0.4)  # discount here
            score += math.log(unicount + 1)
            score -= math.log(self.total + self.vocab_size)
        previous = token
    return score
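For anyone who wants to reproduce this, here is how I wire it up. This is my own minimal sketch, not the tutorial's code: the class name, the Counter-based training, and the toy corpus with boundary tokens are all my assumptions; the score method is the one above.

import math
from collections import Counter

class BigramStupidBackoff:
    def __init__(self, corpus):
        # corpus: list of sentences, each a list of tokens
        self.unigramCounts = Counter()
        self.bigramCounts = Counter()
        for sent in corpus:
            self.unigramCounts.update(sent)
            self.bigramCounts.update(zip(sent, sent[1:]))
        self.total = sum(self.unigramCounts.values())
        self.vocab_size = len(self.unigramCounts)

    def score(self, sentence):
        # identical to the method above
        score = 0.0
        previous = sentence[0]
        for token in sentence[1:]:
            bicount = self.bigramCounts[(previous, token)]
            bi_unicount = self.unigramCounts[previous]
            unicount = self.unigramCounts[token]
            if bicount > 0:
                score += math.log(bicount)
                score -= math.log(bi_unicount)
            else:
                score += math.log(0.4)
                score += math.log(unicount + 1)
                score -= math.log(self.total + self.vocab_size)
            previous = token
        return score

corpus = [['<s>', 'i', 'like', 'nlp', '</s>'],
          ['<s>', 'i', 'like', 'pizza', '</s>']]
lm = BigramStupidBackoff(corpus)
print(lm.score(['<s>', 'i', 'like', 'nlp', '</s>']))  # unnormalized log score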
But then in the trigram-level implementation, the discount value is 1:
def score(self, sentence):
    score = 0.0
    fst = sentence[0]
    snd = sentence[1]
    for token in sentence[2:]:
        tricount = self.trigramCounts[(fst, snd, token)]
        tri_bicount = self.bigramCounts[(fst, snd)]
        bicount = self.bigramCounts[(snd, token)]
        bi_unicount = self.unigramCounts[snd]
        unicount = self.unigramCounts[token]
        if tricount > 0:
            # trigram seen: add log P(token | fst, snd)
            score += math.log(tricount)
            score -= math.log(tri_bicount)
        elif bicount > 0:
            # back off to the bigram
            score += math.log(bicount)  # no discount here
            score -= math.log(bi_unicount)
        else:
            # back off to the add-one smoothed unigram
            score += math.log(unicount + 1)  # no discount here
            score -= math.log(self.total + self.vocab_size)
        fst, snd = snd, token
    return score
When I run the project with the discount set to 0.4 versus 1 at the trigram level, the scores come out ordered like this:

tri-gram with discount = 0.4
  < bi-gram with discount = 0.4
  < tri-gram with discount = 1
It is easy to see why: with discount = 0.4, the final else branch of the trigram version effectively becomes:

else:
    score += math.log(0.4)  # ~ -0.916
    score += math.log(0.4)  # ~ -0.916
    score += math.log(unicount + 1)  # no discount here
    score -= math.log(self.total + self.vocab_size)
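In other words, if the discount were applied consistently, I would expect the trigram scorer to look like this. This is my own sketch mirroring the bigram version (one log(0.4) per backoff step), not code from the tutorial:

def score(self, sentence):
    score = 0.0
    fst, snd = sentence[0], sentence[1]
    for token in sentence[2:]:
        tricount = self.trigramCounts[(fst, snd, token)]
        tri_bicount = self.bigramCounts[(fst, snd)]
        bicount = self.bigramCounts[(snd, token)]
        bi_unicount = self.unigramCounts[snd]
        unicount = self.unigramCounts[token]
        if tricount > 0:
            score += math.log(tricount) - math.log(tri_bicount)
        elif bicount > 0:
            # one backoff step -> one discount
            score += math.log(0.4)
            score += math.log(bicount) - math.log(bi_unicount)
        else:
            # two backoff steps -> discount applied twice
            score += 2 * math.log(0.4)
            score += math.log(unicount + 1)
            score -= math.log(self.total + self.vocab_size)
        fst, snd = snd, token
    return score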
So I am really confused: where does the 0.4 value come from?
The 0.4 as in stupid backoff? – user3639557
@user3639557 Yes, but I don't know why it is 0.4, and why the trigram example does not use this discount. – user3448806
It is fairly arbitrary, which is why it is called stupid backoff. Read the paper cited in the answer below. – user3639557