Evaluating the model during training with scikit-learn's LatentDirichletAllocation class

I am experimenting with the LatentDirichletAllocation() class in scikit-learn, and the evaluate_every parameter has the following description:

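How often to evaluate perplexity. Only used in the fit method. Set it to 0 or a negative number to not evaluate perplexity in training at all. Evaluating perplexity can help you check convergence during training, but it will also increase total training time. Evaluating perplexity in every iteration might increase training time up to two-fold.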

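I set this parameter to 2 (the default is 0) and saw the training time increase, but I can't seem to find the perplexity values anywhere. Are these results saved, or are they only used by the model to decide when to stop? I was hoping to use the perplexity values to measure my model's progress and plot a learning curve.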


2017-01-07 neelshiv

It is used together with the perp_tol parameter to check convergence, and it is not saved between iterations, per the source:

for i in xrange(max_iter):
    # ...
    # check perplexity
    if evaluate_every > 0 and (i + 1) % evaluate_every == 0:
        doc_topics_distr, _ = self._e_step(X, cal_sstats=False,
                                           random_init=False,
                                           parallel=parallel)
        bound = self.perplexity(X, doc_topics_distr,
                                sub_sampling=False)
        if self.verbose:
            print('iteration: %d, perplexity: %.4f'
                  % (i + 1, bound))

        if last_bound and abs(last_bound - bound) < self.perp_tol:
            break
        last_bound = bound
    self.n_iter_ += 1
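To make the evaluation schedule and the stopping rule concrete, here is a minimal standalone sketch of the same control flow in plain Python. It does not use scikit-learn: `run_loop`, `evaluate_bound`, and `fake_perplexity` are hypothetical names invented for illustration, and the perplexity curve is made up.

```python
def run_loop(evaluate_bound, max_iter, evaluate_every, perp_tol):
    """Mimic the excerpt's schedule: evaluate every `evaluate_every`
    iterations and stop once two consecutive evaluations differ by
    less than `perp_tol`. Returns the evaluated values and the
    iteration counter."""
    evaluated = []      # (iteration, bound) pairs, kept only for illustration
    last_bound = None
    n_iter = 0
    for i in range(max_iter):
        # ... one EM step would run here ...
        if evaluate_every > 0 and (i + 1) % evaluate_every == 0:
            bound = evaluate_bound(i + 1)
            evaluated.append((i + 1, bound))
            # The excerpt tests `if last_bound and ...`; an explicit
            # None check is used here instead.
            if last_bound is not None and abs(last_bound - bound) < perp_tol:
                break
            last_bound = bound
        n_iter += 1
    return evaluated, n_iter


def fake_perplexity(iteration):
    # Hypothetical curve that flattens out as training proceeds.
    return 100.0 + 50.0 / iteration


evaluated, n_iter = run_loop(fake_perplexity, max_iter=20,
                             evaluate_every=2, perp_tol=1.0)
for iteration, bound in evaluated:
    print('iteration: %d, perplexity: %.4f' % (iteration, bound))
```

With evaluate_every=2 only the even-numbered iterations are evaluated, and, as in the excerpt, the counter is not incremented for the iteration on which the loop breaks.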

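Note, however, that you could easily adapt the existing source to do this by (1) adding the line self.bounds = [] to the __init__ method and (2) adding self.bounds.append(bound) to the loop above, like so: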

# record the value before the convergence check, so the one that triggers the break is kept too
self.bounds.append(bound)
if last_bound and abs(last_bound - bound) < self.perp_tol:
    break
last_bound = bound

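Depending on where you save the updated class, you will also have to modify the imports at the top of the file to reference the full module paths in scikit-learn.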
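If you would rather not patch the library, one alternative (not part of the approach described here) is to drive training yourself with partial_fit and call perplexity after each pass. The sketch below assumes a scikit-learn version in which the constructor argument is named n_components (older releases called it n_topics) and uses a made-up random count matrix in place of real data. Because partial_fit performs online updates, the resulting curve will not exactly match what a batch fit would compute internally.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy data: a small random document-term count matrix.
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(40, 25))

lda = LatentDirichletAllocation(n_components=3,
                                learning_method='online',
                                random_state=0)

perplexities = []
for n_pass in range(5):
    lda.partial_fit(X)                       # one online pass over X
    perplexities.append(lda.perplexity(X))   # evaluated on the training data

for n_pass, value in enumerate(perplexities, start=1):
    print('pass: %d, perplexity: %.4f' % (n_pass, value))
```

The perplexities list can then be plotted as a learning curve. Each perplexity call costs an extra E-step over X, so the slowdown mentioned in the evaluate_every description applies here as well.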


2017-01-12 03:00:15
