2016-07-31

I am trying to serialize an nltk.tag.hmm.HiddenMarkovModelTagger to a pickle so that I can use it whenever needed without retraining. However, after loading it from the .pkl file, my HMM behaves as if it had never been trained. My two questions are:

  1. What am I doing wrong?
  2. Is serializing an HMM a good idea once you have a dataset?

Here is the code:

In [1]: import nltk 

In [2]: from nltk.probability import * 

In [3]: from nltk.util import unique_list 

In [4]: import json 

In [5]: with open('data.json') as data_file: 
    ...:   corpus = json.load(data_file) 
    ...:  

In [6]: corpus = [[tuple(l) for l in sentence] for sentence in corpus] 

In [7]: tag_set = unique_list(tag for sent in corpus for (word,tag) in sent) 

In [8]: symbols = unique_list(word for sent in corpus for (word,tag) in sent) 

In [9]: trainer = nltk.tag.HiddenMarkovModelTrainer(tag_set, symbols) 

In [10]: train_corpus = corpus[:4] 

In [11]: test_corpus = [corpus[4]] 

In [12]: hmm = trainer.train_supervised(train_corpus, estimator=LaplaceProbDist) 

In [13]: print('%.2f%%' % (100 * hmm.evaluate(test_corpus))) 
100.00% 

As you can see, the HMM is trained. Now I pickle it:

In [14]: import pickle 

In [16]: output = open('hmm.pkl', 'wb') 

In [17]: pickle.dump(hmm, output) 

In [18]: output.close() 

After a %reset, the loaded model looks dumber than a box of rocks :)

In [19]: %reset 
Once deleted, variables cannot be recovered. Proceed (y/[n])? y 

In [20]: import pickle 

In [21]: import json 

In [22]: with open('data.json') as data_file: 
    ....:  corpus = json.load(data_file) 
    ....:  

In [23]: test_corpus = [corpus[4]] 

In [24]: pkl_file = open('hmm.pkl', 'rb') 

In [25]: hmm = pickle.load(pkl_file) 

In [26]: pkl_file.close() 

In [27]: type(hmm) 
Out[27]: nltk.tag.hmm.HiddenMarkovModelTagger 

In [28]: print('%.2f%%' % (100 * hmm.evaluate(test_corpus))) 
0.00% 
After In [22], add `corpus = [[tuple(l) for l in sentence] for sentence in corpus]` – RAVI

Thanks @RAVI :) –

Answer

1) After In [22], you need to add:

corpus = [[tuple(l) for l in sentence] for sentence in corpus] 

2) Retraining the model every time you want to test would be time-consuming, so dumping the model with pickle.dump and loading it back is a good idea.
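In other words, the 0.00% score is not a pickling problem at all: JSON has no tuple type, so `json.load` returns the (word, tag) pairs as lists, and the reloaded `test_corpus` no longer has the shape the tagger expects. A minimal stdlib sketch (using a toy corpus, not the asker's `data.json`) showing why the tuple conversion must be reapplied after every `json.load`:

```python
import json

# A toy tagged corpus in the same shape as the question's data (assumed).
corpus = [[("the", "DET"), ("dog", "NOUN")]]

# Round-tripping through JSON turns every tuple into a list.
roundtripped = json.loads(json.dumps(corpus))
print(roundtripped[0][0])                  # ['the', 'DET'] -- a list, not a tuple
print(roundtripped[0][0] == corpus[0][0])  # False: list != tuple

# Re-applying the conversion from In [6] restores the original shape.
fixed = [[tuple(pair) for pair in sentence] for sentence in roundtripped]
print(fixed == corpus)                     # True
```

The pickled HMM itself is intact the whole time; only the freshly loaded test data needed the same `tuple` conversion the training data got in In [6].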