OutOfMemoryError with the Mallet CRF classifier

The classifier frequently fails with an OutOfMemoryError. Please advise.

We have UIMA pipelines, each of which invokes 5 model jars of roughly 30 MB each (based on Mallet CRF). -Xms is set to 2G and -Xmx is set to 4G.
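
As a side note, a minimal sketch for confirming that these heap settings actually reach the JVM that runs the pipeline (for example when the pipeline is started through a launcher script or a container). It uses only the standard java.lang.Runtime API; the class name HeapSettingsCheck and its helper methods are hypothetical and not part of UIMA, ClearTK, or Mallet.

```java
// Hypothetical helper: prints the heap limits seen by the running JVM.
final class HeapSettingsCheck {

    /** Formats a byte count as whole mebibytes (rounded down). */
    static String toMiB(long bytes) {
        return (bytes / (1024L * 1024L)) + " MiB";
    }

    /** Builds a one-line summary of the heap limits and current usage. */
    static String describeHeap() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();          // upper bound the JVM will try to use, roughly -Xmx
        long committed = rt.totalMemory();  // heap currently reserved by the JVM
        long used = committed - rt.freeMemory();
        return "max=" + toMiB(max)
                + " committed=" + toMiB(committed)
                + " used=" + toMiB(used);
    }

    public static void main(String[] args) {
        // Run with the same options as the pipeline, e.g. -Xms2G -Xmx4G.
        System.out.println(describeHeap());
    }
}
```

Calling describeHeap() from the annotator's initialization and logging the result would show whether the 4G limit is really in effect in each process.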

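Are there any guidelines or benchmarks for sizing the heap? Please also point out any guidance that applies to multithreaded environments.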

I did try applying the patch from https://code.google.com/p/cleartk/issues/detail?id=408, but it did not solve the problem.

A heap dump shows that 42% of the heap is char[] and 15% is String.

java.lang.OutOfMemoryError: Java heap space 
    at cc.mallet.types.IndexedSparseVector.setIndex2Location(IndexedSparseVector.java:109) 
    at cc.mallet.types.IndexedSparseVector.dotProduct(IndexedSparseVector.java:157) 
    at cc.mallet.fst.CRF$TransitionIterator.<init>(CRF.java:1856) 
    at cc.mallet.fst.CRF$TransitionIterator.<init>(CRF.java:1835) 
    at cc.mallet.fst.CRF$State.transitionIterator(CRF.java:1776) 
    at cc.mallet.fst.MaxLatticeDefault.<init>(MaxLatticeDefault.java:252) 
    at cc.mallet.fst.MaxLatticeDefault.<init>(MaxLatticeDefault.java:197) 
    at cc.mallet.fst.MaxLatticeDefault$Factory.newMaxLattice(MaxLatticeDefault.java:494) 
    at cc.mallet.fst.MaxLatticeFactory.newMaxLattice(MaxLatticeFactory.java:11) 
    at cc.mallet.fst.Transducer.transduce(Transducer.java:124) 
    at org.cleartk.ml.mallet.MalletCrfStringOutcomeClassifier.classify(MalletCrfStringOutcomeClassifier.java:90)

The models were created with MalletCrfStringOutcomeDataWriter.

AnalysisEngineFactory.createEngineDescription(
     DataChunkAnnotator.class,
     CleartkSequenceAnnotator.PARAM_IS_TRAINING, true,
     DirectoryDataWriterFactory.PARAM_OUTPUT_DIRECTORY, options.getModelsDirectory(),
     DefaultSequenceDataWriterFactory.PARAM_DATA_WRITER_CLASS_NAME, MalletCrfStringOutcomeDataWriter.class);

The annotator code looks like this.

if (this.isTraining()) {
     // Training: derive outcomes from the gold annotations and write them out.
     List<DataAnnotation> namedEntityMentions = JCasUtil.selectCovered(jCas, DataAnnotation.class, sentence);
     List<String> outcomes = this.chunking.createOutcomes(jCas, tokens, namedEntityMentions);
     this.dataWriter.write(Instances.toInstances(outcomes, featureLists));
} else {
     // Classification: this is the call that ends in the OutOfMemoryError above.
     List<String> outcomes = this.classifier.classify(featureLists);
     this.chunking.createChunks(jCas, tokens, outcomes);
}

Thanks

2015-11-19 Tilak

You can try the following:

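Increase -Xmx.
Analyze the heap in more depth: every String is backed by a char[], so figures such as 42% and 15% are not useful by themselves. You should find out which part of your program allocates those strings.
Since the line that triggers the error appears to be: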
List<String> outcomes = this.classifier.classify(featureLists);
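you can start from there: work out what is in featureLists, how large it is, and so on, then look at what the classify method does and whether you can "help" it become more memory-efficient. For example, reduce the use of String concatenation and replace it with StringBuilder and append (just as an example).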
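
The inspection of featureLists suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: featureLists is modelled here as a List<List<String>> with one inner list per token and one rendered string per feature, whereas in ClearTK the argument is presumably a list of Feature lists, so the strings would first have to be derived from each feature's name and value. The class FeatureListStats is hypothetical and not part of ClearTK or Mallet.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: summarizes how much string data one classify() call receives.
final class FeatureListStats {
    final int tokens;     // number of inner lists (one per token)
    final long features;  // total number of feature strings
    final long chars;     // total characters across all feature strings
    final int distinct;   // number of distinct feature strings

    private FeatureListStats(int tokens, long features, long chars, int distinct) {
        this.tokens = tokens;
        this.features = features;
        this.chars = chars;
        this.distinct = distinct;
    }

    /** Walks the nested lists once and collects the counts. */
    static FeatureListStats of(List<List<String>> featureLists) {
        long features = 0;
        long chars = 0;
        Set<String> seen = new HashSet<>();
        for (List<String> tokenFeatures : featureLists) {
            for (String feature : tokenFeatures) {
                features++;
                chars += feature.length();
                seen.add(feature);
            }
        }
        return new FeatureListStats(featureLists.size(), features, chars, seen.size());
    }

    @Override
    public String toString() {
        return "tokens=" + tokens + " features=" + features
                + " chars=" + chars + " distinct=" + distinct;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for one sentence's features.
        List<List<String>> featureLists = Arrays.asList(
                Arrays.asList("Word=Mallet", "Suffix=let"),
                Arrays.asList("Word=CRF", "Suffix=CRF", "Word=CRF"));
        System.out.println(FeatureListStats.of(featureLists));
    }
}
```

Logging these numbers just before the classify call would show whether a few very long sentences, or a very large number of features per token, coincide with the failures. A large gap between features and distinct would hint that many duplicate strings are being allocated.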

2015-11-19 18:06:37 alfasin
