OutOfMemoryError with the Mallet CRF classifier. The classifier frequently fails with an OutOfMemoryError; any suggestions would be appreciated.
We have UIMA pipelines, each of which loads five model jars of roughly 30 MB each (Mallet CRF based). -Xms is set to 2G and -Xmx to 4G.
Is there any guidance or benchmark for sizing the heap? Pointers for multithreaded environments in particular would be welcome.
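As a sanity check while experimenting with heap sizes, the limits the JVM actually applied can be printed from inside the process (the class name here is just illustrative, not part of the pipeline):

```java
// Prints the heap limits the running JVM is actually using, so the
// -Xms/-Xmx values passed to the pipeline can be verified in place.
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        System.out.println("max heap (MB):   " + rt.maxMemory() / mb);
        System.out.println("total heap (MB): " + rt.totalMemory() / mb);
        System.out.println("free heap (MB):  " + rt.freeMemory() / mb);
    }
}
```

With -Xmx4G the reported max heap should be close to 4096 MB (it is typically slightly lower because one survivor space is excluded).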
I did try applying the patch from https://code.google.com/p/cleartk/issues/detail?id=408, but it did not resolve the issue.
A heap dump shows that 42% of the heap is char[] and another 15% is String. The failure produces the following stack trace:
java.lang.OutOfMemoryError: Java heap space
at cc.mallet.types.IndexedSparseVector.setIndex2Location(IndexedSparseVector.java:109)
at cc.mallet.types.IndexedSparseVector.dotProduct(IndexedSparseVector.java:157)
at cc.mallet.fst.CRF$TransitionIterator.<init>(CRF.java:1856)
at cc.mallet.fst.CRF$TransitionIterator.<init>(CRF.java:1835)
at cc.mallet.fst.CRF$State.transitionIterator(CRF.java:1776)
at cc.mallet.fst.MaxLatticeDefault.<init>(MaxLatticeDefault.java:252)
at cc.mallet.fst.MaxLatticeDefault.<init>(MaxLatticeDefault.java:197)
at cc.mallet.fst.MaxLatticeDefault$Factory.newMaxLattice(MaxLatticeDefault.java:494)
at cc.mallet.fst.MaxLatticeFactory.newMaxLattice(MaxLatticeFactory.java:11)
at cc.mallet.fst.Transducer.transduce(Transducer.java:124)
at org.cleartk.ml.mallet.MalletCrfStringOutcomeClassifier.classify(MalletCrfStringOutcomeClassifier.java:90)
The model was created with MalletCrfStringOutcomeDataWriter:
AnalysisEngineFactory.createEngineDescription(DataChunkAnnotator.class,
    CleartkSequenceAnnotator.PARAM_IS_TRAINING, true,
    DirectoryDataWriterFactory.PARAM_OUTPUT_DIRECTORY, options.getModelsDirectory(),
    DefaultSequenceDataWriterFactory.PARAM_DATA_WRITER_CLASS_NAME,
    MalletCrfStringOutcomeDataWriter.class)
The annotator code looks like this:
if (this.isTraining()) {
    List<DataAnnotation> namedEntityMentions =
        JCasUtil.selectCovered(jCas, DataAnnotation.class, sentence);
    List<String> outcomes = this.chunking.createOutcomes(jCas, tokens, namedEntityMentions);
    this.dataWriter.write(Instances.toInstances(outcomes, featureLists));
} else {
    List<String> outcomes = this.classifier.classify(featureLists);
    this.chunking.createChunks(jCas, tokens, outcomes);
}
Thanks.