Training an NER model programmatically using a .prop file

I have been using the example file from the tutorial LINK to train my model. I am using the same prop file, but I don't understand how to run the training programmatically.

props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment, regexner"); 
props.setProperty("ner.model", "resources/NER.prop");

The prop file is shown below:

# location of the training file 
trainFile = nerTEST.tsv 
# location where you would like to save (serialize) your 
# classifier; adding .gz at the end automatically gzips the file, 
# making it smaller, and faster to load 
serializeTo = resources/ner-model.ser.gz 

# structure of your training file; this tells the classifier that 
# the word is in column 0 and the correct answer is in column 1 
map = word=0,answer=1 

# This specifies the order of the CRF: order 1 means that features 
# apply at most to a class pair of previous class and current class 
# or current class and next class. 
maxLeft=1 

# these are the features we'd like to train with 
# some are discussed below, the rest can be 
# understood by looking at NERFeatureFactory 
useClassFeature=true 
useWord=true 
# word character ngrams will be included up to length 6 as prefixes 
# and suffixes only 
useNGrams=true 
noMidNGrams=true 
maxNGramLeng=6 
usePrev=true 
useNext=true 
useDisjunctive=true 
useSequences=true 
usePrevSequences=true 
# the last 4 properties deal with word shape features 
useTypeSeqs=true 
useTypeSeqs2=true 
useTypeySequences=true 
wordShape=chris2useLC

Error:

java.io.StreamCorruptedException: invalid stream header: 23206C6F 
.... 
.. 
Caused by: java.io.IOException: Couldn't load classifier from resources/NER.prop

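From another question on SO, I understand that you are supposed to supply the model file directly. But how can we do this with the help of a properties file?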

2016-06-28 Betafish

You should run the following command from the command line:

java -cp "*" edu.stanford.nlp.ie.crf.CRFClassifier -prop NER.prop

If you want to run this from Java code, you can do something like this:

String[] args = new String[]{"-props", "NER.prop"}; 
CRFClassifier.main(args);

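The .prop file is a file that specifies the settings for training a model. Your code tries to load the .prop file as the model itself, which causes the error.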
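The header value in the stack trace is consistent with this explanation. As a minimal sketch (the class and method names are illustrative, not part of CoreNLP), decoding `23206C6F` as ASCII gives `# lo`, which matches the first four characters of the prop file's opening comment, `# location of the training file`. A Java serialized object stream would instead begin with the bytes `AC ED 00 05`.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper, not part of CoreNLP.
class StreamHeaderSketch {

    /** Decodes a hex string such as "23206C6F" into the ASCII text it encodes. */
    static String hexToAscii(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** True if the header is the Java serialization magic (0xACED) plus version (0x0005). */
    static boolean isJavaSerializationHeader(String hex) {
        return hex.equalsIgnoreCase("ACED0005");
    }

    public static void main(String[] args) {
        String header = "23206C6F"; // value reported by the StreamCorruptedException
        System.out.println("[" + hexToAscii(header) + "]"); // prints "[# lo]"
        System.out.println(isJavaSerializationHeader(header)); // prints "false"
    }
}
```

In other words, the loader read plain text where it expected a serialized classifier.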

Either approach will produce the final model at resources/ner-model.ser.gz.
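Once training has finished, `ner.model` should point at the serialized model rather than at the prop file. One way to keep the two in step is to read `serializeTo` from the training properties and reuse it. This is a sketch only: the class and method names are hypothetical, the shortened annotator list is an assumption, and the properties are parsed from an inline string so the example runs without the CoreNLP jars.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of CoreNLP.
class PipelinePropsSketch {

    /** Builds pipeline properties whose ner.model is the serializeTo path of the training props. */
    static Properties pipelineProps(Properties trainingProps) {
        String modelPath = trainingProps.getProperty("serializeTo");
        if (modelPath == null) {
            throw new IllegalArgumentException("training properties have no serializeTo entry");
        }
        Properties props = new Properties();
        // Assumed annotator list; adjust to what the pipeline actually needs.
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        // java.util.Properties keeps trailing whitespace in values, so trim it.
        props.setProperty("ner.model", modelPath.trim());
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties training = new Properties();
        training.load(new StringReader(
                "trainFile = nerTEST.tsv\nserializeTo = resources/ner-model.ser.gz \n"));
        Properties props = pipelineProps(training);
        System.out.println(props.getProperty("ner.model"));
        // With the CoreNLP jars on the classpath, props would then go to new StanfordCoreNLP(props).
    }
}
```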

2016-06-28 10:21:21 StanfordNLPHelp

That is what I understood from the FAQ. But how do we generate that model file from the properties file? – Betafish

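If you run that command, the model will be saved to resources/ner-model.ser.gz, as specified in the .prop file ... You can also call CRFClassifier's main() method from Java code, but that is not recommended – StanfordNLPHelp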
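If calling `main()` in-process is to be avoided, one alternative is to launch the same command as a child process. This is a rough sketch under stated assumptions: the class and method names are hypothetical, the CoreNLP jars are assumed to sit in the working directory, and `java` is assumed to be on the PATH. No shell is involved, which is fine because the `*` classpath wildcard is expanded by the Java launcher itself.

```java
import java.util.List;

// Hypothetical helper, not part of CoreNLP.
class TrainCommandSketch {

    /** Builds the training command from the answer above for the given prop file. */
    static List<String> trainCommand(String propFile) {
        return List.of("java", "-cp", "*",
                "edu.stanford.nlp.ie.crf.CRFClassifier", "-prop", propFile);
    }

    public static void main(String[] args) throws Exception {
        // inheritIO() forwards the trainer's console output to this process.
        Process process = new ProcessBuilder(trainCommand("NER.prop"))
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IllegalStateException("training exited with code " + exitCode);
        }
    }
}
```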

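import java.util.Properties;

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.sequences.SeqClassifierFlags;
import edu.stanford.nlp.util.StringUtils;

public class TrainModel {

    // Trains a CRF from the settings in the prop file and saves it to serializeFile.
    private void trainCrf(String serializeFile, String prop) {
        Properties props = StringUtils.propFileToProperties(prop);
        // Override the output path given in the prop file.
        props.setProperty("serializeTo", serializeFile);
        SeqClassifierFlags flags = new SeqClassifierFlags(props);
        CRFClassifier<CoreLabel> crf = new CRFClassifier<>(flags);
        crf.train(); // trains on the file named by trainFile in the prop file
        crf.serializeClassifier(serializeFile);
    }

    public static void main(String[] args) {
        String serializeFile = "skill/ner-model.ser.gz";
        String prop = "ner.props";
        TrainModel trainModel = new TrainModel();
        trainModel.trainCrf(serializeFile, prop);
    }
}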

2016-07-05 06:09:21

Thanks... it works. – ELITE
