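Training an NER model programmatically using a .prop file
I have been using the example file from the tutorial LINK to train my model. I am using the same prop file, but I don't understand how to do this programmatically.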
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment, regexner");
props.setProperty("ner.model", "resources/NER.prop");
The prop file looks like this:
# location of the training file
trainFile = nerTEST.tsv
# location where you would like to save (serialize) your
# classifier; adding .gz at the end automatically gzips the file,
# making it smaller, and faster to load
serializeTo = resources/ner-model.ser.gz
# structure of your training file; this tells the classifier that
# the word is in column 0 and the correct answer is in column 1
map = word=0,answer=1
# This specifies the order of the CRF: order 1 means that features
# apply at most to a class pair of previous class and current class
# or current class and next class.
maxLeft=1
# these are the features we'd like to train with
# some are discussed below, the rest can be
# understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
# word character ngrams will be included up to length 6 as prefixes
# and suffixes only
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useDisjunctive=true
useSequences=true
usePrevSequences=true
# the last 4 properties deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
Error:
java.io.StreamCorruptedException: invalid stream header: 23206C6F
....
..
Caused by: java.io.IOException: Couldn't load classifier from resources/NER.prop
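The header value in the exception can be decoded to see what was actually read. The sketch below uses only the JDK (the class and method names are made up for illustration) and suggests that the four bytes 23206C6F are the ASCII text "# lo", the start of the first comment line of the prop file, whereas a Java serialization stream begins with ACED0005. That would be consistent with ner.model pointing at the text properties file rather than at a serialized classifier.

```java
import java.io.ObjectStreamConstants;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Stanford NLP.
public class StreamHeaderDecoder {

    // Converts a hex string such as "23206C6F" into the ASCII text it encodes.
    public static String hexToAscii(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Returns the magic number and version that open a Java serialization
    // stream, formatted the same way as the header in the exception message.
    public static String expectedHeader() {
        return String.format("%04X%04X",
                ObjectStreamConstants.STREAM_MAGIC & 0xFFFF,
                ObjectStreamConstants.STREAM_VERSION & 0xFFFF);
    }

    public static void main(String[] args) {
        // Header reported in the StreamCorruptedException above.
        System.out.println("found:    [" + hexToAscii("23206C6F") + "]");
        // Header a serialized Java object would start with.
        System.out.println("expected: " + expectedHeader());
    }
}
```

The decoded text matches the beginning of "# location of the training file", the first line of the prop file shown above.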
From another question on SO, I understand that you supply the model file directly. But how can we do this with the help of a properties file?
That is what I understood from the FAQ. But how do we generate that model file from the properties file? – Betafish
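If you run that command with the .prop file specified, it will save the model at resources/ner-model.ser.gz ... You could also write Java code that calls the main() method of CRFClassifier, but that is not recommended – StanfordNLPHelp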
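Once the model has been trained, the pipeline has to be pointed at the serialized file, not at the .prop file. A minimal sketch of that step, using only java.util.Properties: it reads serializeTo from the training properties and uses that path as ner.model. The class and method names here are hypothetical, the annotator list is copied from the question, and the sketch assumes training has already been run through CRFClassifier with the prop file so that the file named by serializeTo exists.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper class, not part of Stanford NLP.
public class NerPipelineProps {

    // Parses .prop-style text ("key = value" lines, '#' comments).
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Builds pipeline properties whose ner.model is the serialized model
    // named by serializeTo in the training properties.
    public static Properties fromTrainingProps(Properties training) {
        String modelPath = training.getProperty("serializeTo");
        if (modelPath == null) {
            throw new IllegalArgumentException("training properties have no serializeTo entry");
        }
        Properties pipeline = new Properties();
        // Annotator list taken from the question.
        pipeline.setProperty("annotators",
                "tokenize, ssplit, pos, lemma, ner, parse, sentiment, regexner");
        pipeline.setProperty("ner.model", modelPath);
        return pipeline;
    }

    public static void main(String[] args) {
        // Inline excerpt of the prop file from the question.
        String propText =
                "# location of the training file\n"
              + "trainFile = nerTEST.tsv\n"
              + "serializeTo = resources/ner-model.ser.gz\n"
              + "map = word=0,answer=1\n";
        Properties pipeline = fromTrainingProps(parse(propText));
        System.out.println(pipeline.getProperty("ner.model"));
    }
}
```

The resulting Properties object would take the place of the props built in the question. The training properties stay in the .prop file and are only consulted for the model path.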