Limiting the number of iterations in Stanford NER

I am training a Stanford NER CRF model on a custom dataset, but the number of training iterations has now climbed past 330, and the process has been running for hours. Below is the output printed in the terminal:

Iter 335 evals 400 <D> [M 1.000E0] 2.880E3 38054.87s |5.680E1| {6.652E-6} 4.488E-4 - 
Iter 336 evals 401 <D> [M 1.000E0] 2.880E3 38153.66s |1.243E2| {1.456E-5} 4.415E-4 - 
The properties file I am using is given below. Is there some way to limit the number of iterations to, say, 20?

#location of the training file 
trainFile = TRAIN5000.tsv 
#location where you would like to save (serialize to) your 
#classifier; adding .gz at the end automatically gzips the file, 
#making it faster and smaller 
serializeTo = ner-model_TRAIN5000.ser.gz 

#structure of your training file; this tells the classifier 
#that the word is in column 0 and the correct answer is in 
#column 1 
map = word=0,answer=1 

#these are the features we'd like to train with 
#some are discussed below, the rest can be 
#understood by looking at NERFeatureFactory 
useClassFeature=true 
useWord=true 
useNGrams=true 
#no ngrams will be included that do not contain either the 
#beginning or end of the word 
noMidNGrams=true 
useDisjunctive=true 
maxNGramLeng=6 
usePrev=true 
useNext=true 
useSequences=true 
usePrevSequences=true 
maxLeft=1 
#the next 4 deal with word shape features 
useTypeSeqs=true 
useTypeSeqs2=true 
useTypeySequences=true 
wordShape=chris2useLC 
saveFeatureIndexToDisk = true 
printFeatures=true 
flag useObservedSequencesOnly=true 
featureDiffThresh=0.05

Source

2017-04-08 akshendra Garg

Add maxIterations=20 to the properties file.

Source

2017-04-09 07:20:53 StanfordNLPHelp

I tried this and it did not work. – arop

I was trying to train a biomedical NER (BioNER) model with the Stanford CoreNLP CRF classifier on IOB-tagged, tokenized text.

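My corpus, taken from the downloaded source files, is very large (about 1.5M lines; 6 features: GENE; ...). Because training seemed to run indefinitely, I plotted the ratio of the values to get an idea of the progress.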
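To track progress this way, the optimizer's progress lines first have to be turned into numbers. The following is a minimal sketch that parses lines in the format shown in the question. The field names (value, grad_norm, rel_norm, last_column) are assumptions about what each column means, and the answer does not say which quantity was plotted, so the sketch simply keeps the last column alongside the iteration number.

```python
import re

# Matches optimizer progress lines of the form shown in the question, e.g.
# Iter 335 evals 400 <D> [M 1.000E0] 2.880E3 38054.87s |5.680E1| {6.652E-6} 4.488E-4 -
LINE_RE = re.compile(
    r"Iter\s+(?P<iter>\d+)\s+evals\s+(?P<evals>\d+)\s+<\w+>\s+\[[^\]]*\]\s+"
    r"(?P<value>\S+)\s+(?P<seconds>[\d.]+)s\s+\|(?P<grad_norm>[^|]+)\|\s+"
    r"\{(?P<rel_norm>[^}]+)\}\s+(?P<last_column>\S+)"
)


def parse_progress_line(line):
    """Return the numeric fields of one progress line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "iter": int(fields["iter"]),
        "evals": int(fields["evals"]),
        # The names below are guesses at the meaning of each column.
        "value": float(fields["value"]),
        "seconds": float(fields["seconds"]),
        "grad_norm": float(fields["grad_norm"]),
        "rel_norm": float(fields["rel_norm"]),
        "last_column": float(fields["last_column"]),
    }


# The two progress lines quoted in the question.
log_lines = [
    "Iter 335 evals 400 <D> [M 1.000E0] 2.880E3 38054.87s |5.680E1| {6.652E-6} 4.488E-4 - ",
    "Iter 336 evals 401 <D> [M 1.000E0] 2.880E3 38153.66s |1.243E2| {1.456E-5} 4.415E-4 - ",
]
rows = [parse_progress_line(line) for line in log_lines]

# (iteration, last column) pairs, ready to hand to any plotting tool.
series = [(row["iter"], row["last_column"]) for row in rows if row is not None]

# Wall-clock time spent on the most recent iteration.
seconds_per_iter = rows[1]["seconds"] - rows[0]["seconds"]

print(series)
print(round(seconds_per_iter, 2))
```

On the two lines from the question, the elapsed-time column advances by roughly 99 seconds between iterations 335 and 336, which gives a sense of why a run of several hundred iterations takes hours.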

Grepping the Java source code, I found that the default TOL (tolerance, used to decide when to terminate a training session) is 1E-6 (0.000001), as specified in .../CoreNLP/src/edu/stanford/nlp/optimization/QNMinimizer.java.

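Looking at that plot, my original training session would never have completed. [The plot also shows that setting a larger value, e.g. tolerance=0.05, triggers premature termination of training, because that threshold is met by the "noise" that appears near the start of the training session. I confirmed this with a tolerance=0.05 entry in my .prop file; TOL values of 0.01, 0.005, etc. were "OK", however.]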
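The effect of the tolerance can be illustrated with the two progress lines from the question. This sketch assumes that the last column of each progress line is the quantity compared against the tolerance; the thread does not confirm this, and the function below is a simplified stand-in for the stopping rule, not the code in QNMinimizer.java.

```python
def first_stop(values, tolerance):
    """Index of the first value that falls below tolerance, or None if none does."""
    for index, value in enumerate(values):
        if value < tolerance:
            return index
    return None


# Last-column values from the two log lines in the question (iterations 335 and 336).
# Assumption: this column is what the optimizer compares with the tolerance.
observed = [4.488e-4, 4.415e-4]

default_tol = 1e-6   # default TOL reported in the answer
relaxed_tol = 0.005  # value the answer reports as "OK"

print(first_stop(observed, default_tol))  # None
print(first_stop(observed, relaxed_tol))  # 0
```

Under that assumption, values around 4.4E-4 are still far above 1E-6 after more than 330 iterations, so the default threshold is not met, while a threshold of 0.005 is already satisfied.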

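Adding maxIterations=20 to the properties file, as described by @StanfordNLPHelp (elsewhere in this thread), appeared to be ignored unless I also added a tolerance= entry, and varied its value, in my bioner.prop properties file; for example: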

tolerance=0.005 
#optional 
maxIterations=20 

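In that case, the classifier quickly trained the model (bioner.ser.gz). [When I added the maxIterations line to my .prop file without adding the tolerance line, the model just kept running "forever", as before.]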
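Since the outcome here depended on which stopping-related entries were present, it can help to check a properties file before starting a long run. The sketch below is a hypothetical helper, not part of Stanford NER. Its parser is deliberately simplified (it skips blank and comment lines and splits on the first "="), so it does not reproduce every rule of Java's properties format.

```python
def parse_props(text):
    """Simplified .prop reader: skip blank/comment lines, split on the first '='."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_stopping_settings(text, required=("tolerance", "maxIterations")):
    """Return the stopping-related keys that the properties text does not set."""
    props = parse_props(text)
    return [key for key in required if key not in props]


# A few entries from the question's properties file.
question_excerpt = """
trainFile = TRAIN5000.tsv
serializeTo = ner-model_TRAIN5000.ser.gz
map = word=0,answer=1
featureDiffThresh=0.05
"""

# The two entries that this answer reports as working together.
answer_excerpt = """
tolerance=0.005
maxIterations=20
"""

print(missing_stopping_settings(question_excerpt))  # ['tolerance', 'maxIterations']
print(missing_stopping_settings(answer_excerpt))    # []
```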

The list of parameters that can be included in the .prop file can be found here:

https://nlp.stanford.edu/nlp/javadoc/javanlp-3.5.0/edu/stanford/nlp/ie/NERFeatureFactory.html

Source

2017-11-06 19:01:01
