2014-04-01 75 views

This is the Scala code I use to train on my dataset. What is the problem? When I run LDA with Stanford TMT, I always get this error: "java.lang.UnsupportedOperationException: empty.max"

val tokenizer = { 
    SimpleEnglishTokenizer() ~>   // tokenize on space and punctuation 
    CaseFolder() ~>      // lowercase everything 
    WordsAndNumbersOnlyFilter() ~>   // ignore non-words and non-numbers 
    //MinimumLengthFilter(1) ~>    // take terms with >=3 characters 
    PorterStemmer() //~> 
    //StopWordFilter("en") 
} 

val text = { 
    source ~>        // read from the source file 
    Columns(4,6) ~> 
    Join(" ") ~>       // select column containing text 
    TokenizeWith(tokenizer) ~>    // tokenize with tokenizer above 
    TermCounter() //~>      // collect counts (needed below) 
    TermMinimumDocumentCountFilter(0) ~> // filter terms in <4 docs 
    TermDynamicStopListFilter(0) ~> // filter out 30 most common terms 
    TermMinimumDocumentCountFilter(0) // take only docs with >=5 terms 
} 

// define fields from the dataset we are going to slice against 
val labels = { 
    source ~>        // read from the source file 
    Column(5) ~>       // take column two, the year 
    TokenizeWith(WhitespaceTokenizer()) ~> // turns label field into an array 
    TermCounter() //~>      // collect label counts 
    TermMinimumDocumentCountFilter(0)  // filter labels in < 10 docs 
} 

val dataset = LabeledLDADataset(text, labels); 

// define the model parameters 
val modelParams = LabeledLDAModelParams(dataset); 

// Name of the output model folder to generate 
val modelPath = file("llda-cvb0-"+dataset.signature+"-"+modelParams.signature); 

// Trains the model, writing to the given output path 
TrainCVB0LabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1000); 
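The message "empty.max" is what the Scala standard library throws when .max is called on an empty collection. Since the stack trace is not shown, it is only an assumption that some TMT stage takes a maximum over a collection the filters above left empty (for example, a document or label list with no remaining terms). The minimal sketch below reproduces the exception in plain Scala, without TMT; the names EmptyMaxDemo, unsafeMax and safeMax are hypothetical.

```scala
import scala.util.{Failure, Success, Try}

// Illustrative only: plain Scala, no Stanford TMT involved.
// EmptyMaxDemo, unsafeMax and safeMax are hypothetical names.
object EmptyMaxDemo {
  // Throws java.lang.UnsupportedOperationException("empty.max")
  // when xs is empty, like any .max on an empty collection.
  def unsafeMax(xs: Seq[Int]): Int = xs.max

  // Defensive variant: None for an empty collection instead of an exception.
  def safeMax(xs: Seq[Int]): Option[Int] = xs.reduceOption(_ max _)

  def main(args: Array[String]): Unit = {
    // Stands in for a document whose terms were all removed by filtering.
    val emptiedDoc = Seq.empty[Int]

    Try(unsafeMax(emptiedDoc)) match {
      // Prints the exception class name followed by its "empty.max" message.
      case Failure(e) => println(s"${e.getClass.getName}: ${e.getMessage}")
      case Success(v) => println(s"unexpected max: $v")
    }
    println(safeMax(emptiedDoc))   // None
    println(safeMax(Seq(3, 7, 5))) // Some(7)
  }
}
```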

Answer


This line is wrong: TermDynamicStopListFilter(0) ~> // filter out 30 most common terms

It should be TermDynamicStopListFilter(30), which, as the comment says, filters out the 30 most common terms.
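To make the role of the argument concrete, here is a plain-Scala sketch of the behavior the comment describes: dropping the n terms with the highest overall counts. It is an illustration under that assumption, not TMT's implementation. The name dropMostCommon is hypothetical, and the real stage may rank terms differently (for example by document frequency) or break ties differently.

```scala
// Hypothetical re-implementation for illustration only; not the TMT API.
object StopListSketch {
  // Remove the n most frequent terms across all documents.
  // With n = 0 the stop list is empty, so nothing is removed.
  def dropMostCommon(docs: Seq[Seq[String]], n: Int): Seq[Seq[String]] = {
    // Count how often each term occurs across the whole corpus.
    val counts: Map[String, Int] =
      docs.flatten.groupBy(identity).map { case (term, occ) => term -> occ.size }
    // Rank by descending count; ties are broken alphabetically for determinism.
    val stopList: Set[String] =
      counts.toSeq.sortBy { case (term, c) => (-c, term) }.take(n).map(_._1).toSet
    docs.map(_.filterNot(stopList.contains))
  }

  def main(args: Array[String]): Unit = {
    val docs = Seq(Seq("the", "gene", "the"), Seq("the", "protein", "gene"), Seq("cell"))
    println(dropMostCommon(docs, 0)) // unchanged
    println(dropMostCommon(docs, 1)) // "the" removed everywhere
  }
}
```

With n = 2 in this toy corpus, the first document is left with no terms at all, which is the kind of empty collection on which a later .max would fail; that is one reason a document-length filter usually follows the stop list.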
