Naive Bayes with Apache Spark MLlib

I am using Naive Bayes with Apache Spark MLlib for text classification, following this tutorial: http://avulanov.blogspot.com/2014/08/text-classification-with-apache-spark.html

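import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

/* instantiate Spark context (not needed when running inside the Spark shell) */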
val sc = new SparkContext("local", "test") 
/* word to vector space converter, limit to 10000 words */ 
val htf = new HashingTF(10000) 
/* load positive and negative sentences from the dataset */ 
/* let 1 - positive class, 0 - negative class */ 
/* tokenize sentences and transform them into vector space model */ 
val positiveData = sc.textFile("/data/rt-polaritydata/rt-polarity.pos") 
    .map { text => new LabeledPoint(1, htf.transform(text.split(" ")))} 
val negativeData = sc.textFile("/data/rt-polaritydata/rt-polarity.neg") 
    .map { text => new LabeledPoint(0, htf.transform(text.split(" ")))} 
/* split the data 60% for training, 40% for testing */ 
val posSplits = positiveData.randomSplit(Array(0.6, 0.4), seed = 11L) 
val negSplits = negativeData.randomSplit(Array(0.6, 0.4), seed = 11L) 
/* union train data with positive and negative sentences */ 
val training = posSplits(0).union(negSplits(0)) 
/* union test data with positive and negative sentences */ 
val test = posSplits(1).union(negSplits(1)) 
/* Multinomial Naive Bayesian classifier */ 
val model = NaiveBayes.train(training) 
/* predict */ 
val predictionAndLabels = test.map { point => 
    val score = model.predict(point.features) 
    (score, point.label) 
} 
/* metrics */ 
val metrics = new MulticlassMetrics(predictionAndLabels) 
/* output F1-measure for all labels (0 and 1, negative and positive) */ 
metrics.labels.foreach(l => println(metrics.fMeasure(l)))

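But what do I do once the model has been trained? If I want to know whether the sentence "Have a nice day" is positive or negative, how should I go about it? Thanks.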


2015-10-13 Thanh Thai Nguyen

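In general, you need two things to make a prediction on raw data:

First, apply the same transformations you used for the training data. If some transformers require fitting (such as IDF, normalization, or encoding), you must use the ones that were fitted on the training data. Since your approach is very simple, all you need here is something like this:
```
val testData = htf.transform("Have a nice day".split(" "))
```
Second, use the `predict` method of the trained model:
```
model.predict(testData)
```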
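The two steps above can be wrapped into small helpers. This is only a sketch: `featurize`, `classify`, and `classifyWithIdf` are hypothetical names, not MLlib functions. It assumes `model` and `htf` are the values from the question, and the IDF variant assumes Spark 1.3 or later plus a pipeline in which an `IDFModel` was fitted on the training term frequencies and the classifier was trained on the resulting TF-IDF vectors.

```scala
import org.apache.spark.mllib.classification.NaiveBayesModel
import org.apache.spark.mllib.feature.{HashingTF, IDFModel}
import org.apache.spark.mllib.linalg.Vector

// Hypothetical helpers for illustration; none of these names come from MLlib.

// Tokenize the same way the training code does (split on single spaces) and
// hash the tokens with a HashingTF that has the same number of features.
def featurize(htf: HashingTF, sentence: String): Vector =
  htf.transform(sentence.split(" ").toSeq)

// Map the predicted label back to the convention used in the question:
// 1.0 = positive, 0.0 = negative.
def classify(model: NaiveBayesModel, htf: HashingTF, sentence: String): String =
  if (model.predict(featurize(htf, sentence)) == 1.0) "positive" else "negative"

// Variant for a pipeline that also used IDF. Assumes `idfModel` was fitted on
// the training term frequencies only, e.g. `new IDF().fit(trainingTf)`, and
// that `model` was trained on the resulting TF-IDF vectors.
def classifyWithIdf(model: NaiveBayesModel, htf: HashingTF, idfModel: IDFModel,
                    sentence: String): String = {
  val tfidf = idfModel.transform(featurize(htf, sentence))
  if (model.predict(tfidf) == 1.0) "positive" else "negative"
}
```

With the values from the question in scope, `classify(model, htf, "Have a nice day")` returns either "positive" or "negative". Hashing is case-sensitive, so if the training text was lowercased, lowercasing the input sentence before calling `featurize` may give more consistent results.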


2015-10-13 11:29:34 zero323
