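Spark linear regression: weights and predictions

I started using the Spark MLlib library from Scala. From my tests so far, I can't get even remotely correct results. I have tried several ways to get it working, without success. Right now, even with relatively simple data: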
1,1
2,2
3,3
4,4
5,5
6,6
7,7
8,8
9,9
10,10
I can't get any decent results. Here is my code so far (fairly standard, I guess):
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors

// sc is the SparkContext provided by spark-shell; each CSV line is "label,x"
val data = sc.textFile("/Users/jacek/oo.csv")
val parsedData = data.map { line =>
  val parts = line.split(',')
  // Features: a constant 1.0 (bias column added by hand) followed by x
  LabeledPoint(parts(0).toDouble, Vectors.dense(Array(1.0, parts(1).toDouble)))
}

val numIterations = 20
// Default step size; no separate intercept is fitted (intercept stays 0.0)
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Pair each true label with the model's prediction
val valuesAndPreds = parsedData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
These are the results I get:
model: org.apache.spark.mllib.regression.LinearRegressionModel = (weights=[-1.3423470408513295E21,-9.345181656001024E21], intercept=0.0)
scala> parsedData.take(10)
res48: Array[org.apache.spark.mllib.regression.LabeledPoint] = Array((1.0,[1.0,1.0]), (2.0,[1.0,2.0]), (3.0,[1.0,3.0]), (4.0,[1.0,4.0]), (5.0,[1.0,5.0]), (6.0,[1.0,6.0]), (7.0,[1.0,7.0]), (8.0,[1.0,8.0]), (9.0,[1.0,9.0]), (10.0,[1.0,10.0]))
scala> valuesAndPreds.take(10)
res49: Array[(Double, Double)] = Array((1.0,-6.764535208E21), (2.0,-1.2266421529070415E22), (3.0,-1.8399632293605623E22), (4.0,-2.453284305814083E22), (5.0,-3.0666053822676038E22), (6.0,-3.6799264587211245E22), (7.0,-4.293247535174645E22), (8.0,-4.906568611628166E22), (9.0,-5.519889688081687E22), (10.0,-6.7645352076E22))
scala>
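For comparison, the ten points lie exactly on y = x, so with features [1.0, x] a least-squares fit should give weights of about [0.0, 1.0] and predictions equal to the labels. A minimal pure-Scala sketch that computes that reference fit in closed form (no Spark involved; `OlsReference` and `olsFit` are hypothetical names used only for illustration):

```scala
// Hypothetical helper: closed-form ordinary least squares for y = b0 + b1 * x.
object OlsReference {
  /** Returns (intercept, slope) minimizing the sum of squared residuals. */
  def olsFit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.length == ys.length && xs.length > 1, "need at least two paired points")
    val n = xs.length.toDouble
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // slope = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    val slope = sxy / sxx
    (meanY - slope * meanX, slope)
  }

  def main(args: Array[String]): Unit = {
    // The same ten points as oo.csv: label equals x.
    val xs = (1 to 10).map(_.toDouble)
    val ys = xs
    val (intercept, slope) = olsFit(xs, ys)
    println(s"intercept=$intercept slope=$slope")
  }
}
```

Running `main` prints `intercept=0.0 slope=1.0`. In the Spark code above, the bias is the first feature, so the corresponding target is weights ≈ [0.0, 1.0] with `intercept=0.0`.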
I have tried different settings for the linear regression algorithm without much luck. Any help is appreciated.
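One plausible reading of the numbers above is that gradient descent is diverging. `train(parsedData, numIterations)` uses a default step size of 1.0, which is far too large for unscaled features like x = 1..10. The sketch below is plain batch gradient descent on the squared loss over features [1.0, x]. It is not MLlib's implementation: the step-size decay stepSize / sqrt(t) is an assumption meant to resemble MLlib's default updater, and `GdDivergence` and `run` are hypothetical names.

```scala
// Plain batch gradient descent for least squares (a sketch, not MLlib's code).
// Assumption: the step size decays as stepSize / sqrt(t).
object GdDivergence {
  type Point = (Double, Array[Double]) // (label, features)

  def run(points: Seq[Point], numIterations: Int, stepSize: Double): Array[Double] = {
    val dim = points.head._2.length
    var w = Array.fill(dim)(0.0) // start from zero weights
    for (t <- 1 to numIterations) {
      // Average gradient of 0.5 * (w.x - y)^2 over all points.
      val grad = Array.fill(dim)(0.0)
      for ((y, x) <- points) {
        val diff = x.indices.map(i => w(i) * x(i)).sum - y
        for (i <- x.indices) grad(i) += diff * x(i) / points.length
      }
      val eta = stepSize / math.sqrt(t)
      w = w.indices.map(i => w(i) - eta * grad(i)).toArray
    }
    w
  }

  def main(args: Array[String]): Unit = {
    // Same data as oo.csv, with the hand-added bias column.
    val points: Seq[Point] = (1 to 10).map(i => (i.toDouble, Array(1.0, i.toDouble)))
    for (step <- Seq(1.0, 0.01)) {
      val w = run(points, 20, step)
      println(s"stepSize=$step weights=${w.mkString("[", ", ", "]")}")
    }
  }
}
```

With stepSize = 1.0 the weights blow up to magnitudes around 1e21 to 1e22, like the transcript. With stepSize = 0.01 they stay finite and the slope ends up near 1.0. For this loss, the step has to stay below 2 / λmax, where λmax is the largest eigenvalue of the average of x xᵀ over the data. Here that eigenvalue is about 39.3, so steps above roughly 0.05 diverge. Scaling x (for example, standardizing it) shrinks that eigenvalue, which is the usual alternative to lowering the step size.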