How do I set up the data for logistic regression in Scala?

I am new to Scala and I want to implement a regression model. I started by loading a CSV file as follows:

val sqlContext = new org.apache.spark.sql.SQLContext(sc) 
val df = sqlContext.read.format("com.databricks.spark.csv") 
    .option("header", "true") 
    .option("inferSchema", "true") 
    .load("D:/sample.txt")

The file looks like this:

P,P,A,A,A,P,NB 
N,N,A,A,A,N,NB 
A,A,A,A,A,A,NB 
P,P,P,P,P,P,NB 
N,N,P,P,P,N,NB 
A,A,P,P,P,A,NB 
P,P,A,P,P,P,NB 
P,P,P,A,A,P,NB 
P,P,A,P,A,P,NB 
P,P,A,A,P,P,NB 
P,P,P,P,A,P,NB 
P,P,P,A,P,P,NB 
N,N,A,P,P,N,NB 
N,N,P,A,A,N,NB 
N,N,A,P,A,N,NB 
N,N,A,P,A,N,NB 
N,N,A,A,P,N,NB 
N,N,P,P,A,N,NB 
N,N,P,A,P,N,NB 
A,A,A,P,P,A,NB 
A,A,P,A,A,A,NB 
A,A,A,P,A,A,NB 
A,A,A,A,P,A,NB 
A,A,P,P,A,A,NB 
A,A,P,A,P,A,NB 
P,N,A,A,A,P,NB 
N,P,A,A,A,N,NB 
P,N,A,A,A,N,NB 
P,N,P,P,P,P,NB 
N,P,P,P,P,N,NB

Then I set up the model:

val lr = new LogisticRegression() 
     .setMaxIter(10) 
     .setRegParam(0.3) 
     .setElasticNetParam(0.8) 
     .setFeaturesCol("Feature") 
     .setLabelCol("Label")

Then I fit the model with the following code:

val lrModel = lr.fit(df)

println(lrModel.coefficients + " are the coefficients")
println(lrModel.interceptVector + " is the intercept vector")
println(lrModel.summary + " is the summary")

But it does not print any results.

Any help is appreciated.


2017-07-07 Ricky

From your code:

val lr = new LogisticRegression()
     .setMaxIter(10)
     .setRegParam(0.3)
     .setElasticNetParam(0.8)
     .setFeaturesCol("Feature") // <- here
     .setLabelCol("Label")      // <- here

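You are setting a features column and a label column here, so both columns have to exist in the DataFrame. Since you did not mention any column names, I assume the column containing the NB values is your label and that you want all the other columns as predictors.

All predictor variables you want to include in the model need to be combined into a single vector column, usually called the features column. You need to create it with VectorAssembler as follows: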

import org.apache.spark.ml.feature.VectorAssembler 
import org.apache.spark.ml.linalg.Vectors 

//creating features column 
val assembler = new VectorAssembler() 
    .setInputCols(Array(" insert your column names here ")) 
    .setOutputCol("Feature")

See: https://spark.apache.org/docs/latest/ml-features.html#vectorassembler

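Now you can go ahead and fit the logistic regression model. Use a Pipeline to chain the multiple transformations that are applied to the data before fitting.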

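import org.apache.spark.ml.Pipeline

val pipeline = new Pipeline().setStages(Array(assembler, lr))

// fitting the model
// note: pipeline.fit returns a PipelineModel; the fitted logistic
// regression model is the last entry of lrModel.stages
val lrModel = pipeline.fit(df)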


2017-07-07 06:28:04 vdep

If I use this, lrModel cannot produce any coefficients, summary, or anything else. Could you please explain why that is? – Ricky

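Spark ML can only take numeric values as input. Since your predictor columns contain categorical values (P, N, A, ...), you need to convert them to numeric values first. Use 'StringIndexer' or 'OneHotEncoder' to do that and pass the resulting column names to the 'VectorAssembler' input. See: https://spark.apache.org/docs/latest/ml-features.html#stringindexer and https://spark.apache.org/docs/latest/ml-features.html#onehotencoder. I hope that is clear. – vdep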
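To make the steps from the answer and the comment concrete, here is a minimal sketch that indexes the categorical columns, assembles them into one vector column, fits the pipeline, and then pulls the logistic regression stage back out of the fitted pipeline. It is only an illustration under some assumptions: the column names c1 to c6 and target are hypothetical (the question never names its columns), the four rows are copied from the sample file, and a local SparkSession (Spark 2.1 or later) is used instead of the question's SQLContext.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object CategoricalLogisticRegression {

  // Fits the pipeline and returns the fitted logistic regression stage.
  def fitModel(spark: SparkSession): LogisticRegressionModel = {
    import spark.implicits._

    // A few rows copied from the question's file.
    // Column names are hypothetical; the question does not give any.
    val df = Seq(
      ("P", "P", "A", "A", "A", "P", "NB"),
      ("N", "N", "A", "A", "A", "N", "NB"),
      ("A", "A", "A", "A", "A", "A", "NB"),
      ("P", "P", "P", "P", "P", "P", "NB")
    ).toDF("c1", "c2", "c3", "c4", "c5", "c6", "target")

    val featureCols = Seq("c1", "c2", "c3", "c4", "c5", "c6")

    // One StringIndexer per categorical predictor: string -> numeric index.
    val featureIndexers: Seq[PipelineStage] = featureCols.map { c =>
      new StringIndexer().setInputCol(c).setOutputCol(c + "_idx")
    }

    // The label is a string as well, so it is indexed into "Label".
    val labelIndexer = new StringIndexer()
      .setInputCol("target")
      .setOutputCol("Label")

    // Combine the indexed predictors into the single vector column "Feature".
    val assembler = new VectorAssembler()
      .setInputCols(featureCols.map(_ + "_idx").toArray)
      .setOutputCol("Feature")

    // Same settings as in the question.
    val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)
      .setFeaturesCol("Feature")
      .setLabelCol("Label")

    val stages: Array[PipelineStage] =
      (featureIndexers ++ Seq[PipelineStage](labelIndexer, assembler, lr)).toArray

    // pipeline.fit returns a PipelineModel, which has no coefficients itself;
    // the fitted LogisticRegressionModel is its last stage.
    val pipelineModel = new Pipeline().setStages(stages).fit(df)
    pipelineModel.stages.last.asInstanceOf[LogisticRegressionModel]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("categorical-lr-sketch")
      .master("local[*]")
      .getOrCreate()

    val lrModel = fitModel(spark)
    println(s"Coefficients: ${lrModel.coefficientMatrix}")
    println(s"Intercepts: ${lrModel.interceptVector}")

    spark.stop()
  }
}
```

Two caveats. Every row in the sample file has the same label (NB), so a model fitted on it is trivial; real training data needs at least two distinct label values. Also, StringIndexer produces ordinal indices, which a linear model treats as ordered numbers; if that is not what you want, adding a one-hot encoding step between the indexers and the VectorAssembler is the usual alternative, though the encoder's class name and API differ between Spark versions.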
