How to save/load a PySpark ML model to HBase

I am writing a PySpark application and want to use the Linear Regression algorithm from MLlib, but I don't know how to save/load the output. My code:
import os
import sys

# Raw string so the backslash in the Windows path is not read as an escape
os.environ['SPARK_HOME'] = r"C:\spark-2.2.0-bin-hadoop2.7"

try:
    from pyspark.sql import SparkSession
    from pyspark.ml.regression import LinearRegression
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.feature import VectorAssembler
except ImportError as e:
    print("Error importing Spark Modules", e)
    sys.exit(1)

spark = SparkSession.builder.appName("lrexample").getOrCreate()
data = spark.read.csv("E:/Customers.csv", inferSchema=True, header=True)

# Combine the numeric columns into a single 'features' vector column
assembler = VectorAssembler(
    inputCols=['Avg Session Length', 'Time on App',
               'Time on Website', 'Length of Membership'],
    outputCol='features')
output = assembler.transform(data)
final_data = output.select('features', 'Yearly Amount Spent')

# 70/30 train/test split, then fit the model on the training set
train_data, test_data = final_data.randomSplit([0.7, 0.3])
lr = LinearRegression(labelCol='Yearly Amount Spent')
lr_model = lr.fit(train_data)
My question is: how do I save and load lr_model? I plan to use HBase.
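
As far as I know, Spark ML has no built-in HBase writer for models. The standard persistence API (`lr_model.write().overwrite().save(path)` and `LinearRegressionModel.load(path)`) reads and writes a directory on a Hadoop-compatible filesystem. One possible workaround, sketched below, is to save the model to a temporary directory, zip that directory, and store the bytes in a single HBase cell. The assumptions here do not come from the question: Spark runs in local mode so that a plain path resolves to the local filesystem, the HBase client is `happybase` (which talks to the HBase Thrift server), and the table name `models`, column `m:zip`, and row key `lr_model` are made up for illustration.

```python
import io
import os
import tempfile
import zipfile


def pack_dir(path):
    """Zip every file under a saved-model directory into one bytes value."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the model directory
                zf.write(full, os.path.relpath(full, path))
    return buf.getvalue()


def unpack_dir(blob, path):
    """Restore a directory previously packed by pack_dir."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        zf.extractall(path)


def save_model_to_hbase(model, table, row_key, column=b"m:zip"):
    # model: a fitted pyspark.ml model such as lr_model
    # table: assumed to be a happybase Table; column family 'm' is hypothetical
    tmp = os.path.join(tempfile.mkdtemp(), "model")
    model.write().overwrite().save(tmp)  # standard Spark ML persistence
    table.put(row_key, {column: pack_dir(tmp)})


def load_model_from_hbase(table, row_key, column=b"m:zip"):
    # Imported lazily so the zip helpers can be used without Spark installed
    from pyspark.ml.regression import LinearRegressionModel
    tmp = os.path.join(tempfile.mkdtemp(), "model")
    unpack_dir(table.row(row_key)[column], tmp)
    return LinearRegressionModel.load(tmp)


# Usage (hypothetical host and table names, needs a running Thrift server):
#   import happybase
#   table = happybase.Connection("hbase-host").table("models")
#   save_model_to_hbase(lr_model, table, b"lr_model")
#   same_model = load_model_from_hbase(table, b"lr_model")
```

This keeps the whole model in one cell, which should be fine for something as small as a linear regression model; larger models are usually better saved to HDFS, with only the path stored in HBase.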