2016-09-27
data = sqlContext.sql("select a.churn,b.pay_amount,c.all_balance from db_bi.t_cust_churn a left join db_bi.t_cust_pay b on a.cust_id=b.cust_id left join db_bi.t_cust_balance c on a.cust_id=c.cust_id limit 5000").cache() 

from pyspark.mllib.regression import LabeledPoint

def labelData(df): 
    return df.map(lambda row: LabeledPoint(row[0], row[1:])) 

traindata = labelData(data)  # this step works well.
from pyspark.ml.classification import LogisticRegression 
lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8) 
lrModel = lr.fit(lrdata) 

AttributeError       Traceback (most recent call last) 
<ipython-input-40-b84a106121e6> in <module>() 
----> 1 lrModel = lr.fit(lrdata) 

/home/hadoop/spark/python/pyspark/ml/pipeline.pyc in fit(self, dataset, params) 
    67     return self.copy(params)._fit(dataset) 
    68    else: 
---> 69     return self._fit(dataset) 
    70   else: 
    71    raise ValueError("Params must be either a param map or a list/tuple of param maps, " 

/home/hadoop/spark/python/pyspark/ml/wrapper.pyc in _fit(self, dataset) 
    131 
    132  def _fit(self, dataset): 
--> 133   java_model = self._fit_java(dataset) 
    134   return self._create_model(java_model) 
    135 

/home/hadoop/spark/python/pyspark/ml/wrapper.pyc in _fit_java(self, dataset) 
    128   """ 
    129   self._transfer_params_to_java() 
--> 130   return self._java_obj.fit(dataset._jdf) 
    131 
    132  def _fit(self, dataset): 

AttributeError: 'PipelinedRDD' object has no attribute '_jdf' 

lrModel = lr.fit(lrdata) - isn't that a typo? Shouldn't it be traindata that gets fitted? –


Sorry, my mistake. I meant lrModel = lr.fit(traindata) –

Answer


I guess you are following a tutorial written for the latest Spark release (2.0.1), which uses pyspark.ml.classification import LogisticRegression, while you need the API from an older release such as 1.6.2: pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel. Note that these are different libraries: pyspark.ml works on DataFrames, whereas pyspark.mllib works on RDDs, which is why fit() complains that your 'PipelinedRDD' object has no '_jdf' attribute.
