I want to build a random forest classifier using the pyspark.ml library for DataFrames (not mllib for RDDs). Do I have to use the Pipeline given in the documentation? I just want to build a simple random forest model with pyspark.ml for DataFrames:
rf = RandomForestClassifier(labelCol = labs, featuresCol = rawdata)
I ran into the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/pyspark/__init__.py", line 104, in wrapper
    return func(self, **kwargs)
  File "/usr/lib/spark/python/pyspark/ml/classification.py", line 910, in __init__
    self.setParams(**kwargs)
  File "/usr/lib/spark/python/pyspark/__init__.py", line 104, in wrapper
    return func(self, **kwargs)
  File "/usr/lib/spark/python/pyspark/ml/classification.py", line 928, in setParams
    return self._set(**kwargs)
  File "/usr/lib/spark/python/pyspark/ml/param/__init__.py", line 421, in _set
    raise TypeError('Invalid param value given for param "%s". %s' % (p.name, e))
TypeError: Invalid param value given for param "labelCol". Could not convert <class 'pyspark.sql.dataframe.DataFrame'> to string type
A sample of my labels:
+---+
| _2|
+---+
|0.0|
|1.0|
|0.0|
|0.0|
|0.0|
|0.0|
|1.0|
|1.0|
|1.0|
|0.0|
|0.0|
|0.0|
|0.0|
|0.0|
|0.0|
|0.0|
|0.0|
|0.0|
|1.0|
|1.0|
+---+
My data looks similar, with 180 columns.
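For reference, the TypeError above comes from passing DataFrame objects where pyspark.ml expects column names: labelCol and featuresCol are strings naming columns inside a single DataFrame. A minimal sketch of the corrected constructor call, assuming hypothetical column names "label" and "features":

from pyspark.ml.classification import RandomForestClassifier

# labelCol and featuresCol take column-name strings from one DataFrame,
# not separate DataFrame objects
rf = RandomForestClassifier(labelCol="label", featuresCol="features")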
You don't *need* to use a Pipeline. For more help, please provide a sample of your data – desertnaut
I have edited the post. Thank you. – Nivi
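As the comment notes, a Pipeline is not required. One possible sketch, assuming the 180 feature columns and the label column (here the sample's "_2") all live in a single DataFrame df (name assumed): pyspark.ml estimators expect the features as one vector-typed column, so the raw columns are first combined with VectorAssembler and the classifier is then fit directly.

from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

# Assemble the 180 raw feature columns into a single vector column.
feature_cols = [c for c in df.columns if c != "_2"]   # "_2" is the label column here
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
data = assembler.transform(df)

# Fit the random forest directly on the assembled DataFrame (no Pipeline needed).
rf = RandomForestClassifier(labelCol="_2", featuresCol="features", numTrees=20)
model = rf.fit(data)
predictions = model.transform(data)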