Apache PySpark ML example does not work

I'm trying to create a simple DataFrame (df) in a Jupyter notebook running PySpark, following the ML example (IndexToString) on this page: http://spark.apache.org/docs/latest/ml-features.html#onehotencoder. I keep getting a long error message. One of the lines says:

Py4JJavaError: An error occurred while calling o23.applySchemaToPythonRDD. : java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

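Please help. Does this mean I need to have an RDD first in order to build a DataFrame? Also, I tried the MLlib approach and it worked fine; it is only the ML approach that keeps giving me errors.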

2016-09-18 jypucca

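Which version of Spark are you using? The example at your link requires Spark 2.0.0.

At the link below you can find the same example for Spark 1.6.2. I tested it and it works on my machine: http://spark.apache.org/docs/1.6.2/ml-features.html#onehotencoder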

from pyspark.ml.feature import OneHotEncoder, StringIndexer 

df = sqlContext.createDataFrame([ 
    (0, "a"), 
    (1, "b"), 
    (2, "c"), 
    (3, "a"), 
    (4, "a"), 
    (5, "c") 
], ["id", "category"]) 

stringIndexer = StringIndexer(inputCol="category", outputCol="categoryIndex") 
model = stringIndexer.fit(df) 
indexed = model.transform(df) 
encoder = OneHotEncoder(dropLast=False, inputCol="categoryIndex", outputCol="categoryVec") 
encoded = encoder.transform(indexed) 
encoded.select("id", "categoryVec").show()
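
Since the fix above depends on matching the documentation to the installed Spark version, here is a minimal sketch of a helper that builds the versioned docs URL from a version string. The function names `docs_url_for` and `entry_point_name` are hypothetical, made up for illustration. The URL pattern is inferred from the 1.6.2 link above. The sketch also assumes that the 1.x examples use `sqlContext` while the 2.x examples use a `SparkSession` named `spark`.

```python
import re


def docs_url_for(version, page="ml-features.html"):
    """Build the versioned Spark docs URL for a version string such as '1.6.2'.

    Hypothetical helper; the URL pattern is assumed from the links in this thread.
    """
    # Keep only the leading major.minor.patch part (drops suffixes like '-SNAPSHOT').
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised Spark version: %r" % version)
    return "http://spark.apache.org/docs/%s/%s" % (match.group(0), page)


def entry_point_name(version):
    """Return the variable name the docs examples are assumed to use for that version."""
    major = int(version.split(".")[0])
    return "sqlContext" if major < 2 else "spark"


# In a PySpark shell or notebook, the running version is available as sc.version.
print(docs_url_for("1.6.2"))
print(entry_point_name("1.6.2"))
```

In a notebook, calling `docs_url_for(sc.version)` should point at the examples written for the Spark build that is actually running.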

2016-09-18 06:28:44 Yaron

I'm using Spark 2.0.0. – jypucca
