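I am trying to use a random forest model to classify a stream of examples, but it seems I cannot use the model to classify them. Here is the code I use in PySpark to combine Spark Streaming with MLlib: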
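from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.tree import RandomForest

sc = SparkContext(appName="App")
# trainingData: an RDD of LabeledPoint built from static data (not shown)
model = RandomForest.trainClassifier(trainingData, numClasses=2, categoricalFeaturesInfo={}, impurity='gini', numTrees=150)
ssc = StreamingContext(sc, 1)
lines = ssc.socketTextStream(hostname, int(port))
parsedLines = lines.map(parse)  # parse: user function that turns a text line into a LabeledPoint (not shown)
parsedLines.pprint()
# Calling model.predict inside a transformation is what triggers the SPARK-5063 error below
predictions = parsedLines.map(lambda event: model.predict(event.features))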
When I run it on the cluster, it returns this error:
Error : "It appears that you are attempting to reference SparkContext from a broadcast "
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Is there a way to use a model trained on static data to classify streaming examples?
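A minimal sketch of one workaround, under stated assumptions. In PySpark, the MLlib tree-model predict method wraps a JVM object that holds the SparkContext, so it cannot be called inside an RDD or DStream transformation such as map. It can, however, be called on the driver with a whole RDD. DStream.transform runs its function on the driver once per micro-batch, so the model can be applied to each batch RDD there. This assumes parse returns LabeledPoint-like objects with label and features attributes; make_batch_predictor is a hypothetical helper name, not part of Spark.

```python
def make_batch_predictor(model):
    """Build a function for DStream.transform that classifies a whole micro-batch.

    Hypothetical helper: assumes each element of the batch RDD has
    .label and .features attributes (e.g. a pyspark.mllib LabeledPoint).
    """
    def predict_batch(rdd):
        # This function runs on the driver once per batch. model.predict
        # receives an RDD, so the model (and the SparkContext it holds)
        # is never captured in a closure that is shipped to the workers.
        features = rdd.map(lambda lp: lp.features)
        predictions = model.predict(features)
        # Pair each true label with its prediction, following the pattern
        # used in the MLlib random forest examples.
        return rdd.map(lambda lp: lp.label).zip(predictions)

    return predict_batch

# Usage with the objects from the question (sketch):
# predictions = parsedLines.transform(make_batch_predictor(model))
# predictions.pprint()
# ssc.start()
# ssc.awaitTermination()
```

The lambdas passed to rdd.map here only capture plain fields, not the model, which is why they can be serialized to the workers while the model call itself stays on the driver.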
Thank you all, I really appreciate it!
I wrote a similar question here: https://stackoverflow.com/questions/48846882/pyspark-ml-streaming