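I want to save my KMeans clustering model to the local file system. I am using PySpark MLlib for the KMeans clustering, but I get the error below. How can I store a Spark MLlib model on the local file system (Windows)?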
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o46.save.
: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:E:/Work/Python1/work/spark/anomalydetectionspark/test/spark-warehouse
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.hadoop.fs.Path.<init>(Path.java:89)
My code:
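from math import sqrt
from pyspark.mllib.clustering import KMeans

# sc (the SparkContext) and parsedData are created earlier (not shown)
clusters = KMeans.train(parsedData, 2, maxIterations=10,
                        runs=10, initializationMode="random")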
# Evaluate clustering by computing Within Set Sum of Squared Errors
def error(point):
    center = clusters.centers[clusters.predict(point)]
    return sqrt(sum([x**2 for x in (point - center)]))
WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
print("Within Set Sum of Squared Error = " + str(WSSSE))
# Save and load model
clusters.save(sc, "file:E:/Work/Python1/work/spark/anomalydetectionspark/test/spark-warehouse")
Can anyone help me figure out why I am getting this error?
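It has something to do with the file path you are passing to 'clusters.save'. I'm not familiar with file paths on Windows, but I suspect you need to remove the 'file:' prefix so that it is just 'E:/path/to/file'. – Magsol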
@Magsol, I tried that too, but I still get the same error. – Backtrack
Spark's KMeans is about the slowest you will find. Don't use it. –
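On the path question discussed in the comments: the message "Relative path in absolute URI" complains about a URI that has a scheme ('file:') but whose path part does not start with '/', which is the shape of 'file:E:/...'. A minimal sketch of building a well-formed file URI from a Windows path with the standard library follows. The helper name to_file_uri is made up for illustration, and whether this alone resolves the error in the question is an assumption, not something confirmed in this thread.

```python
from pathlib import PureWindowsPath


def to_file_uri(windows_path):
    """Turn an absolute Windows path into a file:/// URI (string handling only, no disk access)."""
    # as_uri() raises ValueError if the path is not absolute
    return PureWindowsPath(windows_path).as_uri()


model_dir = "E:/Work/Python1/work/spark/anomalydetectionspark/test/spark-warehouse"
model_uri = to_file_uri(model_dir)
print(model_uri)

# Assumption: sc and clusters are the SparkContext and KMeansModel from the question.
# clusters.save(sc, model_uri)
```

Since Backtrack reports the same error even without the 'file:' prefix, the URI in the message may not come from the save argument at all. I believe Spark 2.0 on Windows was reported to build a similar malformed default for the 'spark.sql.warehouse.dir' setting, so setting that option explicitly to a 'file:///E:/...' URI when creating the context might be worth trying.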