I'm working through a PySpark machine learning tutorial, and I run into a problem when trying to print the dataset table.
The problem appears once I reach the "Correlations and Data Preparation" section.
It happens when I try to run this code:
from pyspark.sql.types import DoubleType
from pyspark.sql.functions import UserDefinedFunction

# Map the Yes/No and True/False string labels to doubles
binary_map = {'Yes': 1.0, 'No': 0.0, 'True': 1.0, 'False': 0.0}
toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())

# Drop the correlated charge columns and convert the label columns to numeric
CV_data = CV_data.drop('State').drop('Area code') \
    .drop('Total day charge').drop('Total eve charge') \
    .drop('Total night charge').drop('Total intl charge') \
    .withColumn('Churn', toNum(CV_data['Churn'])) \
    .withColumn('International plan', toNum(CV_data['International plan'])) \
    .withColumn('Voice mail plan', toNum(CV_data['Voice mail plan'])).cache()

final_test_data = final_test_data.drop('State').drop('Area code') \
    .drop('Total day charge').drop('Total eve charge') \
    .drop('Total night charge').drop('Total intl charge') \
    .withColumn('Churn', toNum(final_test_data['Churn'])) \
    .withColumn('International plan', toNum(final_test_data['International plan'])) \
    .withColumn('Voice mail plan', toNum(final_test_data['Voice mail plan'])).cache()
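In case it helps, a quick way to check the actual types of the columns being converted (my own diagnostic sketch, not part of the tutorial) would be something like:

# Sketch: inspect the real column types before applying the UDF
# (assumes CV_data has already been loaded as in the tutorial)
CV_data.printSchema()
CV_data.select('Churn', 'International plan', 'Voice mail plan').show(5)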
Here is the error message printed to the terminal (partial):
17/06/20 17:58:53 WARN BlockManager: Putting block rdd_38_0 failed due to an exception
17/06/20 17:58:53 WARN BlockManager: Block rdd_38_0 could not be removed as it was not found on disk or in memory
17/06/20 17:58:53 WARN BlockManager: Putting block rdd_53_0 failed due to an exception
17/06/20 17:58:53 WARN BlockManager: Block rdd_53_0 could not be removed as it was not found on disk or in memory
17/06/20 17:58:53 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 16)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/main/spark-2.1.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 174, in main
    process()
  File "/home/main/spark-2.1.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 169, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/main/spark-2.1.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 106, in <lambda>
    func = lambda _, it: map(mapper, it)
  File "<string>", line 1, in <lambda>
  File "/home/main/spark-2.1.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 70, in <lambda>
    return lambda *a: f(*a)
  File "<stdin>", line 1, in <lambda>
KeyError: False

	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
	at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
....
The rest of the error message can be viewed in this document here.
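One detail I noticed: the traceback ends in KeyError: False, i.e. the Python boolean False rather than the string 'False'. If the Churn column were actually a boolean type, the lambda would receive False itself, and the dict lookup in binary_map would fail exactly like this. A minimal sketch that reproduces the same error under that assumption (the toy df below is mine, not from the tutorial):

# Minimal reproduction sketch -- assumes the column holds real booleans,
# not the strings 'True'/'False' that binary_map expects
from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType
from pyspark.sql.functions import UserDefinedFunction

spark = SparkSession.builder.getOrCreate()
binary_map = {'Yes': 1.0, 'No': 0.0, 'True': 1.0, 'False': 0.0}
toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())

df = spark.createDataFrame([(True,), (False,)], ['Churn'])  # boolean column
df.withColumn('Churn', toNum(df['Churn'])).show()           # raises KeyError: False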
Does anyone know what the problem is?
Thanks in advance.