In Spark SQL with Python, I am trying to work on the RDD produced by the SQL query below. It is a list of tweets. I need to split the text into words and extract the @ mentions, but when I use map with a lambda to split on spaces I get the AttributeError shown below.
tw = sqlContext.sql("SELECT text FROM tweet where text like '%@%'")
tweetrdd = tw.rdd.map(lambda line: line.split(" "))
tweetrdd.collect()
ERROR executor.Executor: Exception in task 0.0 in stage 84.0 (TID 310)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
process()
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "<stdin>", line 1, in <lambda>
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1272, in __getattr__
raise AttributeError(item)
AttributeError: split
An example would be helpful – mtoto
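The likely cause is that each element of the RDD is a pyspark.sql Row, not a plain string, so `line.split` hits Row's `__getattr__` and raises `AttributeError: split` (that is exactly the frame in sql/types.py above). A minimal stand-in sketch, using a namedtuple to mimic a Row with a `text` column, shows the failure mode and the fix of extracting the column before splitting:

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row: behaves like a namedtuple of the
# selected columns (here, just "text").
Row = namedtuple("Row", ["text"])

rows = [Row(text="hello @alice world"), Row(text="hi @bob")]

# This mirrors the failing lambda: a Row has no split method.
try:
    [line.split(" ") for line in rows]
except AttributeError as e:
    print("AttributeError:", e)

# Fix: pull the column out of the Row first, then split the string.
words = [line.text.split(" ") for line in rows]
mentions = [w for ws in words for w in ws if w.startswith("@")]
print(mentions)  # ['@alice', '@bob']
```

Translated back to Spark, the equivalent change would be `tw.rdd.map(lambda line: line.text.split(" "))` (or `line[0]` to index the first column); this is a sketch of the idea, not tested against the asker's cluster.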