What is the difference between spark-submit and pyspark?

If I start pyspark and then run this command:

import my_script; spark = my_script.Sparker(sc); spark.collapse('./data/')
everything works fine. But if I try to do the same thing through the command line with spark-submit, I get an error:
Command: /usr/local/spark/bin/spark-submit my_script.py collapse ./data/
  File "/usr/local/spark/python/pyspark/rdd.py", line 352, in func
    return f(iterator)
  File "/usr/local/spark/python/pyspark/rdd.py", line 1576, in combineLocally
    merger.mergeValues(iterator)
  File "/usr/local/spark/python/pyspark/shuffle.py", line 245, in mergeValues
    for k, v in iterator:
  File "/.../my_script.py", line 173, in _json_args_to_arr
    js = cls._json(line)
RuntimeError: uninitialized staticmethod object
my_script.py:

...

if __name__ == "__main__":
    args = sys.argv[1:]
    if args[0] == 'collapse':
        directory = args[1]
        from pyspark import SparkContext
        sc = SparkContext(appName="Collapse")
        spark = Sparker(sc)
        spark.collapse(directory)
        sc.stop()
Why exactly is this happening? What difference between running pyspark and running spark-submit would cause this divergence? And how can I make this work in spark-submit?
EDIT: I tried running this from a bash shell with pyspark my_script.py collapse ./data/ and got the same error. The only time everything works is when I am in a Python shell and import the script.
You mean spark-submit, not pyspark-submit. Also, this explains what spark-submit does, but that is not the question. The question is about the difference between spark-submit and pyspark. avrsanjay's answer addresses it. – 2016-10-19 07:35:45
There is no such thing as pyspark-submit. – 2017-07-10 14:54:27