Parallel tasks on Spark


My Spark job uses only one core. This is the code I have:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAll((
    ("spark.python.profile", "true" if args.profile else "false"),
    ("spark.task.maxFailures", "20"),
    ("spark.driver.cores", "4"),
    ("spark.executor.cores", "4"),
    ("spark.shuffle.service.enabled", "true"),
    ("spark.dynamicAllocation.enabled", "true"),
))

# TODO could this be set somewhere in cosr-ops instead?
executor_environment = {}
if config["ENV"] == "prod":
    executor_environment = {
        "PYTHONPATH": "/cosr/back",
        "PYSPARK_PYTHON": "/cosr/back/venv/bin/python",
        "LD_LIBRARY_PATH": "/usr/local/lib",
    }

sc = SparkContext(appName="Common Search Index", conf=conf, environment=executor_environment)

# First, generate a list of all WARC files
warc_filenames = list_warc_filenames()

# Then split their indexing in Spark workers
warc_records = sc.parallelize(warc_filenames, 4).flatMap(iter_records)

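While it launches all the Spark machinery, it uses all the cores.

But once it starts executing the actual task (indexing), it uses only one core at 100%.

How can I make a Spark task use all the cores?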


2016-06-27 IvRRimUm

This won't do anything... it doesn't execute any action. –

It does; iter_records contains the work. – IvRRimUm

When it is called, it starts indexing the WARC bodies into the ES cluster. – IvRRimUm

The problem was that Python itself was not using all the cores. I implemented multithreading.
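The actual multithreaded code is not shown in this thread. As a rough sketch only, assuming the per-record work is mostly waiting on the network (for example, requests to the ES cluster), it could look something like the following. The names index_record and index_partition are hypothetical stand-ins, not functions from the original project.

```python
from concurrent.futures import ThreadPoolExecutor


def index_record(record):
    """Hypothetical stand-in for the real per-record work (e.g. a request to ES)."""
    return {"indexed": record}


def index_partition(records, max_workers=4):
    """Process the records of one partition on a small thread pool.

    Threads mainly help when the work is I/O-bound; CPU-bound pure-Python
    code is still serialized by the GIL.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        for result in pool.map(index_record, records):
            yield result


# Local usage, no Spark required:
results = list(index_partition(["a.warc", "b.warc", "c.warc"]))
```

Under these assumptions, a function shaped like index_partition could be passed to RDD.mapPartitions, so that each Spark task runs its own thread pool. If the work is CPU-bound rather than I/O-bound, raising the number of partitions passed to sc.parallelize is probably the more direct route to using more cores.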

Thanks to everyone who helped.


2016-06-28 09:55:58 IvRRimUm
